Friday, November 14, 2014

HP Analytics blazes new trails in examining business trends from myriad data

The next BriefingsDirect deep-dive big data thought leadership interview examines how HP analyzes its own vast data warehouses to derive new insights for its global operations, extensive supply chain, sales organization, global marketing groups, and customers.

We'll explore how the Analytics Group at HP, based in India, sifts through myriad internal data sources and joins them with public data sets to deliver entirely new intelligence that helps make the business more responsive and efficient.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn how, BriefingsDirect sat down with Pramod Singh, Director of Digital and Big Data Analytics at HP Analytics in Bangalore, India, at the recent HP Big Data 2014 Conference in Boston. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us a little bit about the Analytics Group at HP, what you do, and what’s the charter of your organization.

Singh: We have a big analytics organization in HP called Global Analytics, and it serves the analytics needs of most of HP. About 80 to 90 percent of the analytics happening inside HP comes out of this ecosystem. We do analytics across the entire food chain at HP, which includes the supply chain, marketing, and sales.

What I personally lead is an organization called Digital Analytics, and we are responsible for analytics across all of HP's digital properties. That includes eCommerce, social media, search, and campaign analytics. We also have a Center of Excellence for Big Data Analytics, where we're using HP's big-data technologies, the framework called HAVEn, to help develop big-data solutions for HP customers as well as for HP internally.
Fully experience the HP Vertica analytics platform...
Become a member of myVertica
Gardner: Obviously, HP is a very large global company. What sort of datasets are we talking about here? What’s the volume that you're working with?

Data explosion

Singh: As you know, a data explosion is happening. On one hand, HP has done a very good job over the last six to seven years of getting most of its enterprise data into an enterprise data warehouse. We're talking about close to two petabytes of structured data.

The great part of this journey is that we have consolidated data from 700 to 800 different data marts into one enterprise data warehouse over the last three to four years. But a lot of data that is not part of the enterprise is also becoming important for making business decisions.

Much of the data I personally deal with in the digital space is what we call human-generated data, the social media data that no enterprise owns. It's open for anybody to use. What I've started to see is that, on one hand, we've done a really good job of getting data into the enterprise and getting value out of it.

We've also started to analyze and harvest the data that is out in the open space. It could be blogs, Twitter feeds, or Facebook data. Combining that is what’s bringing real business value.

The Global Analytics organization is more than 1,000 people spread across different parts of the world. A big chunk of that is in Bangalore, India, but we have people in the US and the UK. We have a center in Guadalajara, Mexico, and a couple of other locations in India. My own organization is close to 100 people.

I have a PhD in pure mathematics, and before that I earned an MBA in marketing. It's a bit of an awkward mix. I got into the analytics space in the mid-'90s, working for Walmart.

I built out Walmart's Assortment Planning System in the late '90s and then came to HP in 2000 to lead an advanced data-mining center in Austin, Texas. From there I moved into e-business analytics for a few years and then into customer knowledge management. I spent five years in IT developing an analytics platform.

About a year and a half ago, I got an opportunity to lead the big-data practice for this organization called Global Analytics. In five years, it had grown from five people to more than 1,000, and that intrigued me a lot. I took the opportunity and moved to India to lead that team.

More insights

Gardner: Pramod, when we look back into this data, do you gain more insights knowing what you're looking for, or not knowing what you're looking for? What kind of insights were the unexpected consequences of your putting together this type of data infrastructure and then applying big-data analytics to it?

Singh: We deal with that day in and day out. I'll give you a couple of examples. This happened about three or four years ago at HP. We were looking at a classic marketing problem: marketing to US small and medium-sized businesses (SMBs). We had a fixed marketing budget, and across the US there are more than 20 million SMBs. The classic definition of an SMB is any business with 100-500 employees.

HP had an install base that was a small part of that. We realized that the SMB segment is squeezed between the classic consumer, where you can do mass marketing such as TV advertising, and the enterprise, where you can actually put bodies, your own people who have relationships with the customer. SMBs are squeezed in between those two extremes.

On one hand, you can't reach out to every single one of them; it's just way too expensive. On the other hand, if you try to do mass marketing to them, you don't get the best out of it.

We had started working on something like that when a vice president in marketing approached me, saying that revenues were declining, the team had a limited marketing budget, and they didn't know what to do.

This is where one of those unexpected things came in. I said, "Let's see whether there are different segments of customers in that install base that behave differently." That led us on a journey where we said, "How do we start to do that right? Let's figure out what attributes of data we can capture."

On one hand, if you look at SMBs, you can capture who they are, what industry segment they're in, how many employees they have, where they're based, and who the CEO is. That's what we call firmographics.

On the other hand, you have classes of data involving their interactions with HP: how many PCs or servers they bought, how long ago they bought them, how much money they spent, the whole transactional aspect.

Then there are derived attributes. For example, you may be able to derive that in the last year they came to us four times. What interactions did we have with them on the website? Did they come to us through a web channel? If they did, how many email offers were sent to them? How many of those were clicked? How many converted? Those are the classes of data we could capture.
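To make the idea of derived attributes concrete, here is a minimal Python sketch that rolls raw interaction records up into per-customer features such as visit counts and email click-through rate over a look-back window. The `Interaction` record, its fields, and the one-year window are hypothetical assumptions for illustration, not HP's actual schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Interaction:
    """One hypothetical customer touchpoint record."""
    customer_id: str
    day: date
    channel: str                 # e.g. "web" or "phone" (illustrative labels)
    email_sent: bool = False
    email_clicked: bool = False
    converted: bool = False


def derive_attributes(events, as_of, window_days=365):
    """Roll raw events up into per-customer derived attributes over a look-back window."""
    start = as_of - timedelta(days=window_days)
    features = {}
    for e in events:
        if not (start < e.day <= as_of):
            continue  # ignore events outside the look-back window
        f = features.setdefault(e.customer_id, {
            "visits": 0, "web_visits": 0, "emails_sent": 0,
            "emails_clicked": 0, "conversions": 0,
        })
        f["visits"] += 1
        f["web_visits"] += int(e.channel == "web")
        f["emails_sent"] += int(e.email_sent)
        f["emails_clicked"] += int(e.email_clicked)
        f["conversions"] += int(e.converted)
    for f in features.values():
        # click-through rate, guarding against customers who received no email
        f["click_rate"] = f["emails_clicked"] / f["emails_sent"] if f["emails_sent"] else 0.0
    return features
```

Features like these sit alongside firmographic and transactional columns as inputs to the segmentation step.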

The question then became what do we do with that? Again, when you do data mining and analytics, you may not know where this will lead you.

Mathematical modeling

We thought that maybe there were different classes of customers. We pulled our data together and started to do mathematical modeling, using clustering techniques such as K-means. We started to get some results and analyze them. In this type of situation, we have to be careful, because some things may look mathematically correct but have no real business value behind them.

Once we started to look at those results, we went through multiple iterations. We realized that we were not getting segments, or clusters, that were very distinct. One day, I was driving home in Austin, and I said, "You know what? Who they are I don't control, but what they're doing with HP we understand reasonably well."

So we started to do clustering based only on those behavioral attributes, and that's where the "aha" moment came. We started to find these clusters, which we call segments, and eventually found one cluster, 7 to 8 percent of the population, that brought in 45 percent of the revenue.
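The clustering step can be sketched in plain Python. This is a minimal, hypothetical illustration of K-means (Lloyd's algorithm) run only on behavioral attributes, here units bought and dollars spent, followed by a summary of each segment's share of customers and of revenue. The feature choice, the two-cluster setting, and the helper names are assumptions, not HP's actual model; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import random


def standardize(rows):
    """Z-score each column so dollars and unit counts are on comparable scales."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def segment_summary(labels, revenue):
    """Per segment: (share of customers, share of total revenue)."""
    total_n, total_rev = len(labels), sum(revenue)
    out = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        out[c] = (len(idx) / total_n, sum(revenue[i] for i in idx) / total_rev)
    return out
```

A segment whose revenue share far exceeds its customer share is the kind of "gold mine" described here; as Singh notes, each cluster still needs a business sanity check.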

The marketers started to say that this was a gold mine. That’s what we never expected to happen. We put together a structure. Once we figured out these four or five clusters, we tried to figure out why they were clustered together. What’s common?
We built out a primary research effort, where we took a random sample from each of those clusters, interviewed those customers, and were able to build a very good profile of each segment.

There are 20 million SMBs in the US, and we were able to build a model to predict which of these prospects were similar to the clusters we had. That's how we found prospects that looked like our most profitable customers, whom we ended up calling Vanguards. That resulted in a tremendous amount of incremental revenue for HP. It's a good example of what you were talking about: finding unexpected things.
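One simple way to score prospects against known segments, sketched here as a hypothetical stand-in for whatever model HP actually built, is a k-nearest-neighbors classifier trained on firmographic attributes only, since prospects have no transaction history with HP. The features are assumed to be pre-scaled numeric values; the segment name "Vanguard" follows the interview, and everything else is illustrative.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority segment among the k customers nearest to x (Euclidean distance)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def find_lookalikes(customer_X, customer_segments, prospect_X, target, k=3):
    """Indices of prospects whose nearest known customers mostly belong to `target`."""
    return [i for i, x in enumerate(prospect_X)
            if knn_predict(customer_X, customer_segments, x, k) == target]
```

For example, with customers labeled by the clustering step and prospects described by the same scaled firmographic columns, `find_lookalikes(customers, segments, prospects, "Vanguard")` returns the prospects worth prioritizing.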

We just wanted to analyze data. It led us to a journey and ended up finding a customer group we weren't even aware of. Then, we could build marketing strategy to actually go target those and get some value out of it.

Gardner: At the Big Data Conference, I've spoken to other organizations who are creating an analytics capability and then exposing that to as many of their employees as possible, hoping for this very sort of unexpected positive benefit. Is there a way that you're taking your analytics either through visualization or tools and then allowing a larger population within HP to experiment with it?

Singh: We're trying to democratize analytics as much as we can. One thing we're realizing is that to get the full value, you don't want data to stay in silos. So there are a couple of things you have to do. To build an ecosystem where you have a good set of motivated people and can give them a career path, we have created this organization called Global Analytics. You get a critical mass of people who challenge each other, learn from each other, and do a lot of analytics.

But it's also very important that, on the consumption side, you have people who are analysts, who understand analytics and can get the best value out of it. So we try to create that ecosystem. We have seen both ends of it.

Good career path

If you just put one data miner or analytics person in one team, sometimes that person does not find an ecosystem that challenges him or her. We're trying to do it on both sides of the fence, so that we can provide people with a good career path.

Hiring these folks is not easy. Once you've hired them, retaining them is not easy. You want to make sure to create an ecosystem where it’s challenging enough for these people to work. It also has to be an ecosystem where you continually challenge them and keep training them.

The analytical techniques are evolving. When I started doing it, things were stable for years. Now, the newer class of data is coming in, newer techniques are coming in, and newer classes of business problems are coming in. It’s very important that we keep the ecosystem going. So we try to do it on both sides.

Gardner: Very interesting. HP, of course, has its own line of products for big-data analysis. You're such a large global enterprise that you're doing lots of analysis, as any good business should, but you're also being asked to show how this works. Are there some specific use cases that demonstrate for other enterprises what you've learned yourselves?

Singh: There are several we can talk about. One is in the social media space, which I briefly mentioned. My career evolved from doing analytics on what I call "data inside the enterprise." But over the last couple of years, we started to look at data outside the enterprise.

Recently we looked at a bank. We were able to harvest publicly available data from the Internet, from Glassdoor, for example. Glassdoor is a website where employees of a company can post feedback, talk about the company, and rate things.

We presented it to the executives of this particular bank. We were able to pull all the data together and tell them the overall employee morale. We figured out that the work-life balance for the employees wasn't very good.

The main thing the employees weren't happy about was the leave and vacation policy. We drilled down and figured out that the bankers seemed fairly happy, but the IT staff and analysts weren't. Again, this is one example where we didn't ask the customer for a single line of data. This data is publicly available. You and I, or anybody else, can go get it. I can do the same analysis for HP or any other company.

That's where I believe the classes of analytics we're doing are changing. A lot of times, your competitive differentiator is the ability to do things with that data. Data is a corporate asset and will remain one, but this class of what we call user-generated data is changing analytics as a whole. The ability to harvest it and, more importantly, get value out of it will be the competitive differentiator.

Gardner: Any other use cases that demonstrate the power of a particular type of platform, let’s say Vertica in HAVEn, where you've got the power of a columnar architecture and you've got the ability to bring in unstructured data from Autonomy? Maybe there are a couple of use cases that demonstrate the unique attributes of HAVEn when it comes to inclusivity and the comprehensive nature of information today?

Game changer

Singh: Let me talk about a couple of things that happened in the HAVEn ecosystem. One of the main workhorses in HAVEn is our massively parallel database, Vertica. In addition to being a database where we can ingest large volumes of data very quickly and get fast query performance, the game-changer for me as an analytics practitioner has been the ability to do analytics in-database.

If I look at my career over the last 20 to 22 years, most of the time what happens in the analytics space is that you have data residing in a database or an enterprise data warehouse. When you want to build a model, you take the data out and use an analytics platform like SAS, R, or SPSS. You do something there, and then you either bring the data back into the environment or run the models and publish them out.

What Vertica has done that's unique is give us a framework, its user-defined extension (UDx) framework, through which we can build a data-mining model, run it directly on the database engine, and take the output out.
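The in-database pattern can be illustrated with Python's built-in `sqlite3` module, whose `Connection.create_function` registers a Python function that the SQL engine then calls row by row inside a query. This is only a stand-in: SQLite runs in-process on a single machine, whereas Vertica runs extensions inside a distributed database. The `failure_score` model and its coefficients are hypothetical.

```python
import math
import sqlite3


def failure_score(pressure, temperature):
    """Hypothetical, hand-set logistic model; the coefficients are illustrative only."""
    z = -12.0 + 0.05 * pressure + 0.08 * temperature
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (machine_id TEXT, pressure REAL, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("m1", 100.0, 40.0), ("m2", 180.0, 70.0)],
)

# Register the model with the SQL engine so scoring happens inside the query,
# instead of exporting every row to a separate analytics tool first.
conn.create_function("failure_score", 2, failure_score)

at_risk = conn.execute(
    "SELECT machine_id, failure_score(pressure, temperature) AS p "
    "FROM readings WHERE failure_score(pressure, temperature) > 0.5"
).fetchall()
```

Only the scored, filtered rows leave the query; in the extract-and-model workflow described above, every raw row would be pulled out first.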

An example we took to HP Discover a couple of months ago was trying to predict a failure of a machine before the actual failure happens. HP has these big machines and big printers, which are very expensive.

Like a lot of high-end devices these days, they send out a lot of data about how the machine is being used. The sensors report things such as the pressure in the valves, the operating temperature, the throughput, or the number of pages printed.

They also send data on events when the machine was not performing optimally or actually failed. We were able to ingest all that data into the Vertica platform and build predictive models using the open-source R language. We built a model that can predict the failure of a machine.

Looking at each component of failure, we could predict, with a certain probability, when the machine would fail, so our service reps can be proactive and not wait for the machine to fail. That's one example of in-database data mining using Vertica.
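Singh's team built its models in R; to keep one language in this post, the following minimal Python sketch shows the general idea of a failure-probability model: logistic regression, fitted by batch gradient descent, on standardized sensor readings labeled with whether the machine later failed. The features, training data, machine names, and 0.5 alert threshold are hypothetical assumptions, not HP's model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def failure_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training data: [valve-pressure deviation, temperature deviation],
# label 1 if the machine failed within the following week.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [1.5, 1.8], [1.7, 1.4], [1.9, 1.6]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)

# Flag machines for a proactive service visit.
fleet = {"press-07": [1.8, 1.7], "press-12": [0.1, 0.1]}
alerts = [m for m, x in fleet.items() if failure_probability(w, b, x) > 0.5]
```

The alert threshold trades missed failures against unnecessary service calls and would be tuned on held-out data.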

Another example used more of HAVEn's components, in the social-media space. One of the problems in the social-media space, and I think you're probably familiar with this, is finding influencers.

I gave a talk yesterday about how you do that. The classical way is uni-dimensional: you go by the number of followers or retweets someone has. By that measure, Barack Obama or Lady Gaga would be big influencers, but for HP's cloud computing, Barack Obama may not be a very big influencer.

So you build algorithms that account for that. My team has actually built three patented algorithms to identify influencers in this space. We've built a framework where we can source data from social media and drop it into a Hadoop environment.
We use Autonomy to enrich the data and add sentiment to it, and then drop it into the Vertica environment. In that Vertica environment, you run the compressed algorithms and get an output. Then you can score and predict who the influencers are for the topic you're looking for.

Influencers

I gave the example of Barack Obama: in general a big influencer, but not an influencer for all topics. Maybe in politics or the US government he's a big influencer, but not for cloud computing. Influence is also a function of time. Somebody like Diego Maradona was probably a big influencer in soccer in the '90s, but in 2014, not that much.

You have to make sure you incorporate those factors into the logic of your algorithm. We've been able to use multiple components of HAVEn to build a complete framework where we can tell numerically who the main influencers are and how influential they are. For example, if you get a score of 93 and I get a score of 22, you are roughly four times as influential as I am.
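HP's patented influencer algorithms are not described here, so the following is only a hypothetical Python sketch of the two properties Singh highlights: influence is computed per topic, and older activity is discounted with an exponential half-life. Engagement is log-damped so that a single viral post does not dominate, and scores are scaled linearly so the top author for a topic gets 100, which keeps score ratios meaningful. The `Post` record, the 180-day half-life, and the scoring formula are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical social post, already tagged with topics by an enrichment step."""
    author: str
    topics: frozenset
    engagement: int      # e.g. retweets plus replies
    age_days: float


def influence_scores(posts, topic, half_life_days=180.0):
    """Topic-specific, time-decayed influence, scaled so the top author scores 100."""
    raw = {}
    for p in posts:
        if topic not in p.topics:
            continue  # influence is computed per topic
        decay = 0.5 ** (p.age_days / half_life_days)  # weight halves every half-life
        raw[p.author] = raw.get(p.author, 0.0) + math.log1p(p.engagement) * decay
    top = max(raw.values(), default=0.0)
    if top <= 0:
        return {author: 0 for author in raw}
    return {author: round(100 * v / top) for author, v in raw.items()}
```

Because the scaling is linear, a 93 versus a 22 under this scheme means about 4.2 times the raw, decayed influence.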

Gardner: For other organizations that are interested in learning more about how HP Analytics is operating and maybe learning from your example, are there any resources or websites we can go to, where you are providing more information about HP Analytics?

Singh: Definitely. We have our own website, and there are multiple ways you can approach us. You can talk to the Vertica sales team, and they can connect you with us. As I said, we do analytics for all of HP and for select customers. We don't have a direct sales arm; we work through our partners in Enterprise Services, as well as with the software team.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

You may also be interested in:

Tuesday, November 11, 2014

Vichara Technologies grows the market for advanced analytics after cutting its big data teeth on Wall Street

The next BriefingsDirect deep-dive big data benefits case study interview explores how Vichara Technologies in Hoboken, New Jersey is expanding its capabilities in big data from origins on Wall Street into other areas, and thereby demonstrating the growing marketplace for advanced big-data analytics services.

Using HP Vertica as a core big-data component has allowed Vichara to extend its easier-to-use financial modeling and tools and apply them to other industries, such as insurance and healthcare.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn more about how advanced big data, cloud, and converged infrastructure implementations are expanding the impact and value of rapid and increasingly predictive analytics, BriefingsDirect sat down with Tim Meyer, Managing Director at Vichara Technologies at the recent HP Big Data 2014 Conference in Boston. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us how your organization evolved, and how big data has become such a large part of the marketplace for gaining insights into businesses.

Meyer: The company has its roots in analytics and risk modeling for all sorts of instruments used on Wall Street, predicting the prices and valuation of those instruments. As the IT infrastructure grew from Excel to databases and eventually to very fast databases, such as Vertica, we realized that there were many problems that couldn't be solved before because they took far too long to answer.

Wall Street people measure time in seconds, not in hours. We've found that there's a great value in answering a lot of business intelligence (BI) questions -- especially around valuations and risk models, as well as portfolio management. These are very large portfolios and datasets that have to be analyzed. We think that this is a great use of big-data analytics.

Gardner: How long have you been using Vertica? How did it become a part of your portfolio of services?
Meyer: We've been using Vertica for at least two years now. It was one of the early ones, and we recognized it as one of the very fastest databases. We try to use as many of these components as possible. We really like Vertica for its capabilities.

Risk assessment

Gardner: Tim, this whole notion of risk assessment is of interest to me. I think it's coming to bear on more industries. People are also interested in extending from knowing what has happened to being able to predict, and then better prescribe new efforts and new insights.

Tell me about predictive risk assessment. How do you go about that, and what should other companies understand about that?

Meyer: Risk assessment comes from looking at how prices fluctuate and how interest rates move, and thus create changes in derivatives. What has happened most recently is that a lot of banks and hedge funds have recognized this. Not only is [predictive risk assessment] a business imperative for them, to have that half-percent hedge, but there are also compliance reasons for which they need to predict what their business is going to look like.

There are now more and more demands for stress testing, as well as from international banking regulations such as Basel III, that require businesses such as hedge funds and banks to look not just behind, but ahead at how their business is going to look in a year. So this becomes very important for a host of reasons, even beyond just how your business is doing.
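As a generic illustration of looking ahead rather than behind (and not Vichara's actual methodology), here is a minimal Python sketch of two common building blocks: a one-day historical-simulation value-at-risk, which reads a loss quantile off past returns, and a scenario stress test, which applies hypothetical shocks to current positions. The return history, positions, and shock sizes are made-up inputs; regulatory calculations under regimes such as Basel III are considerably more involved.

```python
import math


def historical_var(returns, portfolio_value, confidence=0.99):
    """One-day historical-simulation VaR: the loss not exceeded on `confidence` of past days."""
    losses = sorted(-r * portfolio_value for r in returns)  # positive value = loss
    idx = min(len(losses) - 1, math.ceil(confidence * len(losses)) - 1)
    return max(losses[idx], 0.0)


def stress_loss(positions, shocks):
    """Loss from applying fractional price shocks to each position (positive = loss)."""
    return -sum(value * shocks.get(name, 0.0) for name, value in positions.items())


history = [-0.05, -0.03, -0.02, -0.01] + [0.01] * 16   # 20 hypothetical daily returns
var_95 = historical_var(history, 1_000_000, confidence=0.95)

positions = {"bonds": 600_000, "equities": 400_000}
rate_spike = {"bonds": -0.02, "equities": -0.30}        # hypothetical scenario
scenario_loss = stress_loss(positions, rate_spike)
```

Run over very large portfolios and many scenarios, calculations like these are where a fast analytic database pays off.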

Gardner: If I were a business and wanted to start taking advantage of what's now available through big-data analytics -- and at a more compelling price and higher performance than in the past -- what are some of the first steps?
Do I need to think about the type of data or the type of risk? How do you go about recognizing that you can now get the technology to do this at an analytics level, but that you still need to understand how to do it at the process and methodological level?

Meyer: We work very closely with our customers and try to separate algorithmic work from the development work. A lot of our customers have more than a few Caltech and MIT PhDs who do the algorithmic definitions. But all of them still need the engine, the machine with its scripting, and fast capability to build those queries right into the system as quickly as possible.

We usually work with these kinds of people, and it is a bit of a team-work effort. We find that that’s a way to figure out what is our value, and what is the value of our customer. Together, it has turned out to be very good teamwork.

Gardner: And you are a consultancy, as well as a services provider? Do you extend into any hosting or do you have a cloud approach? How do you manage the technology for the consulting and services you offer?

Broader questions

Meyer: We expand from the core products and tools into broader questions for people who want a proof of concept (POC) of this new technology. We build those on an ongoing basis. People also want to look at options such as how different clouds perform. They do vary.

So we take on that kind of consulting work as well, not to mention that it sometimes expands into back-office compliance and sometimes into billing issues. They are all linked to the core business of managing portfolios.

Very often, we've done those kinds of projects, and we see even more of these possibilities as compliance becomes a bigger issue in the financial world, with Dodd-Frank as well as Basel III. But they are really no different from the many regulations coming on the healthcare side, for paperwork management, for example.

Gardner: So that raises the question of the verticals that you expect first. Where is predictive risk assessment and the analytics requirements for that likely to appear first?

Meyer: One thing we have learned from our experience in financial modeling and tools is that there is always a need for people who are totally unskilled in SQL or other query languages to quickly get answers. Although many people have different takes on this, we think we've found some tools that are unique. And we think that these tools will apply to other industries, most particularly to healthcare.

These are big problems, but the way we think of it is to start small, with a POC or a very small, well-defined problem, solve it, and not try to take a bite of the entire elephant, so to speak. We find that to be a much better approach to going into new segments, and we'll be looking at both insurance and healthcare as two examples.
Gardner: Back to the technology front. Are there any developments in the technology arena that give you more confidence that you can take on any number of data types, information types, and scale and velocity types?
I'm thinking of looking at either cloud or converged infrastructure support of in-memory or columnar architectures. Is there a sense of confidence that no matter what you go to bite off in the market, you have the technology, and the technology partner, to back you up?

Meyer: We're finding that there is much more maturity in a lot of database technologies that are now coming out.

There is always something new on the horizon, but there are, as you said, columnar architectures and so on. These are already here, and we're constantly experimenting with them.

To your point about cloud infrastructure and where it is going, it's the same thing. We see ParAccel, Amazon, and data warehouses such as Redshift showing us the way, with a lot of the technology becoming very prepackaged. The value-add is to talk to the customer and speed up the process of integration.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.
