Tuesday, March 31, 2015

Novel consumer retail behavior analysis from InfoScout relies on HP Vertica big data chops

The next BriefingsDirect big data innovation case study interview highlights how InfoScout in San Francisco gleans new levels of accurate insights into retail buyer behavior by collecting data directly from consumers’ sales receipts.

In order to better analyze actual retail behaviors and patterns, InfoScout provides incentives for buyers to share their receipts, but InfoScout is then faced with the daunting task of managing and cleansing that essential data to provide actionable and understandable insights.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

To learn more about how big -- and even messy -- data can be harnessed for near real time business analysis benefits, please join me in welcoming our guests, Tibor Mozes, Senior Vice President of Data Engineering, and Jared Schrieber, the Co-founder and CEO, both at InfoScout, based in San Francisco. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: In your business you've been able to uniquely capture strong data, but you need to treat it a lot to use it and you also need a lot of that data in order to get good trend analysis. So the payback is that you get far better information on essential buyer behaviors, but you need a lot of technology to accomplish that.

Tell us why you wanted to get to this specific kind of data, and then about your novel way of acquiring it.

Schrieber: A quick history lesson is in order. In the market research industry, consumer purchase panels have been around for about 50 years. They started with diaries in people’s homes, where they had to write down exactly every single product that they bought, day-in day-out, in this paper diary and mail it in once a month.

About 20 years ago, with the advent of modems in people’s homes, leading research firms like Nielsen would send a custom barcode scanner into people’s homes and ask them to scan each product they bought and then thumb into the custom scanner the regular price, the sales price, any coupons or deals that they got, and details about the overall shopping trip, and then transfer that electronically. That approach has not changed in the last 20 years.

With the advent of smartphones and mobile apps, we saw a totally new way to capture this information from consumers that would revolutionize how and why somebody would be willing to share their purchase information with a market research company.

Gardner: Interesting. What is it about mobile that is so different from the past, and why does that provide more quality data for your purposes?

Schrieber: There are two reasons in particular. The first is, instead of having consumers scan the barcode of each and every item they purchase and thumb in the pricing details, we're able to simply have them snap a picture of their shopping receipt. So instead of spending 20 minutes after a grocery shopping trip scanning every item and thumbing in the details, it now takes 15 seconds to simply open the app, snap a picture of the shopping receipt, and be done.

The second reason is why somebody would be willing to participate. Using smartphone apps we can create different experiences for different kinds of people with different reward structures that will incentivize them to do this activity.

For example, our Shoparoo app is a next-generation school fundraiser akin to Box Tops for Education. It allows people to shop anywhere, buy anything, take a picture of their receipt, and then we make an instant donation to their kid’s school every time.

Another app is more of a Tamagotchi game called Receipt Hog, where if you download the app, you have adopted a virtual runt. You feed it pictures of your receipt and it levels-up into a fat and happy hog, earning coins in a piggy bank along the way that you can then cash-out from at the end of the day.

These kinds of experiences are a lot more intrinsically and extrinsically rewarding to the panelists and have allowed us to grow a panel that’s many times larger than the next largest panel ever seen in the world, tracking consumer purchases on a day-in day-out basis.

Gardner: What is it that you can get from these new input approaches and incentivization through an app interface? Can you provide me some sort of measurement of an improved or increased amount of participation rates? How has this worked out?

Leaps and bounds

Schrieber: It's been phenomenal. In fact, our panel is still growing by leaps and bounds. We now have 200,000 people sharing with us their purchases on a day-in day-out basis. We capture 150,000 shopping trips a day. The next largest panel in America captures just 10,000 shopping trips a day.

In addition to the shopping trip data, we're capturing geolocation information, Facebook likes and interests from these people, demographic information, and more and more data associated with their mobile device and the email accounts that are connected to it.

Gardner: So yet another unanticipated consequence of the mobility trend that’s so important today.

Tibor, let’s go to you. The good news is that Jared has acquired this trove of information for you. The bad news is that now you have to make sense of it. It’s coming in, in some interesting ways, as almost a picture or an image in some cases, and at a great volume. So you have velocity, variability, and volume. So what does that mean for you as the Vice President of Data Engineering?

Mozes: Obviously this is a growing panel. It’s creating a growing volume of data that has created a massive data pipeline challenge for us over the years, and we had to engineer the pipeline so that it is capable of processing this incoming data as quickly as possible.

As you can imagine, our data pipeline has gone through an evolution. We started out with a simple solution built on MySQL, and then we evolved it using Elastic MapReduce and Hive.

But we felt that we wanted to create a data pipeline that’s much faster, so we can bring data to our customers much faster. That’s how we arrived at Vertica. We looked at different solutions and found Vertica a very suitable product for us, and that’s what we're using today.

Gardner: Walk me through the process, Tibor. How does this information come in, how do you gather it, and where does the data go? I understand you're using the HP Vertica platform as a cloud solution in the Amazon Web Services Cloud. Walk me through the process for the data lifecycle, if you will.

Mozes: We use AWS for all of our production infrastructure. Our users, as Jared mentioned, typically download one of our several apps, and after they complete a receipt scan from their grocery purchases, that receipt is immediately uploaded to our back-end infrastructure.

We try to OCR that image of the receipt, and if we can’t, we use Amazon Mechanical Turk to try to make sense of the image and turn that image into text. At the end of the day, when an image is processed, we have a fairly clean version of that receipt in a text format.
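
As a rough illustration of that intake step, the sketch below shows one way such a fallback flow could be structured in Python. The helper functions are hypothetical stand-ins, not InfoScout's actual OCR or Mechanical Turk integration.

```python
# Illustrative sketch only, not InfoScout's production pipeline. run_ocr() and
# send_to_mturk() are hypothetical stand-ins for a real OCR engine and an
# Amazon Mechanical Turk transcription task.

def run_ocr(image_bytes: bytes) -> tuple[str, float]:
    """Placeholder OCR step; returns (text, confidence)."""
    return "", 0.0  # a real implementation would call an OCR library here

def send_to_mturk(image_bytes: bytes) -> str:
    """Placeholder human-transcription step (e.g., create a task, wait for the answer)."""
    return ""

def process_receipt_image(image_bytes: bytes, min_confidence: float = 0.85) -> str:
    """Turn a receipt photo into plain text, falling back to human transcription."""
    text, confidence = run_ocr(image_bytes)
    if confidence >= min_confidence:
        return text
    # Low-confidence images go to Mechanical Turk to be turned into text by people.
    return send_to_mturk(image_bytes)
```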

In the next phase, we have to process the text and try to attribute various items on the receipt and make the data available in our Vertica data warehouse.

Then, our customers, using a business intelligence (BI) platform that we built especially for them, can analyze the data. The BI platform connects to Vertica, so our customers can analyze various metrics of our users and their shopping behavior.

Gardner: Jared, back to you. There's an awful lot of information on a receipt. I suppose it's very complex, given not just the date, the place, and the type of retail organization, but all the different SKUs, every item that could possibly be bought. How do you attack that sort of a data problem from a schema, cleansing, and extract, transform, load (ETL) standpoint, and then make it useful?

Schrieber: It’s actually a huge challenge for us. It's quite complex, because every retailer's receipt is different: the way that they structure the receipt, the level of specificity about the items on the receipt, and the existence of product codes, whether they are public product codes, like the kind you see on a barcode for a soda product, versus internal product codes that retailers use as stock-keeping units, versus just a short description on the receipt.

One of our challenges as a company is to figure out the algorithmic methods that allow us to identify what each one of those codes and short descriptions actually represent in terms of a real world product or category, so that we can make sense of that data on behalf of our client. That’s one of the real challenges associated with taking this receipt-based approach and turning that into useful data for our clients on a daily basis.

Gardner: I imagine this would be of interest to a lot of different types of information and data gathering. Not only are pure data formats and text formats being brought into the mix, as has been the case for many years, but this image-based approach, the non-structured approach.

Any lessons learned here in the retail space that you think will extend to other industries? Are we going to be seeing more and more of this image-based approach to analysis gathering?

Schrieber: We certainly are. As an example, just take Google Maps and Google Street View, where they're driving around in cars, capturing images of house and building numbers, and then associating that to the actual map data. That’s a very simple example.

A lot of the techniques that we're trying to apply in terms of making sense of short descriptions for products on receipts are akin to those being used to understand and perform social-media analytics. When somebody makes a tweet, you try to figure out what that tweet is actually about and means, with those abbreviated words and shortened character sets. It’s very, very similar types of natural language processing and regular expression algorithms that help us understand what these short descriptions for products actually mean on a receipt.
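
As a toy illustration of that idea, the snippet below applies a few regular-expression rules to an abbreviated receipt line. The abbreviation table and patterns are invented examples, not InfoScout's real dictionaries or models.

```python
import re

# Illustrative only: a tiny regular-expression pass over abbreviated receipt
# lines. The abbreviations below are made-up examples.
ABBREVIATIONS = {
    r"\bCHKN\b": "CHICKEN",
    r"\bCHOC\b": "CHOCOLATE",
    r"\bGRND\b": "GROUND",
    r"\bWHL MLK\b": "WHOLE MILK",
}

def normalize_receipt_line(raw: str) -> str:
    """Expand common abbreviations and strip prices/quantities from one receipt line."""
    line = raw.upper()
    line = re.sub(r"\s+\d+(\.\d{2})?\s*[A-Z]?$", "", line)   # trailing price, e.g. " 7.49 F"
    line = re.sub(r"^\d+\s*[@X]\s*", "", line)               # leading quantity, e.g. "2 @ "
    for pattern, expansion in ABBREVIATIONS.items():
        line = re.sub(pattern, expansion, line)
    return line.strip()

print(normalize_receipt_line("2 @ GRND CHKN BRST 7.49 F"))   # -> "GROUND CHICKEN BRST"
```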

Gardner: So we've had some very substantial data complexity hurdles to overcome. Now we have also the basic blocking and tackling of data transport, warehouse, and processing platform.

Going back to Tibor, once you've applied your algorithms, sliced and diced this information, and made it into something you can apply to a typical data warehouse and BI environment, how did you overcome these issues about the volume and the complexity, especially now that we're dealing with a cloud infrastructure?

Compression algorithms

Mozes: One of the benefits of Vertica, as we went into the discovery process, was the compression algorithms that Vertica is using. Since we have a large volume of data to deal with and build analytics from, it has turned out to be beneficial for us that Vertica is capable of compressing data extremely well. As a result of that, some of the core queries that our BI solution requires can be optimized to run super fast.

You also talked about the cloud solution, why we went into the cloud and what is the benefit of doing that. We really like running our entire data pipeline in AWS because it’s super easy to scale it up and down.

It’s easy for us to build a new Vertica cluster, if we need to evaluate something that’s not in production yet, and if the idea doesn’t work, then we can just pull it down. We can scale Vertica up, if we need to, in the cloud without having to deal with any sort of contractual issues.
Schrieber: To put this in context, now we're capturing three times as much data every day as we were six months ago. The queries that we're running against this have probably gone up 50X to 100X in that time period as well. So when we talk about needing to scale this up quickly, that’s a prime example as to why.

Gardner: What has happened in just last six months that’s required that ramp up? Is it just because of the popularity of your model, the impactfulness and effectiveness of the mobile app acquisition model, or is it something else at work here?

Schrieber: It’s twofold. Our mobile apps have gotten more and more popular and we've had more and more consumers adopt them as a way to raise money for their kid’s school or earn money for themselves in a gamified way by submitting pictures of their receipts. So that’s driven massive growth in terms of the data we capture.

Also, our client base has more than tripled in that time period as well. These additional clients have greater demands of how to use and leverage this data. As those increase, our efforts to answer their business questions multiply the number of queries that we are running against this data.

Gardner: That, to me, is a real proof point of this whole architectural approach. You've been able to grow by a factor of three in your client base in six months, but you haven’t gone back to them and said, "You'll have to wait for six months while we put in a warehouse, test it, and debug it." You've been able to just take that volume and ramp up. That’s very impressive.

Schrieber: I was just going to say, this is a core differentiator for us in the marketplace. The market research industry has to keep up with the pace of marketing, and that pace of marketing has shifted from months of lead time for TV and print advertising down to literally hours of lead time to be able to make a change to a digital advertising campaign, a social media campaign, or a search engine campaign.

So the pace of marketing has changed and the pace of market research has to keep up. Clients aren’t willing to wait for weeks, or even a week, for a data update anymore. They want to know today what happened yesterday in order to make changes on-the-fly.

Reports and visualization

Gardner: We've spoken about your novel approach to acquiring this data. We've talked about the importance of having the right platform and the right cloud architecture to both handle the volume as well as scale to a dynamic rapidly growing marketplace.

Let’s talk now about what you're able to do for your clients in terms of reports, visualization, frequency, and customization. What can you now do with this cloud-based Vertica engine and this incredibly valuable retail data in a near real-time environment for your clients?

Schrieber: A few things on the client side. Traditional market research providers of panel data have to put very tight guardrails on how clients can access and run reports against the data. These queries are very complex. The numerators and denominators for every single record of the reports are different and can change on-the-fly.

If, all of a sudden, I want to look at anyone who shopped at Walmart in the last 12 months that has bought cat food in the last month and did so at a store other than Walmart, and I want to see their purchase behavior and how they shop across multiple retailers and categories, and I want to do that on-the-fly, that gets really complex. Traditional data warehousing and BI technologies don't support allowing general business-analyst users to be able to run those kinds of queries and reports on-demand, yet that’s exactly what they want.
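
To make the shape of such a question concrete, here is a rough sketch of how it might be expressed against Vertica from Python using the vertica_python client. The table and column names are invented for illustration and are not InfoScout's actual warehouse schema.

```python
import vertica_python

# Hypothetical schema for illustration only (not InfoScout's warehouse):
#   trips(user_id, retailer, trip_date) and items(user_id, retailer, category, trip_date).
SQL = """
SELECT t.retailer,
       COUNT(DISTINCT t.user_id) AS shoppers,
       COUNT(*)                  AS trips
FROM   trips t
WHERE  t.user_id IN (                       -- shopped at Walmart in the last 12 months
           SELECT user_id FROM trips
           WHERE  retailer = 'Walmart'
           AND    trip_date >= ADD_MONTHS(CURRENT_DATE, -12))
AND    t.user_id IN (                       -- bought cat food elsewhere in the last month
           SELECT user_id FROM items
           WHERE  category = 'cat food'
           AND    retailer <> 'Walmart'
           AND    trip_date >= ADD_MONTHS(CURRENT_DATE, -1))
GROUP  BY t.retailer
ORDER  BY trips DESC
"""

conn_info = {"host": "vertica.example.com", "port": 5433,
             "user": "analyst", "password": "...", "database": "panel"}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(SQL)
    for retailer, shoppers, trips in cur.fetchall():
        print(retailer, shoppers, trips)
```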

They want to be able to ask those business questions and get answers. That’s been key to our strategy, which is to allow them to do so themselves, as opposed to coming back to them and saying, "That’s going to be a pretty big project. It will require a few of our engineers. We'll come back to you in a few weeks and see what we can do." Instead, we can hand them the tools directly in a guided workflow to allow them to do that literally on-the-fly and have answers in minutes versus weeks.

Gardner: Tibor, how does that translate into the platform underneath? If you're allowing for a business analyst type of skill set to come in and apply their tools, rather than deep SQL queries or other more complex querying tools, what is it that you need from your platform in order to accommodate that type of report, that type of visualization, and the ability to bring a larger set of individuals into this analysis capability?

Mozes: Imagine that our BI platform can throw out very complex SQL queries. Our BI platform essentially is using, under the hood, a query engine that's going to run queries against Vertica. Because, as Jared mentioned, the questions are so complex, some of the queries that we run against Vertica are very different than your typical BI use cases. They're very specialized and very specific.

One of the reasons we went with Vertica is its ability to compute very complex queries at a very high speed. We look at Vertica not as simply another SQL database that scales very well and that’s very fast, but we also look at it as a compute engine.

So as part of our query engine, we are running certain queries and certain data transformations that would be very complicated to run outside Vertica.

We take advantage of the fact that you can create and run custom UDFs that are not part of ANSI SQL-99. We also take advantage of some of the special functions built into Vertica that allow data to be sessionized very easily.

Analyzing behavior

Jared can talk about some of the use cases where we like to analyze users' entire shopping trips. In order to do that, we have to stitch together the different points in time at which the user has shopped at various locations. Using some of the built-in functions in Vertica that aren't standard SQL, we can look at shopping journeys, which we call trip circuits, and analyze user behavior along the trip.
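
One way this kind of trip-circuit stitching is often expressed in Vertica SQL is with its event-based analytic functions. The sketch below is illustrative only: the purchases table, the four-hour gap, and the query itself are assumptions, not InfoScout's real trip-circuit logic.

```python
# Illustrative only. CONDITIONAL_TRUE_EVENT starts a new circuit whenever
# consecutive receipts from the same shopper are more than four hours apart.
# purchases(user_id, retailer, ts) is an invented table for the sketch.
TRIP_CIRCUIT_SQL = """
SELECT user_id,
       retailer,
       ts,
       CONDITIONAL_TRUE_EVENT(ts - LAG(ts) > '04:00:00')
           OVER (PARTITION BY user_id ORDER BY ts) AS trip_circuit_id
FROM   purchases
"""
# The statement can be run through the same vertica_python connection pattern shown earlier.
```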

Gardner: Tibor, what other ways can you be using and exploiting the Vertica capabilities in the deliverables for your clients?

Mozes: Another reason we decided to go with Vertica is its ability to optimize very complex queries. As I mentioned, our BI platform is using a query engine under the hood. So if a user asks a very complicated business question, our BI platform turns that question into a very complicated query.

One of the big benefits of using Vertica is being able to optimize these queries on the fly. It's easy to do this by running the database optimizer to build custom projections, making queries run much faster than we could before.
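
For context, a Vertica projection is a physically stored, pre-sorted, and optionally segmented copy of selected columns, tuned to a query's access pattern. The DDL below is an invented example of the kind of projection such tuning might produce, not InfoScout's actual design.

```python
# Illustrative projection DDL for Vertica; table and column names are invented.
# A projection pre-sorts and segments the data to match a hot query pattern.
CREATE_PROJECTION_SQL = """
CREATE PROJECTION purchases_by_user_ts
AS SELECT user_id, retailer, category, ts, amount
   FROM purchases
   ORDER BY user_id, ts
   SEGMENTED BY HASH(user_id) ALL NODES
"""
```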

Gardner: I always think it's more impactful for us to learn through an example rather than just hear you describe this. Do you have any specific InfoScout retail client use cases where you can describe how they've leveraged your solution and how some of these technical and feature attributes have benefited them -- an example of someone using InfoScout and what it's done for them?

Schrieber: We worked with a major retailer this holiday season to track in real time what was happening for them on Thanksgiving Day and Black Friday. They wanted to understand their core shoppers, versus less loyal shoppers, versus non-core shoppers, how these people were shopping across retailers on Thanksgiving Day and Black Friday, so that the retailer could try to respond in more real time to the dynamics happening in the marketplace.

You have to look at what it takes to do that, for us to be able to get those receipts, process them, get them transcribed, get that data in, get the algorithms run to be able to map it to the brands and categories and then to calculate all kinds of metrics. The simplest ones are market share; the most complex ones have to do with what Tibor had mentioned: the shopper journey or the trip circuit.

We tried to understand, when this retailer was the shopper's first stop, what were they most likely to buy at that retailer, how much were they likely to spend, and how is that different than what they ended up buying and spending at other retailers that followed? How does that contrast to situations where that retailer was the second stop or the last stop of the day in that pivotal shopping day that is Black Friday?

For them to be able to understand where they were winning and losing among what kinds of shoppers who were looking for what kinds of products and deals was an immense advantage to them -- the likes of which they never had before.

Decision point

Gardner: This must be a very sizable decision point for them, right? This is going to help you decide where to build new retail outlets, for example, or how to structure the experience of the consumer walking through that particular brick-and-mortar environment.

When we bring this sort of analysis to bear, this isn’t refining at a modest level. This could be a major benefit to them in terms of how they strategize and grow. This could be something that really deeply impacts their bottom line. Is that not the case?

Schrieber: It has implications as to what kinds of categories they feature in their television, display advertising campaigns, and their circulars. It can influence how much space they give in their store to each one of the departments. It has enormous strategic implications, not just tactical day-to-day pricing decisions.

Gardner: Now, that was a retail example. I understand you also have clients that are interested in seeing how a brand works across a variety of outlets or channels. Is there another example you can provide of somebody who is looking to understand a brand's impact at a wider level, across a geography for example?

Schrieber: I'll give you another example that relates to this. A retailer and a brand were working together to understand why the brand sales were down at this particular retailer during the summer time. To make it clear for you, this is a brand of ice-cream. Ice cream sales should go up during the summer, during the warmer months, and the retailer couldn’t understand why their sales were underperforming for this brand during the summer.

To figure this out, we had to piece together the shopper journey over time, not only in the weeks during the summer months but year round, to understand this dynamic of how they were shopping. What we were able to help the client quickly discover was that during the summer months people eat more ice-cream. If they eat more ice-cream, they're going to want larger pack sizes when they go and buy that ice-cream. This particular retailer tended to carry smaller pack sizes.

So when the summer months came around, even though people had been buying their ice-cream at this retailer in the winter and spring, they now wanted larger pack sizes, and they were finding them at other retailers and switching their spend over to these other retailers.

So for the brand, the opportunity was a selling story to the retailer to give the brand more freezer space and to carry an additional assortment of products to help drive greater sales for that brand, but also to help the retailer grow their ice cream category sales as well.

Idea of architecture

Gardner: So just that insight could really help them figure that out. They probably wouldn’t have been able to do it any other way.

We've seen some examples of how impactful this can be and how much a business can benefit from it. But let’s go back to the idea of the architecture. For me, one of my favorite truths in IT is that architecture is destiny. That seems to be the case with you, using the combination of AWS and HP Vertica.

It seems to me that you don’t have to suffer the costs of a large capital outlay of having your own data center and facilities. You're able to acquire these very advanced capabilities at a price point that's significantly less from a capital outlay and perhaps predictable and adjustable to the demand.

Is that something you then can pass along? Tell me a little bit about the economics of how this architectural approach works for you?

Mozes: One of the benefits of using AWS is that it’s very easy for us to adjust our infrastructure on demand, as we see fit. Jared has referred to some of the examples that we had before. We did a major analysis for a large retailer on Black Friday, and we had some special promotions to our mobile app users going on at that point. Imagine that our data volume would grow tremendously from one day to the next couple of days, and then, when the promotion and the big shopping season were over, our volume would come down somewhat.

When you run an infrastructure in the cloud in combination with online data storage and data engine, it's very easy to scale it up and down. It’s very cost efficient to run an operation where you can just add additional computing power as you need, and then when you don’t need that anymore, you can scale it down.

We did this during a time period when we had to bring a lot of fresh data online quickly. We could just add additional nodes, and we saw very close to linear scalability by increasing our cluster size.

Schrieber: On the business side, the other advantage is we can manage our cash flows quite nicely. If you think about running a startup, cash is king, and not having to do large capital outlays in advance, but being able to adjust up and down with the fluctuations in our businesses, is also valuable.

Gardner: We're getting close to the end of our time. I wonder if you have any other insights into the business benefits from an analytics perspective of doing it this way. That is to say, incentivizing consumers, getting better data, being able to move that data and then analyze it at an on-demand infrastructure basis, and then deliver queries in whole new ways to a wider audience within your client-base.

I guess I'm looking for how this stands up both to the competitive landscape and to the past. How new and how innovative is this in marketing? Then we'll talk about where we go next. Let’s try to get a level set as to how new and how refreshing this is, given what the technology enables on a cloud basis, on a mobility basis, and then on the core stuff, the underlying analytics platform basis.

Product launch

Schrieber: We have an example that's going on right now around a major new product launch for a very large consumer goods company. They chose us to help monitor this launch, because they were tired of waiting for six months for any insight in terms of who is buying it, how they were discovering it, how they came about choosing it over the competition, how their experience was with the product, and what it meant for their business.

So they chose to work with us for this major new brand launch, because we could offer them visibility within days or weeks of launching that new product in the market to help them understand who were the people who were buying, was it the target audience that they thought it was going to be, or was it a different demographic or lifestyle profile than they were expecting. If so, they might need to change their positioning or marketing tactics and targeting accordingly.

How are these people discovering the products? We're able to trigger surveys to them in the moment, right after they've made that purchase, and then flow that data back through to our clients to help them understand how these people are discovering it. Was it a TV advertisement? Was it discovered on the shelf or display in the store? Did a friend tell them about it? Was their social media marketing campaign working?

We're also able to figure out what these people were buying before. Were they new to this category of product? Or did they not use this kind of product before and were just giving it a try? Were they buying a different brand and have now switched over from that competitor? And, if so, how did they like it by comparison, and will they repeat purchase? Is this brand going to be successful? Is this meeting needs?

These are enormous decisions. Often, hundreds of millions of dollars are spent by major consumer goods companies on new brand launches, so getting this quick feedback in terms of what’s working and what’s not, who to target with what kind of messaging, and what it’s doing to the marketplace in terms of stealing share from competitors is invaluable.

Driving new people to the product category can influence major investment decisions: whether we need to build a new manufacturing facility, whether we need to change our marketing campaigns, or whether we should go ahead and invest in that Super Bowl TV ad, because this really has a chance to go big.

These are massive decisions that these companies can now make in a timely manner, based on this new approach of capturing and making use of the data, instead of waiting six months on a new product launch. They're now waiting just weeks and are able to make the same kinds of decisions as a result.

Gardner: So, in a word it’s unprecedented. You really just haven’t been able to do this before.

Schrieber: It’s not been possible before at all, and I think that’s really what’s fueling the growth in our business.

Look to the future

Gardner: Let’s look to the future quickly. We hear a lot about the Internet of Things. We know that mobile is only partially through its evolution. We're going to see more smartphones in more hands doing more types of transactions around the globe. People will be using their phones for more of what we have thought of as traditional business in commerce. So that opens up a lot more information that's generated and therefore needs to be gathered and then analyzed.

So where do we go next? How does this generate additional novel capabilities, and then where do we go perhaps in terms of verticals? We haven’t even talked about food or groceries, hospitality, or even health care.

So without going too far -- this could be another hour conversation in itself -- maybe we could just tease the listener and the reader with where the potential for this going forward is.

Schrieber: If you think about Internet of Things as it relates to our business, there are a couple of exciting developments. One is the use of things like beacons inside of stores. Now we can know exactly which aisle people have walked down and what shelf they’ve stood in front of, and what product they've interacted with. That beacon is communicating with their smartphone and that smartphone is tied to our user account in a way that we're surveying these individuals or triggering surveys to them, in-the-moment, as they shop.

That’s not something that’s been doable before. It’s something that the Internet of Things, and very specifically beacons linking with smartphones, will allow us to do going forward. That will open up entirely new fields of research and consumer understanding about how people shop and make decisions at the shelf.

The same is true inside the home. We talk about the Internet of Things as it relates to smart refrigerators or smart laundry machines, etc. Understanding daily lifestyle activities and how people make the choice of which product to use and how to use them inside their home is a field of research that is under-served today. The Internet of Things is really going to open up in the years to come.

Gardner: Just quickly, what are other retail sectors or vertical industries where this would make a great deal of sense?
Schrieber: I have a friend who runs an amazing business called Wavemark, which is basically an Internet of Things for medical devices and medical consumables inside of hospitals and care facilities, with the ability to track inventory in real time, tying it to patients and procedures, tying it back to billing and consumption.
Making all of that data available to the medical device manufacturers, so that they can understand how and when their products are being used in the real world in practice, is revolutionizing that industry. We're seeing it in healthcare, and I think we're going to see it across every industry.

Engineering perspective

Gardner: Last word to you, Tibor. Given what Jared just told us about the greater applicability, the model and the architecture come back to mind for me: the cloud, the mobile device, the data, the engine, the ability to deal with that velocity, volume, and variability at a cost point that is doable and scales up and down. Are there any thoughts about this from an engineering perspective and where we go next?

Mozes: We see that with all these opportunities bubbling up, the amount of data that we have to process on a daily basis is just going to continually grow at an exponential rate. We continue to get additional information on shopping behavior and more data from external data sources. Our data is just going to grow. We will need to engineer everything to be as scalable as possible.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP.


Wednesday, March 25, 2015

IT operations modernization helps energy powerhouse Exelon acquire businesses

This next BriefingsDirect IT innovation discussion examines how Exelon Corporation, based in Chicago, employs technology and process improvements to not only optimize their IT operations but also to both help manage a merger and acquisition transition, and to bring outsourced IT operations back in-house.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn more about how this leading energy provider in the US, with a family of companies having $23.5 billion in annual revenue, accomplishes these goals we're joined by Jason Thomas, Manager of Service, Asset and Release Management at Exelon. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: I gave a brief overview of Exelon, but tell us a little bit more. It's quite a large organization that you're involved with.

Thomas: We are vast and expansive. We have a large nuclear fleet of 40-odd nuclear power plants, and three utilities: ComEd in Chicago, in the Illinois space; PECO out of Philadelphia; and BGE in Baltimore.

So we have large urban utility centers. We also have a large retail presence with the Constellation brand and the sale of power both to corporations and to users. There's a lot that we do in the utility space, and there are some elements of the commodity-trading side as well, trading power in these markets.

Gardner: I imagine it must be quite a large IT organization to support all that?

Thomas: There are 1,200 to 1,300 IT employees across the company.
Gardner: Tell us about some of the challenges that you've been facing in managing your IT operations and making them more efficient. And, of course, we'd like to hear more about the merger between Constellation and Exelon back in 2012.

Merger is a challenge

Thomas: The biggest challenge is the merger. Obviously, our scale and the number of, for lack of a better word, things that we had to monitor, be aware of, and know about vastly increased. So we had to address that.

A lot of our efforts around the merger and post-merger were around bringing everything into one standard monitoring platform, extending that monitoring out, leveraging the Business Service Management (BSM) suite of products, leveraging Universal Configuration Management Database (UCMDB).

Then there was a lot around consolidating asset management. In early 2013, we moved to Asset Manager as our asset management platform of choice, consolidating data from Exelon's tool, the Cergus CA Argis tool, into Asset Manager in support of moving to new IT billing that would be driven out of the data in Asset Manager, and leveraging some of the executive scorecard and financial manager pieces to make that happen.

There was also a large effort through 2013 to move the company to a standardized platform to support our service desk, incident management, and also our service catalog for end-users. But a lot of this was driven last year around the in-sourcing of our relationship with Computer Sciences Corporation for our IT operations.

This was to basically realize a savings to the company of $12 to $15 million annually from the management of that contract, and also to move both the management and the expertise in house and leverage a lot of the processes that we built up and that had grown through the company as a whole.

Gardner: So knowing yourself well in terms of your IT infrastructure and all the elements of that is super important, and then bringing in-sourcing transition to the picture, involves quite a bit of complexity.

What do you get when you do this well? Is there a sense of better control, better security, or culture? What is it that rises to the top of your mind when you know that you have your IT service management (ITSM) in order, when you have your assets and configuration management data in order. Is it sleeping better at night? Is it a sense of destiny you have fulfilled -- or what?

Thomas: Sleeping better at night. There is an element of that, but there's also sometimes the aspect of, "Now what's next?" So, part of it is that there's an evolutionary aspect too. We've gotten everything in one place. We're leveraging some of the integrations, but then what’s next?

It's more restful. It's now deciding how we better position ourselves to show the value of these platforms. Obviously, there's a clear monetary value of what we did to in-source, but now how do we show the business the value that we have done? Moving to a common set of tools helps to get there. You've leveled the playing field and you have that common set of tools that you're going to drive to take you to the next level.

Gardner: What might that next level be? Is it a cloud transition? Is it more of a hybrid sourcing for IT? Is this enabling you to take advantage of the different devices in terms of mobile? Where does it go?

Automation and cloud

Thomas: A lot of it is really around automation, the intermediate step around cloud. We've looked at cloud. We do have areas where the company has leveraged it. IT is still trying to wrap their heads around how we do it, and then also how we expose that to the rest of the organization.

But the steps we’ve taken around automation are very key in making IT operations leaner, and also in being able to do things in an automated fashion, rather than manually, including things that, in some cases, we had never done at all prior to the merger.

Gardner: Any examples? You mentioned $15 million in savings, but are there any other metrics of success or key performance indicator (KPI)-level paybacks that you can point to in terms of having all this in place for managing and understanding your IT?

Thomas: We're still going through what it is we're going to measure and present. There's been a standard set of things that we've measured around our availability and our incidents and whether these incidents are caused by IT, by infrastructure.

We've done a lot better operationally. Now it's taking some of those operational aspects and making them a little bit more business-centric. So for the KPIs, we're going through that process of determining what we're going to measure ourselves against.

Gardner: Jason, having gone through quite a big and complex undertaking in getting your ITSM and Application Lifecycle Management (ALM) activities in order, what comes next? Maybe a merger and acquisition is going to push you in a new direction.

Thomas: We recently announced the intent to acquire Pepco Holdings, which is the regional utility in Washington, DC area, that further widens our footprint in the mid-Atlantic area. So yeah, we get to do it all over again with a new partner, bringing Pepco in and doing some elements of this again.

Gardner: Having gone through this and anticipating yet another wave, what words of wisdom might you provide in hindsight for those who are embarking on a more automated, streamlined, and modern approach to IT operations?
Thomas: One of the key things is how you're changing and how you do IT operations. Moving towards automation, tools aside, there's a lot of organizational change if you're changing how people do what they do or changing people's jobs or the perception of that.

You need to be clear. You need to clearly communicate, but you also need to make sure that you have the appropriate support and backing from leadership and that the top-down communication is the same message. We certainly had that, and it was great, but there's always going to be that challenge of making sure everybody is getting that communication, getting the message, and getting constant reinforcement of that.

Organizational changes resulting from a large merger or acquisition are huge. It's key to show the benefits, even to the people who are obviously going to reap some of these immediate benefits, those in IT. You know the business is going to see some. It's about couching that value in the means or method appropriate for those actors, all of those stakeholders.

Full circle

Gardner: Of course, you have mentioned working through a KPI definition and working the executive scorecard. That makes it full circle, doesn’t it?

Thomas: Defining those KPIs, but also having one place where those KPIs can be viewed, seen easily, and drilled into is big. To date, it's been a challenge to provide some of that historiography around that data. Now, you have something where you can even more readily drill into it to see that data -- and that’s huge.

Presenting that, being able to show it, and being able to show it in a way that people can see it easily, is huge, as opposed to just saying, "Well, here's the spreadsheet with some graphs" or "Here’s a whiz-bang PowerPoint doc."

Gardner: And, Jason, I suppose this points to the fact that IT is really maturing. Compared to other business services and functions in corporations, things that had been evolving for 80 or 100 years, IT is, in a sense, catching up.

Thomas: It's catching up, but I also think it's more of a reflection. It's a reflection of a lot of the themes of the new style of IT. A lot of that is the consumerization aspect. In fact, if you look at the last 10 years, the wide presence of all of these smart devices and smartphones is huge.

We have brought to most people something that was never easily accessible. And having to take that same aspect and make it part of how you present what you do in IT is huge. You see it in how you're manifesting it in your various service catalogs and some of the efforts that we're undertaking to refine and better the processes that underlie our technical service catalog to have a better presentation layer.

That technical service catalog will refer to what we've seen with Propel. It's an easier, nicer, friendlier way to interact, and people expect that. Why can’t this be more like my app store, or why can't this be more like X.

Is IT catching up, or has IT become more reachable, more warm and fuzzy, as opposed to something that's cold, hard, and stored away somewhere that you kind of know about, where perhaps the guys in the basement are the ones doing all the heavy lifting? Now it's more tangible.

Gardner: Humanization of IT, perhaps.

Thomas: Absolutely.

Gardner: All right, one last area I want to get into before we sign off. We've heard quite a bit about The Machine, with HP unveiling more detail from its labs activities. It's not necessarily on a product roadmap yet, but it's described as having a lower footprint, a much more rapid ability to join compute and memory, and the potential to reduce the size of the data center down to the size of a refrigerator.

I know that it's on the horizon, but how does that strike you, and how interesting is that for you?

Ramp up/ramp down

Thomas: It's interesting, because it allows you to get a bit more ability to ramp up or ramp down, based on what you need, as opposed to having x amount of servers and x amount of storage that's always somewhere. It gives you a lot more flexibility and, to some extent, a bit more tenability. It's directly applicable to certain aspects of the business, where you need that capability to ramp up and ramp down much more easily.
I had a conversation with one of my peers about that. We were talking about how both that and the Moonshot aspect could support a lot of the customer-facing websites, in particular the utility customer-facing websites, whose utilization tends to spike during weather events.

While they don't spike all at the same time, there is the potential opportunity in the Mid-Atlantic of all the utilities spiking at the same time around a hurricane or Sandy-esque event. There's obviously a need to be able to respond to that kind of demand, and that technology positions you with the flexibility to do that rather quickly and easily.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.



Thursday, March 19, 2015

Axeda's machine cloud produces on-demand IoT analysis services

This BriefingsDirect big data innovation discussion examines how Axeda, based in Foxboro, Mass., has created a machine-to-machine (M2M) capability for analysis -- in other words, an Axeda Machine Cloud for the Internet of Things (IoT).

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn more about how Axeda produces streams of massive data to multiple consumer dashboards that analyze business issues in near-real-time, we're joined by Kevin Holbrook, Senior Director of Advanced Development at Axeda. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: We have the whole Internet of Things (IoT) phenomenon. People are accepting more and more devices, end points, sensors, even things within the human body, delivering data out to applications and data pools. What do you do in terms of helping organizations start to come to grip with this M2M and IoT data demand?
Holbrook: It starts with the connectivity space. Our focus has largely been in OEMs, equipment manufacturers. These are people who have the "M" in the M2M or the "T" in the Internet of Things. They are manufacturing things.

The initial drivers to have a handle on those things are basic questions, such as, "Is this device on?" There are multi-million dollar machines that are currently deployed in the world where that question can’t be answered without a phone call.

Initial driver

That was the initial driver, the seed, if you will. We entered into that space from the remote-service angle. We deployed small-agent software to the edge to get the first measurements from those systems and get them pushed up to the cloud, so that users can interact with it.

That grew into remote access -- telnet sessions or remote desktop -- being able to physically get down there, debug, tweak, and look at the devices that are operating. From there, we grew into software distribution, or content distribution. That could be anything from firmware updates to physically distributing configuration and calibration files for the instrument. We're recently seeing an uptake in content distribution for things like digital signage or in-situ ads being displayed on consumer goods.

From there, we started aggregating data. We have about 1.5 million assets connected to our cloud now globally, and there is all kinds of data coming in. Some of it's very, very basic from a resource standpoint, looking at CPU consumption, disks space, available memory, things of that nature.

It goes all the way through to usage and diagnostics, so that you can get a very granular impression how this machine is operating. As you begin to aggregate this data, all sorts of challenges come out of it. HP has proven to be a great partner for starting to extract value.

We can certainly get to the data, we can connect the device, and we can aggregate that data to our partners or to the customer directly. Getting value from that data is a completely different proposition. Data for data’s sake is not high value.

Gardner: What is it that you're using Vertica for in order to do that? Are we creating applications? Are we giving analysis as a service? How is this going to market for you?

Holbrook: From our perspective, Vertica represents an endpoint. We've carried the data, cared for the data, and made sure that the device was online, generating the right information and getting it into Vertica.

When we approach customers, we're approaching it from a joint-sale perspective. We're the connectivity layer, the instrumentation, and the business automation layer there, and we're getting the data into Vertica, so that it can be the seed for applications for business intelligence (BI) and for analytics.

So, we are the lowest component in the stack when we walk into one of these engagements with Vertica. Then, it's up to them, on a customer-by-customer basis, to determine what applications to bring to the table. A lot of that is defined by the group within the organization that actually manages connectivity.

We find that there's a big difference between a service organization, which is focused primarily on keeping things up and running, versus a business unit that’s driving utilization metrics, trying to determine not only how things are used, but how it can influence their billing.

Business use

We've found that that's a place where Vertica has actually been quite a pop for us in talking to customers. They want to know not just the simple metrics of the machines' operation, but how that reflects the business use of it.

The entire market has shifted and continues to shift. I was somewhat taken aback only a couple of weeks ago, when I found out that you can no longer buy a jet engine. I thought this was a piece of hardware you purchased, as opposed to something that you may have rented and paid per use. And so [the model changes to leasing] as the machines get  bigger and bigger. We have GE and the Bureau of Engraving and Printing as customers.

We certainly have some very large machines connected to our cloud and we're finding that these folks are shifting away from the notion that one owns a machine and consumes it until it breaks or dies. Instead, one engages in an ongoing service model, in which you're paying for the use of that machine.

While we can generate that data and provide some degree of visibility and insight into that data, it takes a massive analytics platform to really get the granular patterns that would drive business decisions.

Gardner: It sounds like many of your customers have used this for some basic blocking and tackling about inventory and access and control, then moved up to a business metrics of how is it being used, how we're billing, audit trails, and that sort of thing. Now, we're starting to look at a whole new type of economy. It's a services economy, based on cloud interactivity, where we can give granular insights, and they can manage their business very, very tightly.

Any thoughts about what's going to be required of your organization to maintain scale? The more use cases and the more success, of course, the more demand for larger data and even better analytics. How do you make sure that you don't run out of runway on this?

Holbrook: There are a couple of strategies we've taken, but before I dive into that, I'll say that the issue is further complicated by the issue of data homing. There's not only a ton of data being generated, but the regulatory and compliance requirements which dictate where you can even leave that data at rest. Just moving it around is one problem, and where it sits on a disk is a totally different problem. So we're trying to tackle all of these.

The first way to address the scale for us from an architectural perspective was to try to distribute the connectivity. In order for you to know that something's running, you need to hear from it. You might be able to reach out, what we call contactability, to say, "Tell me if you're still running." But, by and large, you know of a machine's existence and its operation by virtue of it telling you something. So even if a message is nothing more than "Hello, I'm here," you need to hear from this device.
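
As a generic illustration of that heartbeat pattern, a minimal device-side sketch might look like the following. The endpoint URL and payload shape are invented; this is not Axeda's agent protocol.

```python
import json
import time
import urllib.request

# Generic illustration of the "Hello, I'm here" heartbeat -- not Axeda's agent
# protocol. The endpoint URL and payload shape are invented for the sketch.
HEARTBEAT_URL = "https://machine-cloud.example.com/agent/heartbeat"

def send_heartbeat(device_id: str) -> None:
    payload = json.dumps({"device_id": device_id,
                          "status": "online",
                          "sent_at": time.time()}).encode("utf-8")
    req = urllib.request.Request(HEARTBEAT_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)

# A real agent would call send_heartbeat() on a schedule, so the cloud learns of
# the machine's existence and operation by virtue of the machine telling it.
```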

From the connectivity standpoint, our goal is not to try to funnel all of this into a single pipe, but rather to find where to get a point of presence that is closest and that is reasonable. We’ve been doing this on our remote-access technology for years, trying to find the appropriate geographically distributed location to route data through, to provide as easy and seamless an experience as possible.

So that’s the first, as opposed to just ruthlessly federating all incoming data, distributing the connectivity infrastructure, as well as trying to get that data routed to its end consumer as quickly as possible.

We break down data from our perspective into three basic temporal categories. There's the current data, which is the value you would see reading a dial on the machine. There's recent data, which would tell you whether something is trending in a negative direction, say pressure going up. Then, there is the longer-term historical data. While we focus on the first two, we’d deliberately, to handle the scale problem, don't focus on the long-term historical data.
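
A minimal sketch of routing readings into those three buckets might look like this; the cut-offs are examples, not Axeda's actual thresholds.

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative routing of readings into the three temporal buckets described
# above; the cut-offs are examples, not Axeda's actual thresholds.
def temporal_bucket(reading_time: datetime,
                    now: Optional[datetime] = None,
                    recent_window_days: int = 120) -> str:
    now = now or datetime.utcnow()
    age = now - reading_time
    if age <= timedelta(minutes=5):
        return "current"      # what you would see reading a dial on the machine
    if age <= timedelta(days=recent_window_days):
        return "recent"       # enough history to spot something trending in a negative direction
    return "historical"       # long-tail data handed off to the analytics partner
```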

Recent data

I'll treat recent data as being anywhere from 7 to 120 days and beyond, depending on the data aggregation rates. We focus primarily on that. When you start to scale beyond that, where the real long tail of this is, we try to make sure that we have our partner in place to receive the data.

We don't want to be diving into two years of data to determine seasonal trending when we're attempting to collect data from 1.5 million assets and acting as quickly as possible to respond to error conditions at the edge.

Gardner: Kevin, what about the issue of latency? I imagine some of your customers have a very dire need to get analysis very rapidly on an ongoing streamed basis. Others might be more willing to wait and do it in a batch approach in terms of their analytics. How do you manage that, and what are some of the speeds and feeds about the best latency outcomes?

Holbrook: That’s a fantastic question. Everybody comes in and says we need a zero-latency solution. Of course, it took them about two-and-a-half seconds to say that.
There's no such thing as real-time, certainly on the Internet. Just negotiating up the TCP stack and tearing it down to send one byte is going to take you time. Then, we send it over wires under the ocean, bounce it off a satellite, you name it. That's going to take time.

There are two components to it. One is accepting that near-real-time, which is effectively the transport latency, is the smallest amount of time it can take to physically go from point A to point B, absent having a dedicated fiber line from one location to the other. We can assume that on the Internet that's domestically somewhere in the one- to two-second range. Internationally, it's in the two- to three-second or beyond range, depending on the connectivity of the destination.

What we provide is an ability to produce real-time streams of data outbound. You could take from one asset, break up the information it generates, and stream it to multiple consumers in near-real-time in order to get the dashboard in the control center to properly reflect the state of the business. Or you can push it to a data warehouse in the back end, where it then can be chunked and ETLd into some other analytics tool.
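A minimal fan-out sketch of that pattern follows: one asset's readings pushed to several consumers, such as a dashboard updater and a warehouse loader. The consumer callables here are hypothetical stand-ins, not the platform's actual streaming API:

    # Fan one asset's outbound readings out to multiple consumers in near-real-time.
    import queue
    import threading
    import time

    class StreamFanOut:
        """Push each published reading to every subscribed consumer."""
        def __init__(self):
            self._queues = []

        def subscribe(self, handler):
            q = queue.Queue()
            self._queues.append(q)
            threading.Thread(target=self._drain, args=(q, handler), daemon=True).start()

        def publish(self, reading):
            for q in self._queues:
                q.put(reading)                 # every consumer gets its own copy

        @staticmethod
        def _drain(q, handler):
            while True:
                handler(q.get())

    fanout = StreamFanOut()
    fanout.subscribe(lambda r: print("dashboard:", r))        # control-center view
    fanout.subscribe(lambda r: print("warehouse queue:", r))  # chunked and ETL'd later
    fanout.publish({"asset": "pump-7", "pressure_psi": 312})
    time.sleep(0.2)   # give the daemon consumer threads a moment to drain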

For us, we try not to do the batch ETLing. We'd rather make sure that we handle what we're good at. We're fantastic at remote service, at automating responses, at connectivity, and at expanding what we do. But we're never going to be a massive ETL engine, transforming and converting data into somebody's data model or trying to do deep analytics on the back of that.

Gardner: Was it part of this need for latency, familiarity, and agility that led into Vertica? What were some of the decisions that led to picking Vertica as a partner?

Several reasons

Holbrook: There were a few reasons, and that was one of them. Another was the fact that there's already a massive set of offerings built on top of it. A lot of the other products we considered -- and I won't mention the competitors we looked at -- were more just a piece of the stack, as opposed to a place that solutions grew out of.

It wasn't just Vertica, but the ecosystem built on top of Vertica. Some of the vendors we looked at are currently in the partner zone, because they're now building their solutions on top of Vertica.

We looked at it as an entry point into an ecosystem, and certainly the in-memory component, the fact that you can avoid disk reads for massive datasets, was very attractive for us. We don't want to go through that process. We've dealt with the struggles internally of trying to make a relational data model scale. That's something that Vertica has absolutely solved.

Gardner: Now your platform includes application services, integration framework, and data management. Let’s hone in on the application services. How are developers interested in getting access to this? What are their demands in terms of being able to use analysis outcomes, outputs, and then bring that into an application environment that they need to fulfill their requirements to their users?

Holbrook: They break down into two basic categories. The first is the aggregation and collection of data, and the second is physical interaction with the device. We focus on both about equally. When we look at what developers are doing, almost always it's transforming the data coming in and reaching out to things like a customer relationship management (CRM) system. It's opening a ticket when a device has thrown a certain error code, or integrating with a backend drop-ship distribution system in the event that some consumable has begun to run low.
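A hedged sketch of that integration pattern: inbound device events dispatched to backend systems. The CRM and drop-ship helpers below, and the 10-percent reorder threshold, are hypothetical placeholders rather than any real API:

    # Route inbound device events to backend systems.
    # open_crm_ticket and place_dropship_order are hypothetical placeholders
    # for whatever CRM / fulfillment integration a customer actually uses.

    def open_crm_ticket(device_id, error_code):
        print(f"CRM ticket opened for {device_id}: error {error_code}")

    def place_dropship_order(device_id, consumable):
        print(f"Drop-ship order placed for {device_id}: {consumable}")

    LOW_CONSUMABLE_THRESHOLD = 0.10  # assumed: 10% remaining triggers a reorder

    def handle_event(event):
        if event.get("error_code"):
            open_crm_ticket(event["device_id"], event["error_code"])
        for consumable, level in event.get("consumable_levels", {}).items():
            if level < LOW_CONSUMABLE_THRESHOLD:
                place_dropship_order(event["device_id"], consumable)

    handle_event({
        "device_id": "dispenser-42",
        "error_code": "E-113",
        "consumable_levels": {"toner": 0.04, "paper": 0.60},
    })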

In terms of interaction, it's been significant. On the data side, we primarily see developers extracting subsets of data for deeper analysis. Sometimes this comes as discrete data points; frequently it comes as file transfers. So there's a certain granularity you can survive: coming down the fire hose are discrete data points you can react to, and there's a whole other order of magnitude of data you can handle when it's shipped up in a bulk chunk.

A good example is one of the use cases we have with GE, in their oil and gas division, where they have a certain flow of data that's always ongoing and giving key performance indicators (KPIs). But this is nowhere near the level of data that they're actually collecting. They have database servers that are co-resident with these massive gas pipeline generators.

So we provide them the vehicle for that granular data. Then, when a problem is detected automatically, they can say, "Give me far more granular data for the problem area." It could be five minutes before or five minutes after. This is then uploaded, and we hand it off to somewhere else.

So when we find developers doing integration around the data in particular, it's usually when they're diving in more deeply based on some sort of threshold or trigger that has been encountered in the field.
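One way to picture that trigger-driven deep dive is the sketch below, where an in-memory list stands in for the co-resident database and the uploader is a hypothetical hand-off function:

    # When a problem is detected, pull high-resolution data for a window around it
    # and hand it off for upload. The local store and uploader are hypothetical.
    from datetime import datetime, timedelta, timezone

    WINDOW = timedelta(minutes=5)  # five minutes before and after, as described above

    def on_problem_detected(problem_time, local_store, uploader):
        start, end = problem_time - WINDOW, problem_time + WINDOW
        granular = [r for r in local_store if start <= r["timestamp"] <= end]
        uploader(granular)   # handed off to wherever the deep analysis happens
        return len(granular)

    # Example with an in-memory stand-in for the co-resident database server:
    now = datetime.now(timezone.utc)
    store = [{"timestamp": now - timedelta(seconds=s), "flow": 100 + s}
             for s in range(0, 900, 10)]
    uploaded = on_problem_detected(now - timedelta(minutes=3), store, lambda rows: None)
    print(f"{uploaded} high-resolution readings uploaded")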
Gardner: And lastly, Kevin, for other organizations that are looking to create data services and something like your Axeda Machine Cloud, are there any lessons learned that you could share when it comes to managing such complexity, scale, and the need for speed? What have you learned at a high level that you could share?

All about strategy

Holbrook: It's all going to be about the data-collection strategy. You're going to walk into a customer or potential customer, and their default response is going to be, "Collect everything." That's not inherently valuable. Just because you've collected it doesn't mean that you're going to get value from it. We find that, oftentimes, 90-95 percent of the data collected in the initial deployment is not used in any constructive way.

I would say focus on the data-collection strategy. Scale applied to bad data is scale for scale's sake; it doesn't drive business value. Make sure that the folks who are actually going to be doing the analytics are in the room when you define your data-collection strategy, when you're talking to the folks who are going to wire up the sensors, and when you're talking to the folks who are building the device.

Unfortunately, within a larger business in particular, these are frequently completely different groups of people who might report to completely different vice presidents. So you go to one group, the connectivity guys, you talk it through, and you wire everything up.

Then, six to eight months later, you walk into another room, and they say, "What the heck is this? I can't do anything with this. All I ever needed to know was the following metric." It wasn't collected, because the two groups hadn't stayed in touch. The success of deployed solutions, and how well they react to scale challenges, is going to be driven directly by that data-collection strategy. Invest the time upfront, and you'll have a much better experience on the back end.
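One lightweight way to pin that agreement down is a shared collection spec that the analytics, sensor, and device teams all sign off on before anything is wired up. The sketch below assumes such a spec; the metric names are purely illustrative:

    # A shared collection spec agreed with the analytics team; metric names are
    # purely illustrative. Anything not listed is simply never collected.
    COLLECTION_SPEC = {
        "pump_pressure_psi":   {"interval_s": 30, "owner": "analytics"},
        "motor_temperature_c": {"interval_s": 60, "owner": "analytics"},
        "error_code":          {"interval_s": 0,  "owner": "service"},  # event-driven
    }

    def should_collect(metric_name):
        return metric_name in COLLECTION_SPEC

    print(should_collect("pump_pressure_psi"))   # True
    print(should_collect("cabinet_door_open"))   # False: nobody asked for it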

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.
