Tuesday, August 18, 2015

The future of business intelligence as a service with GoodData and HP Vertica

The next BriefingsDirect big data innovation case study interview highlights how GoodData expands the realms and possibilities for delivering business intelligence (BI) and data warehousing as a service. We'll learn how they're exploring new technologies to make that more seamless across more data types for more types of users -- all in the cloud.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

To learn the ups and downs of BIaaS, we welcome Jeff Morris, Vice President of Marketing at GoodData in San Francisco, and Chris Selland, Vice President for Business Development at HP Vertica. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about GoodData, what you do, and why it's different.

Morris: GoodData is an analytics platform as a service (PaaS). We cover the full spectrum end-to-end use case of creating an analytic infrastructure as a service and delivering that to our customers.

We take on the challenges of collecting the data, whatever it is, structured and unstructured. We use a variety of technologies as appropriate, as we do that. We warehouse it in our multitenant, massively scalable data warehouse that happens to be powered by HP Vertica.

We then combine and integrate it into whatever the customer’s particular key performance indicators (KPIs) are. We present that in aggregate in our extensible analytics engine and then present it to the end users through desired dashboards, reports, or discoverable analytics.

Our business is set up such that about half of it operates on an internal use case, typically a sales, marketing, and social analytics kind of use case. The other half of our business we call "Powered by GoodData," and those customers are embedding the GoodData technology in their own products. So we have a number of companies creating these customer-facing data products that ultimately generate new streams of revenue for their business.

40,000 customers

We've been at this since 2007. We're serving about 40,000 customers at this point and handling somewhere around 2.4 million data uploads a week. We've built out the service such that it's massively scalable. We deliver incredibly fast time to market. Last quarter, about two-thirds of our deployments were delivered in 16 weeks or less.

One of the divisions of HP, in fact, deployed GoodData in less than six weeks. They are getting their first set of KPIs, and we are delivering that value to them. What's making us different in the marketplace right now is that we're eliminating all of the headaches associated with creating your own big data lake-style BI infrastructure and environment.

What we end up doing is affording you the time to focus on the analytics and the results that you gain from them—without having to manage the back-end operations.

Gardner: You're creating analytic applications on datasets that are easily contributed to your platform.

Morris: Yes, indeed. The datasets themselves also tend to be born in the cloud. As I said, the types of applications that we're building typically focus on sales, marketing, social, and e-commerce data, all of which come from very popular cloud-based data sources. And you can imagine they're growing like crazy.

We see a leaning in our customer base toward integrating some on-premises information, typically from their legacy systems, and then marrying that up with the Salesforce, market, or social data that they want to bring in to build a full view of their customers -- or a full exposure of what their own applications are doing.

Gardner: So you're providing an excellent example of how HP Vertica forms a cloud-borne analytics platform. Are any of your clients doing this both on-premises and taking advantage of what the cloud does best? Are we now on the vanguard of hybrid BI?

Morris: We're getting there, and some industries are certainly more cloud-friendly than others right now. Interestingly, the healthcare space is starting to adopt the cloud, but it's still nascent there. The financial services industry is still nascent, too. They're very protective of their information. But retailers, e-commerce organizations, technology ISVs, and digital media agencies have adopted the cloud-based model very aggressively.

We're seeing terrific growth and expansion there, and we do see use cases right now where we're beginning to park the cloud-based environment alongside more traditional analytics environments to create that hybrid effect. Often, those customers are recognizing that the speed at which data is growing in the cloud is driving them to look for a solution like ours.

Gardner: Chris, how unique is GoodData in terms of being all cloud moving toward hybrid?

Special relationship

Selland: GoodData is certainly a very special partner and a very special relationship for us. As you said, Vertica is fundamentally a software platform that was purpose-built for big data that is absolutely cloud-enabled. But GoodData is the best representation of the partner who has taken our platform and then rolled out service offerings that are specifically designed to solve specific problems. It's also very flexible and adaptable.

So, it’s a special partnership and relationship. It's a great proof point for the fact that the HP Vertica platform absolutely was designed to be running in the cloud for those customers who want to do it.

As Jeff said, though, it really varies greatly by industry. A large majority of the customers in our customer advisory board (CAB), which tend to be some of our largest customers and some pretty well-known industries, were saying how they will never put their data in the cloud.

Never is a very long time, but at the same time, there are other industries that are adopting it very rapidly. So there is a rate of change that's going on in the industry. It varies by size of company, by the type of competitive environment, and by the type of data. And yes, there is a lot of hybridization going on out there. We're seeing more of the hybridization in existing organizations that are migrating to the cloud. There are also a lot of new-breed companies that started in the cloud and have every intent of staying there.

But there's a lot of dynamism in this industry, a lot of change, and this is a partnership that is a true win-win. As I said, it's a very special relationship for both companies.

Gardner: There's more than just HP Vertica. There's HP Haven, which includes Hadoop, Autonomy, security, and applications. Is there a path that you see whereby you can try to be as many things to as many types of customers and vertical industries as possible?

Morris: Absolutely. The HP Haven-style architecture is a vision and a direction in which we are going. We do use Hadoop right now for special use cases of creating structure out of unstructured information for a number of our customers, and then moving that into our Vertica-based warehouse.

The beauty of Vertica in the cloud is the way we have set this up, which also helps address both the security and the reliability issues that might be thought of as issues in the cloud. We're triple clustering each set of instances of our Vertica warehouses, so they are always reliable and redundant.

Daily updates

We, like the biggest enterprises out there, are vigilantly maintaining our network. We update our network on behalf of our customers on a daily basis, as necessary. We roll out and maintain a very standardized operating environment, including an OpenStack-based operating environment, so that customers never need to even care about what versions of the SSL libraries exist or what versions of the VPN exist.

We're taking care of all of that really deep networking and things that the most stalwart enterprise-style IT architects are concerned about. We have to do that, too, and we have to do it at scale for this multi-tenant kind of use-case.

As I said, the architecture itself is very Haven-like, it just happens to be exclusively in the cloud -- which we find interesting and unique for us. As for the Hadoop piece, we don’t use Autonomy yet, but there are some interesting use cases that we are exploring there. We use Vertica in a couple of places in our architecture, not only that central data warehouse, but we also use it as a high-performance storage vehicle for our analytic data marts.

So when our customers are pushing a lot of information through our system, we're tapping into Vertica’s horsepower in two spots. Then, our analytic engine can ingest and deal with those massive amounts of data as we start to present it to customers.

On the Haven architecture side, we're a wonderful example of where Haven ends up in the cloud. The applications themselves, the kinds of things that customers are creating, might be these hybrid styles where they're drawing legacy information in from their existing on-premises systems. Then, they're gathering up, as I said before, their sales and marketing information and their social information.

The one that we see as a wonderful green field for us is capturing social information. We have our own social analytic maturity model that we describe to customers and partners on how to capitalize on your campaigns and how to maximize your exposure through every single social channel you can think of.

We're very proficient at that, and that's what's really driving the immense sizes of data that our customers are asking for right now. Where we used to talk in tens of terabytes for a big system, we're now talking in the world of hundreds, multiple hundreds of terabytes, for a system. Case by case by case, we're seeing this really take off.

Gardner: Do you have any companies, either named or unnamed, that provide a great use case example of BI as a service?

Morris: One of our oldest and most dear customers is Zendesk. They have a very successful customer-support application in the cloud. They provide both a freemium model and degrees of for-fee products to their customers.

And the number one reason why their customers upgrade from freemium to general and then general to the gold level of product is the analytics that they're supplying inside of there. They very recently announced a whole series of data products themselves, all powered by GoodData, as the embedded analytic environment within Zendesk.

We have another customer, Service Channel, which is a wonderful example of marrying together two very disparate user communities. Service Channel is a facilities management enterprise resource planning (ERP) application. They bring together the facility managers of your favorite brick-and-mortar retailers with the suppliers who service those retail facilities: the janitorial services, the air-conditioning technicians, the plumbers.

Disparate customers

Marrying disparate types of customers, they create their own data products as well, where they are integrating third-party information like weather data. They score their customers, both the retailers as well as the suppliers, and benchmark them against each other. They compare how well one vendor provides service to another vendor and they also compare how much one of the retailers spends on maintaining their space.

Of course, Apple gets incredibly high marks. RadioShack, right now, as they transition their stores, not so much. Service Channel knew this information long before the industry did, because they're watching spend. They, too, are starting to create almost a bidding network.

When they integrated their weather data into the environment, they started tracking and saying, "Apple would like to gain first right of refusal on the services that they need." So if Apple’s air conditioning goes out, the service provider comes in and fixes the air-conditioning sooner than Best Buy and all of their competitors. And they'll bid up for that. So they've created almost a marketplace. As I said before, these data products are really quite an advantage for us.

Gardner: What's coming next?

Morris: We're seeing a number of great opportunities, and many are created and developed by the technologies we've chosen as our platform. We love the idea of creating not only predictive, but also prescriptive, types of applications and use cases on top of the GoodData environment. We have customers that are doing that right now and we expect to see them continue to do that.

What I think will become really interesting is when the GoodData community starts to share their analytic experiences or their analytic product with each other. We feel like we're creating a central location where analysts, data scientists, and our regular IT can all come together and build a variety of analytic applications, because the data lives in the same place. The data lives in one central location, and that’s an unusual thing. In most of the industry your data is still siloed. Either you keep it to yourself on-premise or your vendors keep it to themselves in the cloud and on-premise.

But we become this melting pot of information and of data that can be analytically evaluated and processed. We love the fact that Vertica has its own built-in analytic functions right in the database itself. We love the fact that they run the R predictive language without any other issue, and we see our customers beginning to build off of that capability.

My last point about the power of that central location and the power of GoodData is that our whole goal is to free time for those data scientists and those IT people to actually perform analytics and get out of the business of maintaining the systems that make analytics available, so that you can focus on the real intellectual capital that you want to be creating.

Identifying trends

Gardner: So, Chris, to cap this off, I think we've identified some trends. We have PaaS for BI. We have hybrid BI. We have cloud data joins and ecosystems that create a higher value abstraction from data. Any thoughts about how this comes together, and does this fit into the vision that you have at HP Vertica and that you're seeing in other parts of your business?

Selland: We're very much only at the front end of the big data analytics revolution. I ultimately don’t think we are going to be using the term "big data" in 10 years.

I often compare big data today to eBusiness 10, 12 years ago. Nobody uses that term anymore, but that was when everything was going online, and now everything is online, and the whole world has changed. The same thing is happening with analytics today.

With a hundred times more data we can actually get 10,000 times more insight. And that's true, but it's not just the amount of data; it's the ability to cross-correlate. That's the whole vision of what Jeff was just talking about that GoodData is trying to do.
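
One way to read the "hundred times more data, 10,000 times more insight" figure, offered here as an interpretation rather than something Selland spells out, is that the number of possible pairwise cross-correlations grows roughly with the square of the number of data sources. Here n is an assumed count of sources, not a figure from the interview:

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{n^{2}}{2},
\qquad
\frac{\binom{100n}{2}}{\binom{n}{2}}
  = \frac{100n\,(100n-1)}{n\,(n-1)}
  \approx 100^{2} = 10{,}000
  \quad (n \gg 1).
\]
```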

It's the vision of Haven, to bring in all types of data and to be able to look at it more holistically. One of my favorite examples, just to make that concrete, is that there is an airline we were talking to. They were having a customer service issue. They were having a lot of their passengers tweeting angrily about them, and they were trying to analyze the social media data to figure out how to make this stop and how to respond.

In a totally separate part of the organization, they had a predictive maintenance project, almost an Internet-of-things (IoT) type of project, going on. They were looking at data coming off the fleet, and trying to do a better job of keeping their flights on time.

If you think about this, you say, "Duh." There was a correlation between the service problems and late flights on the one hand and the angry passengers on the other. Suddenly, they realized that by focusing less on the social data in this case, and looking at it as the symptom as opposed to the cause, they could solve the problem much more effectively. That's a very, very simple example.

I cite that because it makes real for people that it's when you really start cross-correlating data you wouldn't normally think belong together -- social data and maintenance data, for example -- that you get true insights. It's almost a silly simple example, but we're going to see many more examples of that type. The more of this we can do, the more power we are going to get. I think that the front end of the revolution is here.
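
As a minimal sketch of the kind of cross-correlation described above, the following Python computes a Pearson correlation between daily flight delays and daily counts of angry tweets. The numbers are invented for illustration, and the names `pearson`, `avg_delay_minutes`, and `angry_tweets` are hypothetical; nothing here comes from the airline in the anecdote.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily figures, invented for illustration only.
avg_delay_minutes = [5, 12, 48, 7, 65, 30, 9]   # from the maintenance/operations side
angry_tweets = [40, 75, 310, 52, 420, 190, 61]  # from the social media side

r = pearson(avg_delay_minutes, angry_tweets)
print(f"correlation between delays and angry tweets: {r:.2f}")
```

A value close to 1 would suggest that the two datasets move together, which is the hint that the tweets may be a symptom of the delays. Correlation alone does not establish which is the cause.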

Sponsor: HP Enterprise.


Tuesday, August 11, 2015

HP pursues big data opportunity with updated products, services, developer program

HP today at its Big Data Conference in Boston unveiled a series of new products, services, and programs designed to help organizations better leverage data and analytics.

The company announced:
  • A new release of HP Vertica, called Excavator, that features data streaming and advanced log file text search to power high-speed analytics on Internet of Things (IoT) data.
  • Broader support for and contributions to open source technologies, including optimized Hadoop performance, integration with the Apache Kafka distributed messaging system, and advancements in Distributed R predictive analytics.
  • The HP Haven Startup Accelerator program, which provides early-stage companies with fast, affordable access to both HP Big Data and Application Delivery Management software and services.
"Big data helps us make more sense of it all, a byproduct of what we do everyday," said Robert Youngjohns, Executive Vice President and General Manager at HP Software, in a keynote address. "Big data solves everyday problems like security, inventory, and empowers workers ... and you can now exploit 100 percent of your data."

Youngjohns said that HP will announce at its HP Protect show in a few weeks how to better mine IT systems analytics to make enterprises more secure.

"Big data changes the game in the idea economy," said Colin Mahony, Senior Vice President and General Manager of Big Data at HP. "Big data is core to all apps, but customized composite analytic applications are coming."

The key question for enterprises, said Mahony, is: How can you embed analytics into the role you play in your organization? He added that enterprises need to spin up new apps constantly to identify and analyze via context, which they cannot get from packaged apps.

"Developers are the new heroes of the idea economy," said Mahony. "Through our Haven and Haven OnDemand platforms, we are empowering these heroes to transform their business through data, by allowing them to harness the value of all forms of information, rapidly connect and apply open source, and quickly access the tools they need to build winning businesses." [Disclosure: HP Enterprise is a sponsor of BriefingsDirect podcasts.]

Also addressing the keynote audience was recent Turing Award winner Mike Stonebraker, CTO and co-founder of Tamr. He said that the development of the column store database "was the most disruptive thing I ever did." "It transformed the market," he said, and led to the Vertica big data platform that HP acquired in 2011.

But he cautioned against getting caught up in marketing buzz over data science substance. "I’ve been doing big data forever. The buzzword is meaningless, it’s about solving data volume, velocity, and variety problems."

"Analytics is moving to complex analytics, using machine learning and statistics, so you need to get smart at data science," said Stonebraker.

'Excavator'

Capabilities in the new version of Vertica, codenamed "Excavator," include:
  • Data-streaming analytics offering native support for the Apache Kafka open-source distributed messaging system to enable organizations to quickly ingest and analyze high-speed streaming data, including IoT, in near real time. This capability delivers actionable insight for a wide range of use cases, including manufacturing process control, supply-chain optimization, healthcare monitoring, financial risk management, and fraud detection.
  • Advanced machine log text search to enable organizations to collect and index large log file data sets generated by systems and business applications, helping IT organizations quickly identify and predict application failures and cyber-attacks, and investigate authorized and unauthorized access.
HP also released a series of solutions designed to enable organizations to combine the innovation of open source with enterprise scale, reliability, and security. These include:
  • HP Vertica for SQL on Hadoop native file support, bringing a significant increase in performance on popular Hadoop formats like ORC and Parquet. Specifically, HP worked collaboratively with Hortonworks to develop a new high-performance access layer that enables SQL queries to run directly on ORC files, resulting in 5x faster execution.
  • HP Vertica Flex Zone Table Library with which HP has open sourced its innovative Flex Table "schema on-need" technology to the global developer community. With this move, organizations will be able to fully harness virtually any form of semi-structured data to meet their unique needs.
  • HP announced its commitment to integrate Vertica with Apache Spark. This will enable accelerated data transfer between Vertica and Spark, allowing organizations to take full advantage of their Spark-based deployments. This future capability will enable the developer community to build their models in Spark and run them in Vertica for high-speed and sophisticated analytics.

Startup Accelerator

HP also unveiled The HP Haven Startup Accelerator, a new program designed to support and expand HP's ecosystem of developers and innovators by making HP Big Data and Application Delivery Management software products accessible to early-stage companies. The program removes traditional barriers for organizations looking to leverage analytics and data to build powerful, differentiated applications. Qualified participants will benefit from the following program components:
  • Free use of the community versions of HP IDOL and HP Vertica with expanded capacity.
  • Premium versions of HP IDOL and HP Vertica with attractive pricing.
  • HP Application Delivery Management tools including HP Agile Manager, HP LeanFT, and HP LoadRunner.
HP also announced an innovative framework of technology and proven best practices to accelerate the development of next-generation analytical applications. This framework extends the HP Haven Big Data Platform with quick-start visualization, syndicated data feeds, and open on-premises and cloud-based APIs. This enables HP Professional Services and HP partners to quickly deliver a range of solutions, such as voice of customer, smart cities, and fraud detection.

Optimized platforms for big data include traditional HP ProLiant DL380 clusters, purpose-built Apollo 4510, 4530 and 4200 compute and storage servers, and HP's innovative asymmetric Big Data Reference Architecture. These systems enable customers to optimize their big data workloads, delivering the power of Haven to their business.

Availability

Planned availability for new HP Haven Big Data offerings and services is set for fall of 2015.


Monday, August 10, 2015

How eCommerce sites harvest big data across multiple clouds

The next BriefingsDirect big data innovation thought leadership interview highlights how a consultant helps large eCommerce organizations better manage their big data architectures across cloud environments.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

To learn more about how big data is best architected for the largest web applications, BriefingsDirect sat down with Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: How are large web applications deciding on the right big data architecture? 

Mohsin: There's a lot of interest in dealing not only with large data volumes, but also with data that changes rapidly. Many companies have very large datasets, some in terabytes, some in petabytes, and on top of that they're getting live feeds.

The data is there and it’s changing rapidly. The traditional databases sometimes can’t handle that problem, especially if you're using that database as a warehouse and you're reporting against it.

Basically, we have kind of a moving-target situation. With HP Vertica, what we've seen is the ability to solve that problem in at least some of the cases that I've come across, and I can talk about specific use cases in that regard.

Input/output issues

Gardner: Before we get into a specific use case, I'm interested particularly in some of these input/output issues. People are trying to decide how to move the data around. They're toying with cloud. They're trying to bring in data from more types of traditional repositories. And, as you say, they're facing new types of data problems with streaming and real-time feeds.

How do you see them beginning this process when they have to handle so many variables? Is it something that’s an IT architecture, or enterprise architecture, or data architecture? Who's responsible for this, given that it’s now a rather holistic problem?

Mohsin: In my present project, we ran into that. The problem is that many companies don't even have a well-defined data architecture team. Some of them do. You'll find a lot of companies with an enterprise architect role, and you'll have some companies with a haphazard definition of an architectural group.

Net-net, at least at this point, unless companies are more structured, it becomes a management issue in the sense that someone at the leadership level needs to know who has what domain knowledge and then form the appropriate team to skin this cat.

I know of a recent situation where we had to build a team of four people, and only one was an architect. But we built a virtual team of four people who were able to assemble and collate all the repositories that spanned 15 years and four different technology flavors, and then come up with an approach that resulted in a single repository in HP Vertica.

So there are no easy answers yet, because organizations just aren't uniformly structured.

Gardner: Well, I imagine they'll be adapting, just like we all are, to the new realities. In the meantime, tell me about a specific use case that demonstrates the intensity of scale and velocity, and how at least one architecture has been deployed to manage that?

Mohsin: One of my present projects deals with one of the world's largest retailers. It's eCommerce, online selling. One of the things they do, in addition to their transactions of buying and selling, is email campaign management. That means staying in touch with the customer on the basis of their purchases, their interests, and their profiles.

One of the things we do is see what a certain customer’s buying preferences have been over the past 90 days. Knowing that and the customer’s profile, we can try to predict what their buying patterns will be. So we send them a very tailored message in that regard. In this project, we're dealing with about 150 to 160 million emails a day. So this is definitely big data.
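
A minimal sketch of the 90-day preference lookup described above, assuming purchases arrive as (date, category) pairs. The function name `top_categories`, the data shape, and the sample history are all hypothetical; the interview does not describe how the retailer actually implements this.

```python
from collections import Counter
from datetime import date, timedelta


def top_categories(purchases, today, window_days=90, n=2):
    """Return the n most-purchased categories within the trailing window."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        category
        for purchased_on, category in purchases
        if cutoff <= purchased_on <= today
    )
    return [category for category, _ in counts.most_common(n)]


# Hypothetical purchase history for one customer: (date, category).
history = [
    (date(2015, 3, 1), "garden"),        # older than 90 days, so ignored
    (date(2015, 6, 20), "electronics"),
    (date(2015, 7, 4), "electronics"),
    (date(2015, 7, 15), "books"),
    (date(2015, 8, 1), "electronics"),
]

preferred = top_categories(history, today=date(2015, 8, 10))
print(preferred)
```

The resulting list could then feed whatever logic selects the tailored message. At 150 to 160 million emails a day, a lookup like this would in practice run inside the warehouse rather than in application code.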

Here we have online information coming into one warehouse as to what's happening in the world of buying and selling. Then, behind the scenes, while that information is being sent to the warehouse, we're trying to do these email campaigns.

This is where the problem becomes fairly complicated. We tried traditional relational database management systems (RDBMS), and they kind of worked, but we ran into a slew of speed and performance issues. That's really where the big-data world was really beneficial. We were able to address that problem in about a seven-month project that we ran.

Gardner: And this was using HP Vertica?

Large organization

Mohsin: We did an evaluation. We looked at a few databases, and the corporate choice was Vertica. We saw that there are a whole bunch of big-data vendors. The issue is that many of the vendors don't have any large organizations behind them, and Vertica does. The company management felt that this was a new big database, but HP was behind it, and the fact that they also use HP hardware helped a lot.

They chose Vertica. The team I was managing did a proof of concept (POC) and we were able to demonstrate that Vertica would be able to handle the reporting that is tied to the email campaign management. We ran a 90-day POC, and the results were so positive that there was interest in going live. We went live about another 90 days after that.

Gardner: I understand that Vertica is quite versatile. I've heard of a number of ways in which it's used technically. But this email campaign problem almost sounds like a transactional issue, a complex event processing issue, or a transfer agent scaling issue. How do big data, Vertica, and analytics come to bear on this particular problem?

Mohsin: It's exactly what you say it is. As we are reporting and pushing out the campaigns, new information is coming in every half hour, sometimes even more frequently. There's a live feed that's updating the warehouse. While the warehouse is being updated, we want to report against it in real time and keep our campaigns going.

The key point is that we can't really stop any of these processes. The customers who are managing the campaigns want to see information very frequently. We can’t even predict when they would want their information. At the same time, the transactional systems are sending us live feeds.

The problem we ran into with the traditional RDBMS is that the reporting didn't function when the live feeds were underway. We couldn't run our back-end email campaign reports when new data was coming in.

One of the benefits Vertica has, due to its basic architecture and its columnar design, is that it's better positioned to do that. This is what we were able to demonstrate in the live POC, because nobody was going to take our word for it.
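
One commonly cited reason columnar designs suit reporting is that an aggregate query only needs to read the columns it references. The toy Python below illustrates that general idea only; it is not a description of Vertica's storage engine, it does not model concurrent loading, and the record fields and function names are invented.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"campaign": "spring", "recipient": "a@example.com", "opened": 1},
    {"campaign": "spring", "recipient": "b@example.com", "opened": 0},
    {"campaign": "summer", "recipient": "c@example.com", "opened": 1},
    {"campaign": "summer", "recipient": "d@example.com", "opened": 1},
]


def to_columns(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def opens_by_campaign(cols):
    """Sum opens per campaign, touching only the two columns the report needs."""
    totals = {}
    for campaign, opened in zip(cols["campaign"], cols["opened"]):
        totals[campaign] = totals.get(campaign, 0) + opened
    return totals


columns = to_columns(rows)
report = opens_by_campaign(columns)
print(report)
```

In the column layout the report never reads the `recipient` values at all, whereas a row-by-row scan would pass over every field of every record.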

The end user said, "Take a few of our largest clients. Take some of our clients that have a lot of transactions. Prove that the reports will work for those clients." That's what we did in 30 days. Then, we extended it, and then in 90 days, we demonstrated the whole thing end to end. Following that was the go-live.

Gardner: You had to solve that problem of the live feeds, the rapidity of information. Rather than going to a stop, batch process, analyze, repeat cycle, you've gained a solution to your problem.

But at the same time, it seems like you're getting data into an environment where you can analyze it and perhaps extract other forms of analysis, in addition to solving your email, eCommerce trajectory issues. It seems to me that you're now going to have the opportunity to add a new dimension of analysis to what's going on and perhaps refine these transactions more toward a customer inference benefit.

More than a database

Mohsin: One of the things internally that I like to say is that Vertica isn't just a big database, it’s more than just a database. It's really a platform, because you have Distributed R and you are publishing other tools. When we adopted it and went live with this technology, we first solved the feeds and speeds problem, but now we're very much positioned to use some of the capabilities that exist in Vertica.

Distributed R is one of them and inference analysis is another, so that we can build intelligent reports. To date, we've been building those outside the RDBMS. The RDBMS has no role in that. With Vertica, I call it more of a data platform. So we definitely will go there, but that would be our second phase.

As the system starts to function and deliver on the key use cases, the next stage would be to build more sophisticated reports. We definitely have the requirements and now we have the ability to deliver.

Gardner: Perhaps you could add visualization capabilities to that. You could make a data pool available to more of the constituents within this organization so that they could innovate and do experiments. That’s very powerful stuff indeed.

Is there anything else you can tell us for other organizations that might be facing similar issues around real-time feeds and the need to analyze and react, now that you have been through this on this particular project? Are there any lessons learned for others?

If you're facing transactional issues and you haven't thought about a big-data platform as part of that solution, what do you offer to them in terms of maybe lighting a light bulb in their mind about looking for alternatives to traditional middleware?

Mohsin: Like so many people try to do, we tried to see if anyone else had done this. One of the issues in big data at least today is that you can’t find a whole slew of clients who have already gone live and who are in production.

There are lots of people in development, and some are live, but in our space, we couldn't find anyone who was live. We solved that issue via a quick-hit POC. The big lesson there was that we scoped the POC right. We didn’t want to do too much and we didn’t want to do too little. So that was a good lesson learned.

The other big thing is the data-migration question. Maybe, to some extent, this problem will never be solved. It's not so easy to pull data out of legacy database systems. Very few of them will give you good tools to migrate away from them. They all want you to stay. So we had to write our own tooling. We scoured the market for it, but we couldn’t find too many options out there.

Understand your data

So a huge lesson learned was, if you really want to do this, if you want to move to big data, get a handle on understanding your data. Make sure you have the domain experts in-house. Make sure you have the tooling in place, however rudimentary it might be, to be able to pull the data out of your existing database. Once you have it in the file system, Vertica can take it in minutes. That’s not the problem. The problem is getting it out.
We continue to grapple with that and we have made product enhancement recommendations. But in fairness to Vertica, this is really not something that Vertica can do much about, because this is more in the legacy database space.

Gardner: I've heard quite a few people say that, given the velocity with which they are seeing people move to the cloud, that obviously isn't part of their problem, as the data is already in the cloud. It's in the standardized architecture that the cloud is built around. If there is a platform-as-a-service (PaaS) capability, then getting at the data isn't so much of a problem. Or am I not reading that correctly?

Mohsin: No, you're reading that correctly. The problem we have is that a lot of companies are still not in the cloud. There is still a lingering fear of the cloud. People will tell you that the cloud is not secure. If you have customer information, if you have personalized data, many organizations don't want to put it in the cloud.

Slowly, they are moving in that direction. If we were all there, I would completely agree with you, but since we still have so many on-premise deployments, we're still in a hybrid mode -- some is on-prem, some is in the cloud.

Gardner: I just bring it up because it gives yet another reason to seriously consider cloud. It’s a benefit that is actually quite powerful -- the data access and ability to do joins and bring datasets together because they're all in the same cloud.

Mohsin: I fundamentally agree with you. I fundamentally believe in the cloud and that it really should be the way to go. Going through our very recent go-live, there is no way we could have the same elasticity in an on-prem deployment that we can have in a cloud. I can pick up the phone, call a cloud provider, and have another machine the next day. I can't do that if it’s on-premise.

Again, a simple question of moving all the assets into the cloud, at least in some organizations, will take several months, if not years.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP Enterprise.

Wednesday, August 5, 2015

How Localytics uses big data to improve mobile app development and marketing

The next BriefingsDirect big data innovation case study interview investigates how Localytics uses data and associated analytics to help providers of mobile applications improve their applications -- and also allow them to better understand the uses for their apps and dynamic customer demands.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.
To learn more about how big data helps mobile application developers better their products and services, please join Andrew Rollins, Founder and Chief Software Architect at Localytics, based in Boston. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about your organization. You founded it to do what?

Rollins: We founded it in 2008, two other guys and I. We set out initially to make mobile apps. If you remember back in 2008, this is when the iPhone App Store launched. So there was a lot of excitement around mobile apps at that time.

We initially started looking at different concepts for apps, but then, over a period of a couple months, discovered that there really weren't a whole lot of services out there for mobile apps. It was basically a very bare ecosystem, kind of like the Wild, Wild West. [Register for the upcoming HP Big Data Conference in Boston on Aug. 10-13.]

We ended up focusing on whether there was a services play in this industry and we settled on analytics, which we then called Localytics. The analogy we like to use is, at the time it was a little bit of a gold rush, and we wanted to sell the pickaxes. So that’s what we did.

Gardner: That makes a great deal of sense, and it has certainly turned into a gold rush. For those folks who do the mining, creating applications, what is it that they need to know?

Analytics and marketing

Rollins: That’s a good question. Here's a little back story on what we do. We do analytics, but we also do marketing. We're a full-service solution, where you can measure how your application is performing out in the wild. You can see what your users are doing. You can do anything from funnel analysis to engagement analysis, things like that.

From there, we also transition into the marketing side of things, where you can manage your push notifications and your in-app messaging.

For people who are making mobile apps, often they want to look at key metrics and then how to drive those metrics. That means a lot of A/B testing, funnel analysis, and engagement analysis.
It means not only analyzing these things, but making meaningful interactions, reaching out to customers via push notifications, getting them back in the app when they are not using the app, identifying points of drop-off, and messaging them at the right time to get them back in.
An example would be an e-commerce app. You've abandoned the shopping cart. Let’s get you back in the application via some sort of messaging. Doing all of that, measuring the return on investment (ROI) on that, measuring your acquisition channels, measuring what your users are doing, and creating that feedback loop is what we advocate mobile app developers do.
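As an illustration of the funnel analysis Rollins describes, here is a minimal Python sketch. It is not Localytics code; the step names and user counts are hypothetical, and it only computes the share of users lost between consecutive steps so that the largest drop-off point (such as an abandoned shopping cart) stands out.

```python
from typing import List, Tuple

# Hypothetical funnel for an e-commerce app: (step name, users reaching that step).
# These names and counts are invented for illustration only.
FUNNEL: List[Tuple[str, int]] = [
    ("open_app", 1000),
    ("view_product", 600),
    ("add_to_cart", 300),
    ("purchase", 90),
]


def drop_off_rates(funnel: List[Tuple[str, int]]) -> List[Tuple[str, str, float]]:
    """Return (from_step, to_step, fraction of users lost) for each consecutive pair."""
    rates = []
    for (prev_name, prev_count), (next_name, next_count) in zip(funnel, funnel[1:]):
        # Guard against an empty step so we never divide by zero.
        lost = 0.0 if prev_count == 0 else 1 - next_count / prev_count
        rates.append((prev_name, next_name, lost))
    return rates


def worst_drop_off(funnel: List[Tuple[str, int]]) -> Tuple[str, str, float]:
    """Return the transition that loses the largest share of users."""
    return max(drop_off_rates(funnel), key=lambda r: r[2])


if __name__ == "__main__":
    for src, dst, lost in drop_off_rates(FUNNEL):
        print(f"{src} -> {dst}: {lost:.0%} of users lost")
    print("Largest drop-off:", worst_drop_off(FUNNEL)[:2])
```

In this made-up funnel, the step from cart to purchase loses the largest share of users, which is the point where a re-engagement message would be aimed.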

Gardner: You're able to do data-driven marketing in a way that may not have been very accessible before, because everything that’s done with the app is digital and measurable. There are logs, servers -- and so somewhere there's going to be a trail. It’s not so much marketing as it is science. We've always thought of marketing as perhaps an art and less of a science. How do you see this changing the very nature of marketing?

Rollins: Everything ultimately that you are doing really does need to be data-driven. It's very hard to work off of just intuition alone. So that's the art and science. You come out with your initial hypothesis, and that’s a little bit more on the craft or art side, where you're using your intuition to guide you on where to start.

From there, you have to use the data to iterate. I'm going to try this, this, and this, and then see which works out. That would be like a typical multivariate kind of testing.

Determine what works out of all these concepts that you're trying, and then you iterate on that. Measuring anything you do, any kind of interaction you have with your user, and then using that as feedback to inform the next interaction is what you have to be doing.

Gardner: And this is also a bit revolutionary when it comes to software development. It wasn't that long ago that the waterfall approach to development might leave years between iterations. Now, we're thinking about constantly updating, iterating, getting a feedback loop, and condensing the latency of that feedback loop so that we really can react as close to real-time as possible.

What is it about mobile apps that's allowed for a whole different approach to this notion of connectedness and feedback loops to an app audience?

Mobile apps are different

Rollins: This brings up a good point. A lot of people ask why we have a mobile app analytics company. Why did we do that? Why is typical web analytics not good enough? It kind of speaks to something that you're talking about. Mobile apps are a little bit different than the regular web, in the sense that you do have a cycle that you can push apps out on.

You release to, let’s say, the iPhone App Store. It might take a couple of weeks before your app goes out there. So you have to be really careful about what you're publishing, because your turnaround time is not that of the web.

However, there are certain interactions you can have, like on the messaging side, where you have an ability to instantly go back and forth. Mobile apps are a different kind of market. It requires a little different understanding than the traditional approach.

... We consume the data in a real-time pipeline. We're not doing background batch processing that you might see in something like Hadoop. We're doing a lot of real-time pipeline stuff, such that you can see results within a minute or two of it being uploaded from a device. That's largely where HP Vertica comes in, and why we ended up using Vertica, because of its real-time nature. It’s about the scale.
Gardner: If I understand correctly, you have access to the data from all these devices, you are crunching that, and you're offering reports and services back to your customers. Do they look to you as also a platform provider or just a data-service provider? How do the actual hosting and support services for these marketing capabilities come about?

Rollins: We tend to cater more toward the high end. A lot of our customers are large app publishers that have an ongoing application, let’s say a shopping application or news application.

In that sense, when we bring people on board, oftentimes they tend to be larger companies that aren’t necessarily technically savvy yet about mobile, because it's still new for some people. We do offer a lot of onboarding services to make sure they integrate their application correctly, measure it correctly, and are looking at the right metrics for their industry, as compared to other apps in that industry.

Then, we keep that relationship open as they go along and as they see data. We iterate on that with them. Because of the newness of the industry it does require education.

Gardner: And where is HP Vertica running for you? Do you run it on your own data center? Are you using cloud? Is there a hybrid? Do you have some other model?

Running in the cloud

Rollins: We run it in the cloud. We are running on Amazon Web Services (AWS). We've thought a lot about whether we should run it in a separate data center, so that we can dictate the hardware, but presently we are running it in AWS.

Gardner: Let’s talk about what you can do when you do this correctly. Because you have a capacity to handle scale, you've developed speed, and you understand the requirements in the market, what are your customers getting from the ability to do all this?

Rollins: It really depends on the customer. Something like an e-commerce app is going to look heavily at things like where users are dropping off and what's preventing them from making that purchase.

Another application, like news, which I mentioned, will look at something different, usually something more along the lines of engagement. How long are they reading an article for? That matters to them, so that they can give those numbers to advertisers.

So the answer to that largely depends on who you are and what your app is.

Gardner: I suppose another benefit of developing these insights, as specific and germane as they might be to each client, is the ability to draw different types of data in. Clearly, there's the data from the App Store and from the app itself, but if we could join that data with some other external datasets, we might be able to determine something more about why they drop off or why they are spending more, or spending time doing certain things.

So is there an opportunity, and do you have any examples of where you've been able to go after more datasets and then be able to scale to that?

Rollins: This is something that's come up a lot recently. In the past year, we've had our own products that we're launching in this space, and the idea of integrating different data types is really big right now.

You have all these different silos -- mobile, web, and even your internal server infrastructure. If you're a retail company that has a mobile app, you might even have physical stores. So you're trying to get all this data in some collective view of your customer.
You want to know that Sally came to your store and purchased a particular kind of item. Then, you want to be able to know that in your mobile app. Maybe you have a loyalty card that you can tie across the media and then use that to engage with her meaningfully about stuff that might interest her in the mobile app as well.
"We noticed that you bought this a month ago. Maybe you need another one. Here is a coupon for it."

Other datasets

That's a big thing, and we're looking at a lot of different ways of doing that by bringing in other datasets that might not be from just a mobile app itself.

We're not even focused on mobile apps any more. We're really just an app analytics company, and that means the web and desktop. We ship in Windows, for example. We deal with a lot of Microsoft applications. Tying together all of that stuff is kind of the future.

Gardner: For those organizations that are embarking on more of a data-driven business model, that are looking for analytics and platforms and requirements, is there anything that you could offer in hindsight, having traveled this path and worked with HP Vertica? What should they keep in mind when they're looking to move into a capability, whether it's on-prem or cloud? What advice could you offer them?

Rollins: The journey that we went through was with various platforms. At the end of the day, be aware of what the vendor of the big-data platform is pitching, versus the reality of it.

A lot of times, prototyping is very easy, but actually going to large scale is fairly difficult. At scale, you have to know what each technology is good at, and how you bring together multiple technologies to accomplish what you want.

That means a lot of prototyping, a lot of stress testing and benchmarking. You really don’t know until you try it with a lot of these things. There are a lot of promises, but the reality might be different.

Gardner: Any thoughts about Vertica’s track record, given your length of experience?

Rollins: They're really good. I'm impressed with both the speed of it, as compared to other things we have looked at, and the features that they release. Vertica 7 has a bunch of great stuff in it. Vertica 6, when it came out, had a bunch of great stuff in it. I'm pretty happy with it.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy. Sponsor: HP.

Tuesday, August 4, 2015

HP hyper-converged appliance delivers speedy VDI and apps deployment and a direct onramp to hybrid cloud

HP today announced the new HP ConvergedSystem 250-HC StoreVirtual (CS 250), a hyper-converged infrastructure appliance (HCIA) based on HP's new ProLiant Apollo 2000 server and HP StoreVirtual software-defined storage (SDS) technology.

Built on up-to-date HP, Intel, and VMware technologies, the CS 250 combines a virtual server and storage infrastructure that HP says is configurable in minutes for nearly half the price of competitive systems. It is designed for virtual desktops and remote office productivity, as well as to provide a flexible path to hybrid cloud. [Disclosure: HP is a sponsor of BriefingsDirect.]

Designed to attract customers on a tight budget, the HP CS 250 includes a new three-node configuration that is up to 49 percent more cost effective than comparable configurations from Nutanix, SimpliVity and other competitors, says HP. Because HP's StoreVirtual runs in VMware, Microsoft Hyper-V and KVM virtual environments, the appliance may soon come to support all those hypervisors.

HP recently discontinued the EVO:RAIL version of its HCIA, which was based on the EVO:RAIL software from OEM partner VMware.

Increasingly, even small IT shops want to modernize and simplify how they support existing applications. They want virtualization benefits to extend to storage, backup and recovery, and be ready to implement and consume some cloud services. They want the benefits of software-defined data centers (SDDC), but they don’t want to invest huge amounts of time, money, and risk in a horizontal, pan-IT modernization approach.

That's why, according to IDC, businesses are looking for flexible infrastructure solutions that will allow them to quickly deploy and run new applications. This trend has resulted in a 116 percent year-over-year increase in hyper-converged systems sales, with a 60 percent compound annual growth rate (CAGR) anticipated through 2019.

The building-blocks approach to IT infrastructure is growing rapidly. IDC estimates that in 2015, $10.2 billion will be spent on converged systems, representing 11.4 percent of total IT infrastructure spending. This number will grow to $14.3 billion by 2018, representing 14.9 percent of total IT infrastructure spending, says IDC. Similarly, Technology Business Research, Inc. in Hampton, NH, estimates a $10.6 billion U.S. addressable market over the next 12 months, through mid-2016.
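For a sense of scale, the annual growth rate implied by those two IDC converged-systems estimates can be worked out with the standard compound-growth formula. The short Python sketch below uses only the dollar figures quoted above; the function name is our own, and the result is an implied rate, not a figure published by IDC.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


# IDC converged-systems estimates quoted above, in billions of US dollars.
spend_2015 = 10.2
spend_2018 = 14.3

rate = implied_cagr(spend_2015, spend_2018, years=2018 - 2015)
print(f"Implied annual growth, 2015-2018: {rate:.1%}")  # roughly 11.9%
```

This covers converged systems as a whole, which is a broader category than the hyper-converged segment for which the 60 percent CAGR is quoted.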

With HCIAs specifically, enterprises can begin making what amounts to mini-clouds based on their required workloads and use cases. IT can quickly deliver the benefits of modern IT architectures without biting off the whole cloud model. Virtual desktops are a great place to begin, especially as Windows 10 is emerging on the scene.

Indeed, VDI deployments that support as many as 250 desktops on a single appliance at a remote office or agency, for example, allow for ease in administration and deployment on a small footprint while keeping costs clear and predictable. And, if the enterprise wants to scale up and out to hybrid cloud, they can do so with ease and low risk.

Multi-site continuity

The inclusion of three 4TB StoreVirtual Virtual Storage Appliance (VSA) licenses also allows the new HP CS 250 system to replicate data to any other HP StoreVirtual-based solution. This means that customers can leverage their existing infrastructure as a replication target at no additional cost, says HP. The CS 250 also allows customers to tailor the system with a choice of up to 96 processing cores, a mix of SSD and SAS disk drives, and up to 2TB of memory per 4-node appliance -- double that of previous generations.

The CS 250 arrives pre-configured for VMware's vSphere 5.5 or 6.0 and HP OneView InstantOn to enable customers to be production-ready with only 5 minutes of keyboard time and a total of 15 minutes deployment time, with daily management from VMware vCenter via the HP OneView for VMware vCenter plug-in, says HP.

HP sees the CS 250 as a path to bigger things. For midsize and enterprise customers seeking an efficient and cost-effective cloud entry point, for example, the new HP Helion CloudSystem 9.0 built on the CS 250 provides a direct path to the hybrid cloud. This hyper-converged cloud solution leverages the clustered compute and storage resources of the CS 250 for on-premise workloads but adds self-service portal provisioning and public cloud bursting features for those moving beyond server virtualization.
HP announced that it is enhancing its “Nitro” partner program and opening it up to distributors worldwide, starting with Arrow Electronics in the US.

HP is also introducing new Software-Defined Storage Design and Integration services to help customers deploy highly scalable, elastic cloud storage services, the company announced today. The integration service provides customers with detailed configuration and implementation guidance tailored to their specific needs to accelerate time to value, said HP.

The 4-node CS 250-HC StoreVirtual is available on August 17, while 3-node configurations are available on September 28. A sample solution that includes the 3-node CS 250 with Foundation Carepack and VMware vSphere Enterprise starts at a list price of $121,483, said HP.
