Tuesday, April 14, 2015

GoodData analytics developers on what they look for in a big data platform

This BriefingsDirect big data innovation discussion examines how GoodData created a business intelligence (BI)-as-a-service capability across multiple industries that enables users to take advantage of both big-data performance and cloud delivery efficiencies. They had to choose the best data warehouse infrastructure to fit their scale and cloud requirements, and they ended up partnering with HP Vertica.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.

To learn more about their choice process for best big data in the cloud platform, we're joined by Tomas Jirotka, Product Manager of GoodData; Eamon O'Neill, the Director of Product Management at HP Vertica, and Karel Jakubec, Software Engineer at GoodData. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us a bit about GoodData and why you've decided that the cloud model, data warehouses, and BI as a service are the right fit for this marketplace?

Jirotka: GoodData was founded eight years ago, and from the beginning, it's been developed as a cloud company. We provide software as a service (SaaS). We allow our customers to leverage their data and not worry about hardware/software installations. We just provide them a great service. Their experience is seamless, and our customers can simply enjoy the product.

We provide a platform -- and the platform is very flexible. So it's possible to have any type of data, and create insights. You can analyze data coming from marketing, sales, or manufacturing divisions -- no matter in which industry you are.
Gardner: If I'm an enterprise and I want to do BI, why should I use your services rather than build my own data center? What's the advantage?

Cheaper solution

Jirotka: First of all, our solution is cheaper. We have a multi-tenant environment. So the customers effectively share the resources we provide them. And, of course, we have experience and knowledge of the industry. This is very helpful when you're a beginner in BI.

Gardner: What have been some of the top requirements you’ve had as you've gone about creating your BI services in the cloud?

Jakubec: The priority was to be able to scale, as our customers are coming in with bigger and bigger datasets. That's the reason we need technologies like HP Vertica, which scales very well by just adding nodes to the cluster. Without this ability, you realize you cannot implement solutions for the biggest customers. Even if you're running the biggest machines on the market, they're still not able to finish computation in a reasonable time.

Gardner: In addition to scale and cost, you need to also be very adept at a variety of different connection capabilities, APIs, different data sets, native data, and that sort of thing.

Jirotka: Exactly. Agility, in this sense, is really crucial.

Gardner: How long have you been using Vertica, and how long have you been using BI through Vertica for a variety of these platform services?

Working with Vertica

Gardner: What were some of the driving requirements for changing from where you were before?
Jirotka: We began moving some of our customers with the largest data marts to Vertica in 2013. The most important factor was performance. It's no secret that we also have Postgres in our platform. Postgres simply doesn’t support big data. So we chose Vertica to have a solution that is scalable up to terabytes of data.
Gardner: What else is creating excitement about Vertica?

O’Neill: Far and away, the most exciting is real-time personalized analytics. This allows GoodData to show a new kind of BI in the cloud. A new feature we released in our 7.1 release is called Live Aggregate Projections. It's for telling you about what’s going on in your electric smart meter, that FitBit that you're wearing on your wrist, or even your cell-phone plan or personal finances.

A few years ago, Vertica was blazing fast, telling you what a million people are doing right now and looking for patterns in the data, but it wasn’t as fast in telling you about my data. So we've changed that.

With this new feature, Live Aggregate Projections, you can actually get blazing fast analytics on discrete data. That discrete data is data about one individual or one device. It could be that a cell phone company wants to do analytics on one particular cell phone tower or one meter.

That’s very new and is going to open up a whole new kind of dashboarding for GoodData in the cloud. People are going to now get the sub-second response to see changes in their power consumption, what was the longest phone call they made this week, the shortest phone call they made today, or how often do they go over their data roaming charges. They'll get real-time alerts about these kinds of things.
When that was introduced, it was standing room only. They were showing some great stats from power meters and then from houses in Europe. They were fed into Vertica, and they showed queries that last year were taking Vertica one-and-a-half seconds and are now taking 0.2 seconds. They were looking at 25 million meters in the space of a few minutes. This is going to open up a whole new kind of dashboard for GoodData and new kinds of customers.
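
Editor's illustration: the general idea O’Neill describes, fast answers about one meter or one device, can be pictured as keeping a small summary per key that is updated as each row is loaded, so a per-device question reads one precomputed record instead of scanning raw rows. The Python sketch below shows only that general technique. It is not Vertica's implementation or API, and every name in it (RunningAggregate, MeterReadings, load, summary) is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningAggregate:
    """Per-key summary that is kept current as rows arrive."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


class MeterReadings:
    """Hypothetical store: raw rows plus one running aggregate per meter."""

    def __init__(self) -> None:
        self.rows = []                                # raw (meter_id, kwh) rows
        self.by_meter = defaultdict(RunningAggregate)  # maintained at load time

    def load(self, meter_id: str, kwh: float) -> None:
        # The aggregate is updated in the same step as the insert,
        # so it never has to be recomputed from the raw rows.
        self.rows.append((meter_id, kwh))
        self.by_meter[meter_id].add(kwh)

    def summary(self, meter_id: str) -> RunningAggregate:
        # Single lookup; cost does not grow with the number of raw rows.
        return self.by_meter.get(meter_id, RunningAggregate())


store = MeterReadings()
store.load("meter-1", 1.5)
store.load("meter-1", 0.5)
store.load("meter-2", 3.0)
print(store.summary("meter-1"))
```

The trade-off in a scheme like this is a little extra work on every load in exchange for per-device reads that stay cheap as the raw data grows.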

Gardner: Tomas, does this sound like something your customers are interested in, maybe retail? The Internet of Things is also becoming prominent, machine to machine, data interactions. How do you view what we've just heard Eamon describe, how interesting is it?

More important

Jirotka: It sounds really good. Real-time, or near real-time, analytics is becoming a more-and-more important topic. We hear it also from our customers. So we should definitely think about this feature or how to integrate it into the platform.

Jakubec: Once we introduce Vertica 7.1 to our platform, it will definitely be one of the features we will focus on. We have developed quite a complex caching mechanism for intermediate results, and it works like a charm for PostgreSQL, but unfortunately it doesn't perform so well for Vertica. We believe that features like Live Aggregate Projections will improve this performance.

Gardner: So it's interesting. As HP Vertica comes out with new features, that’s something that you can productize, take out to the market, and then find new needs that you could then take back to Vertica. Is there a feedback loop? Do you feel like this is a partnership where you're displaying your knowledge from the market that helps them technically create new requirements?

Jakubec: Definitely, it's a partnership and I would say a complex circle. A new feature is released, we provide feedback, and you have a direction to do another feature or improve the current one. It works very similarly with some of our customers.

O’Neill: It happens at a deeper level too. Karel’s coworkers flew over from Brno last year, to our office in Cambridge, Massachusetts and hung out for a couple of days, exchanging design ideas. So we learned from them as well.

They had done some things around multi-tenancy where they were ahead of us and they were able to tell us how Vertica performed when they put extra schemas on a catalog. We learned from that and we could give them advice about it. Engineer-to-engineer exchanges happen pretty often in the conference rooms.

Gardner: Eamon, were there any other specific features that are popping out in terms of interest?

O’Neill: Definitely our SQL on Hadoop enhancements. For a couple of years now we've been enabling people to do BI on top of Hadoop. We had various connectors, but we have made it even faster and cheaper now. In this most recent 7.1 release, you can now install Vertica on your Hadoop cluster. So you no longer have to maintain dedicated hardware for Vertica and you don’t have to make copies of the data.

The message is that you can now analyze your data where it is and as it is, without converting it from the Hadoop format or duplicating it. That’s going to save companies a lot of money. Now, what we've done is brought the most sophisticated SQL on Hadoop to people without duplication of data.

Using Hadoop

Jirotka: We employ Hadoop in our platform, too. There are some ETL scripts, but we've used it in a traditional form of MapReduce jobs for a long time. This is a really costly and inefficient approach because it takes a lot of time to develop and debug. So we may think about using Vertica directly with Hadoop. This would dramatically decrease the time to deliver it to the customer and also the running time of the scripts.
Gardner: Eamon, any other issues that come to mind in terms of prominence among developers?
O’Neill: Last year, we had our Customer Advisory Board, where I got to ask them about those things. Security came to the forefront again and again. Our new release has new features around data-access control.

We now make it easy for them to say that, for example, Karel can access all the columns in a table, but I can only access a subset of them. Previously, the developers could do this with Vertica, but they had to maintain SQL views and they didn’t like that. Now it's done centrally.

They like the data-access control improvements, and they're saying to just keep it up. They want more encryption at rest, and they want more integration. They particularly stress that they want integration with the security policies in their other applications outside the database. They don’t want to have to maintain security in 15 places. They'd like Vertica to help them pull that together.
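
Editor's illustration: the difference between maintaining a separate view per audience and declaring column access once, centrally, can be sketched in a few lines of Python. This only illustrates the idea of a single policy applied in one place. It does not show Vertica's syntax or behavior, and the policy table, user names, column names, and the apply_column_policy function are all hypothetical.

```python
# Hypothetical central policy: one place that says which columns each user may see.
COLUMN_POLICY = {
    "karel": {"customer_id", "region", "revenue"},  # all columns
    "eamon": {"customer_id", "region"},             # a subset only
}


def apply_column_policy(user, rows):
    """Return copies of the rows containing only the columns the user may see.

    Users without a policy entry get no columns at all (deny by default).
    """
    allowed = COLUMN_POLICY.get(user, set())
    return [
        {column: value for column, value in row.items() if column in allowed}
        for row in rows
    ]


table = [
    {"customer_id": 1, "region": "EU", "revenue": 1200.0},
    {"customer_id": 2, "region": "US", "revenue": 830.0},
]

print(apply_column_policy("karel", table))
print(apply_column_policy("eamon", table))
```

Because every query path goes through the same policy table, changing who can see a column is a single edit rather than a change to each view that exposes it.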

Gardner: Any thoughts about security, governance and granularity of access control?

Jakubec: Any simplification of security and access controls is great news. Restricting access for some users to just a subset of values or some columns is a very common use case for many customers. We already have a mechanism to do it, but as Eamon said, it involves maintenance of views or complex filtering. If it is supported by Vertica directly, that's great. I didn’t know that before and I hope we can use it.

Sponsor: HP.
