Monday, November 4, 2013

Different paths to cloud and SaaS enablement yield similar major benefits for Press Ganey and Planview

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: VMware.

The next VMworld innovator panel discussion focuses on how two companies are using aggressive cloud-computing strategies to deliver applications better to their end users.

We'll hear how healthcare patient-experience improvement provider Press Ganey and project and portfolio management provider Planview are both exploiting cloud efficiencies and agility. Their paths to the efficiency of cloud have been different, but the outcomes speak volumes for how cloud transforms businesses.

To understand how, we sat down with Greg Ericson, Senior Vice President and Chief Innovation Officer at Press Ganey Associates in South Bend, Indiana, and Patrick Tickle, Executive Vice President of Products at Planview Inc. in Austin, Texas.

The discussion, which took place at the recent 2013 VMworld Conference in San Francisco, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: VMware is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: We heard a lot about cloud computing at VMworld, and you're both going at it a little differently. Greg, tell us a bit about the type of cloud approach you’re taking at Press Ganey.

Ericson: Press Ganey is the leader in patient-experience analytics. We focus on providing deep insight into the patient experience in healthcare settings. More than 10,000 customers in the healthcare environment look to us and partner with us on patient-experience improvement.

We started this cloud journey in July of 2012, and we set out to achieve multiple goals. Number one, we wanted to position Press Ganey's next-generation software solutions and have a platform able to support them.

We went through a journey of consolidating multiple data centers. We consolidated 14 different storage arrays in the process and, most importantly, we positioned our analytic solutions to take on exponentially more data and provide that to our clients.

Gardner: Patrick, how has cloud helped you at Planview? You were, at one time, fully a non-cloud organization. Tell us about your journey.

Tickle: Planview has been an enterprise software vendor, a classic best-of-breed focused enterprise software vendor, in this project and portfolio and resource management space for over 20 years.

We have a big global customer base of on-premises customers that has built up over the last 23 years. Obviously, in the world of software these days, there's a fairly seismic shift toward software as a service (SaaS): how you get to the cloud, the business models, and all those kinds of things.

Conventional wisdom, for a lot of people, was that you can't get there unless you start from scratch. Obviously, because this is the only thing we do, it was pretty imperative that we figure out a way to get there.

So two or three years ago, we started trying to make the transition. There were a lot of things we had to go through, not just from an infrastructure standpoint, but from a business model and delivery standpoint, etc.

The essence was this: we didn't have time to rewrite a code base in which we've invested 10-plus years and hundreds of thousands of hours of customer experience to become a market-leading product in our space. It could take five years to rewrite it. Compared to where we were 10 years ago, when you and I first met, there are a lot more tools in the bag for people to get to the cloud than there were then.

So we really went after VMware and did the research sweep much more aggressively. We started out with our own kind of infrastructure that we bolted together and moved to a FlexPod in our second generation.

We have vCloud Hybrid Services now, and leveraging our existing code base, and then the whole suite of VMware products and services, we have transformed the company into a cloud provider. Today, 90 percent of all our new Planview customers are SaaS customers. It's been a big transition for us, but the technology from VMware has been right in the center of making it happen.

Business challenges

Gardner: Greg, tell us a little bit about the business challenges that are driving your IT requirements and, in turn, making the cloud model attractive. Is this a growth issue? Is this a complexity issue? What are the business imperatives that shape your IT requirements?

Ericson: That's a great question. Press Ganey is a 25-year-old organization. We pioneered the concept of patient experience, and the analytics and insight into it, within the healthcare setting. We have an organization that's steeped in history, and so there are multiple things that we're looking at.

Number one, we have one of the largest protected health information (PHI) databases in the United States. So we felt that we had to have a very secure and robust solution to provide to our clients, because they trust us with their data.

Number two, with healthcare reform, the focus on patient experience is now mandatory, whereas before it was voluntary; it's regulated as part of healthcare reform. Some organizations were coming to us and saying, "We want to get out however many patient surveys we need to satisfy our threshold."

Our philosophy is, why would you want to stop there? We believe that if you understand and leverage different media to collect those responses, you can survey your entire population of patients: not only those coming into your institution but, in an accountable care organization, the entire ecosystem that you're serving. That gives you tremendous insight into what's going on with those patients.

Our scientists are also finding a correlation between the patient experience results and clinical and quality outcomes. So, as we can tie those data sets together in those episodic events, we're finding very interesting kinds of new thought, leading thought, out there for our clients to look at.

So for us, going from minimally surveying your population to doing a census survey, which covers your entire population, represents exponential growth. The last thing is that, for our future, in terms of going after some of the new analytics and new insight that we want to provide our clients, we want to position the technology to take us there.

We believe that the VMware vCloud Suite represents a completeness of vision. It represents a complete, single pane of glass for managing the enterprise and, longer term, as we become more sophisticated in identifying our data and as the industry matures, we think a public cloud, a hybrid cloud, is in our future, and we're preparing for that.

Gardner: And this must be a challenge for you, not only in terms of supporting the applications, but also those data sets. You're getting some larger data sets and they could be distributed. So the cloud model suits your data needs over time as well?

Deeper insights

Ericson: Absolutely. It gives us the opportunity to apply technology with the best cost-value proposition for the solutions that we're serving up for our customers.

Our current environment is around 600 server instances. We have about 300 terabytes (TB) running in 20 SaaS applications, and we're growing exponentially each month, as we continue to provide that deeper insight for our customers.

Gardner: Patrick, for your organization what are some of the business drivers that then translate into IT requirements?

Tickle: From an IT perspective, it changed the culture of the company, moving from an on-premises, perpetual-license model of "ship the software and have a customer care organization that focuses on bug and break-fix" to a service-delivery model. There were a lot of things that rippled through the whole organization.

At the end of the day, we had to move from an IT culture to an operations culture and all the things that go along with that, performance and up-time. Our customer base is global, so we had to be able to provide that service around the globe. All those things were pretty significant shifts from an IT perspective.

We went from a company that had a corporate IT group to a company that has a hosting and DevOps and Ops team that has a little bit of spend in corporate IT.

Out of the gate, the first step at Planview was moving to colo. SunGard has been a great partner for us over the last couple of years as our ping, power, and pipe. Then, in our first generation, we bolted together some of our storage and compute infrastructure because it wasn't quite all the way there. Then, in our most recent incarnation of the infrastructure, we're using FlexPods at SunGard in Austin, Texas and London.

OPEX spend

We're always having to evaluate future footprints. But ultimately, like many companies, we would like to convert that infrastructure investment from a capital spend into an OPEX spend. And that’s what’s compelling with vCloud Hybrid Service.

What we've been excited about hearing from VMware is not just providing the performance and the scalability, but the compatibility and the economic model that says we’re building this for people who want to just move virtual machines (VMs). We understand how big the opportunity is, and that’s going to open up more of a public cloud opportunity for us to evaluate for a wide variety of use cases going forward.

Gardner: How big a deal is it when we can, with just a click of a mouse, move workloads to any supported environment we want?

Tickle: It's a huge deal. Whether it's a production environment or a disaster recovery (DR) environment, at the end of the day it's a big deal for both of us. For a SaaS company, the only thing that matters is renewals. It's happy customers that renew. That transition from perpetual-plus-maintenance to a renewal model puts you on customer-service watch at another level, every minute of every day.

Everything that we can do to make the customer experience as compelling as possible, not just from our UI and our software, but obviously in the delivery of the service, allows us to run our business. That could be a disaster scenario or just great performance across the geographies where we have customers, and we have to do it in a cost-effective way that operates inside our business model, our profit and loss.

So our shareholders are equally pleased with the return. We can't afford to have half of the company's OPEX go into IT while we're trying to make customers as successful as they possibly can be. We continue to be encouraged that we're on a great path with the stack that we're seeing to get there.

Gardner: I think it's fair to say that cloud is not just repaving old cow paths, that cloud is really transforming your entire business. Do you agree, Greg?

Rejuvenate legacy

Ericson: I agree. It allows us, especially an organization that’s 25 years steeped in history, to be able to rejuvenate our legacy applications and be able to deliver those with maximum speed, maximizing our resources, and delivering them in a secure environment. But it also allows us to be able to grow, to flex, and to be able to rejuvenate and organically transform the organization. It's pretty exciting for us and it adds a lot of value to our clients indirectly.

Gardner: Greg, what are some of the more measurable payoffs when you go to cloud? Are these soft payoffs of productivity and automation, or are there hard numbers about return on investment (ROI) or moving more to an operational cost versus a capital cost? What do you get when you do cloud right?

Ericson: We justify the investment based on consolidation of our data centers, consolidation and retirement of our storage arrays, and so on. That’s from a hard-savings perspective. From a soft-savings perspective, clearly in an environment that was not virtualized, virtualizing the environment represented a significant cost avoidance.

Our focus is on a complete solution that allows us to really focus in on what's important for us, what's important for our clients.

Longer-term, we're looking at how to position the organization with a robust, virtual secured infrastructure that runs with a minimum amount of technical resources, so that we can focus most of our efforts on delivering innovative applications to our clients.

The biggest opportunity for us is to focus there. As you look at the size of the data set and the growth of those data sets, positioning infrastructure to be able to stay with you is exciting for us and it’s a value proposition for our clients.

Entire environment

With a minimum amount of staff, we were able to move in nine months and virtualize our entire environment. When you talk about 600 servers and 300 TB of data, that's a pretty sizable enterprise and we're fully leveraging the vCloud Suite.

Our network is virtualized, our storage is virtualized, and our servers are virtualized. The release of vCloud Suite 5.5 and some of the additional network functionality and storage functionality that’s coming out with that is rather exciting. I think it's going to continue to add more value to our proposition.

Gardner: Some people say that a single point of management, when you have that comprehensive suite approach, comes in pretty handy, too.

Ericson: It does, because it gives you the capability of managing through a single pane of glass across your environments. I was also going to accentuate that we're about 50 percent complete in building out our catalog.

For our next steps, number one is that we're looking at building upon the excellence of Press Ganey and building our next-generation enterprise data warehouse. We're also looking at leveraging the VMware vCloud Suite from a DevOps perspective, and we already have some pilots up and running. We'll continue to build that out.

As we deploy, not only are we maximizing our assets in delivering a secure environment for our clients, but we're also really working toward what I call engineering to zero. We’re completely automating and virtualizing those deployments and we're able to move those deployments, as we go from dev to test, and test to user acceptance testing, and then into a production environment.
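To make that promotion flow concrete, here is a minimal sketch of an automated dev-to-test-to-UAT-to-production pipeline in Python. The environment names, the gate checks, and the deploy_to() stub are illustrative assumptions, not Press Ganey's actual tooling.

```python
# Minimal sketch of promotion-style deployment automation ("engineering to zero"):
# the same versioned artifact moves through dev -> test -> uat -> prod, and each
# promotion is gated on automated checks rather than manual steps.
# Environment names, the gate functions, and deploy_to() are illustrative stubs.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

ENVIRONMENTS = ["dev", "test", "uat", "prod"]

@dataclass
class Release:
    artifact: str                       # e.g. a VM template or application build ID
    version: str
    deployed_to: List[str] = field(default_factory=list)

def deploy_to(env: str, release: Release) -> None:
    # Stand-in for whatever actually provisions the environment
    # (catalog deployment, configuration management, and so on).
    print(f"Deploying {release.artifact} {release.version} to {env}")
    release.deployed_to.append(env)

def promote(release: Release, gates: Dict[str, Callable[[Release], bool]]) -> None:
    """Walk the release through each environment, stopping if a gate fails."""
    for env in ENVIRONMENTS:
        deploy_to(env, release)
        gate = gates.get(env)
        if gate and not gate(release):
            raise RuntimeError(f"Promotion halted: checks failed in {env}")

if __name__ == "__main__":
    rel = Release(artifact="analytics-app", version="2013.11.0")
    # Hypothetical gates: automated tests after dev and test, sign-off after UAT.
    gates = {"dev": lambda r: True, "test": lambda r: True, "uat": lambda r: True}
    promote(rel, gates)
    print("Promoted through:", rel.deployed_to)
```

The point of the sketch is the shape, not the tooling: every promotion deploys the same artifact and is decided by a check, which is what drives the manual effort toward zero.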

Tickle: As we all know, there are a lot of hypervisors out there. We can all get that technology from a wide variety of sources. But to your question about the value of the stack, that's what we look at. What's important now is not just the product stack, but the services stack.

We look at a company like VMware and say, "Site Recovery Manager, in conjunction with vCloud Hybrid Service, brings a DR solution to me as a SaaS vendor that fits with my architecture and adds to that services stack."

No other hypervisor vendor has built out that stack of services. We could talk about numerous examples, but when I listen to what goes on at the event and spend time with the people at VMware, that whole value stack VMware is investing in looks so much more compelling than just picking pieces of technology.

Gardner: Looking to the future, Greg, based on what you've heard at VMworld about the general availability of vCloud Hybrid Services and the upgrade to the suite of private cloud support, what has you most excited? Was there something that surprised you? What is in the future road map for you?

A step further

Ericson: A couple of different things. The next release of NSX is exciting for us. It allows us to take the virtualization of our network a step further. Also, being able to connect hypervisors into a hybrid-cloud situation is something that, as we evolve our maturity in managing our data, is going to be exciting for us.

One of the areas that we're still teasing out and want to explore is how to tie an accelerator for big-data applications into that. In 2014, what we're probably looking at is how to take this environment and move from a DR kind of environment to a high-availability environment. I believe that we're architected for that, and because of the virtualization we can do it with a minimal amount of investment.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: VMware.


Wednesday, October 30, 2013

Learn how Visible Measures tracks an expanding universe of video and viewer use with big data

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

The next edition of the HP Discover Performance Podcast Series examines how video advertising solutions provider Visible Measures delivers impactful metrics on video use and patterns.

Visible Measures uses a massive analytics capability to measure an ocean of video at some of the highest scales I've ever heard of. By creating very deep census data of everything that's happened in the video space, Visible Measures uses unique statistical processes to figure out exactly what patterns emerge within video usage, at high speed and with massive scale and granularity.
 
To learn more about how Visible Measures measures, please welcome Chris Meisl, Chief Technology Officer at Visible Measures Corp., based in Boston.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: Tell us a little bit about video metrics. It seems that this is pretty straightforward, isn't it? You just measure the number of downloads and you know how many people are watching a video -- or is there more to it?

Meisl: You'd think it would be that straightforward. Video is probably the fastest-growing component of the Internet right now. Video consumption is accelerating unbelievably. When you measure a video, you're not only looking at whether someone viewed it, but how far into the video they got. Did they rewind it, stop it, or replay certain parts? What happened at the end? Did they share it?

There are all kinds of events that can happen around a video. It's not like in the display advertising business, where you have an impression and you have a click. With video, you have all kinds of interactions that happen.

You can really measure engagement in terms of how much people have actually watched the video, and how they've interacted with a video while it's playing.
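To give a feel for what measuring engagement from player events involves, here is a minimal sketch in Python. The event names, fields, and the hard-coded video length are assumptions made for illustration, not Visible Measures' actual instrumentation or schema.

```python
# Minimal sketch: turning raw video-player events into engagement metrics.
# Event names and fields are illustrative assumptions, not a real schema.

from collections import defaultdict

events = [
    # (video_id, viewer_id, event_type, position_seconds)
    ("vid_1", "u1", "play", 0),
    ("vid_1", "u1", "progress", 45),
    ("vid_1", "u1", "share", 45),
    ("vid_1", "u2", "play", 0),
    ("vid_1", "u2", "progress", 180),
    ("vid_1", "u2", "complete", 240),
]

VIDEO_LENGTH = {"vid_1": 240}  # seconds

def engagement(events):
    views = defaultdict(set)       # viewers who started each video
    max_pos = defaultdict(float)   # furthest point reached per (video, viewer)
    shares = defaultdict(int)
    completes = defaultdict(int)

    for video, viewer, kind, pos in events:
        if kind == "play":
            views[video].add(viewer)
        elif kind in ("progress", "complete"):
            key = (video, viewer)
            max_pos[key] = max(max_pos[key], pos)
            if kind == "complete":
                completes[video] += 1
        elif kind == "share":
            shares[video] += 1

    report = {}
    for video, viewers in views.items():
        depths = [max_pos[(video, v)] / VIDEO_LENGTH[video] for v in viewers]
        report[video] = {
            "views": len(viewers),
            "avg_pct_watched": round(100 * sum(depths) / len(depths), 1),
            "completion_rate": completes[video] / len(viewers),
            "shares": shares[video],
        }
    return report

print(engagement(events))
```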

Gardner: This is an additional level of insight beyond what happened traditionally with television, where you need a Nielsen box or some other crude, if I could use that term, way of measuring. This is much more granular and precise.

Census based

Meisl: Exactly. The cable industry tried to do this on various occasions with various set-top boxes that would "phone home" with various information. But for the most part, like Nielsen, it's panel-based. On the Internet, you can be census-based. You can measure every single video, which we do. We now know about over half a billion videos, and we've measured over three trillion video events.

Because you have this very deep census data of everything that's happened, you can use standard and interesting statistical processes to figure out exactly what's happening in that space, without having to extrapolate from a relatively small panel. You know what everyone is doing.

Gardner: And of course, this extends not only to programming or entertainment level of video, but also to the advertising videos that would be embedded or precede or follow from those. Right?

Meisl: Exactly. Advertising and video are interesting, because it's not just standard television-style advertising. In standard television advertising, there are 30-second spots that are translated into the Internet space as pre-roll, post-roll, mid-roll, or what have you. You're watching the content that you really want to watch, and then you get interrupted by these ads. This is something that we at Visible Measures didn't like very much.

We're promoting this idea of content marketing through video, and content marketing is a very well-established area. We're trying to encourage brands to use those kinds of techniques using the video medium.

That means that brands will tell more extensive stories in maybe three- to five-minute video segments -- that might be episodic -- and we then deliver that across thousands of publishers, measure the engagement, measure the brand-lift, and measure how well those kinds of video-storytelling features really help the brand to build up the trust that they want with their customers in order to get the premium pricing that that brand has over something much more generic.

Gardner: Of course, the key word there was "measures." In order to measure, you have to capture, store, and analyze. Tell us a little bit about the challenges that you faced in doing that at this scale with this level of requirements. It sounds as if even the real-time elements of being able to feed back that information to the ad servers is important, too.

Meisl: Right. The first part that you have to do is have a really comprehensive understanding of what's going on in the video space.

Visible Measures started by measuring all the video that's out there. Everywhere we can, we work with publishers to instrument their video players so that we get signals while people are watching videos on their site.

For the publishers that don't want to allow us to instrument their players, we can use more traditional Google-style spidering techniques to capture information on the view count, comment count, and things like that. We do that on a regular basis, a few times a day or at least once a day, and then we can build up metrics on how the video is growing on those sites.
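For the spidered publishers, growth metrics come down to deltas between periodic snapshots. Here is a simplified sketch with invented snapshot data; the snapshot cadence and the views-per-day calculation are illustrative, not Visible Measures' actual method.

```python
# Minimal sketch: deriving growth metrics from periodic view-count snapshots
# gathered by spidering a publisher's pages a few times a day.

from datetime import datetime

# (video_id, snapshot_time, cumulative_view_count) as captured by the spider
snapshots = [
    ("vid_9", datetime(2013, 10, 29, 8, 0), 120_000),
    ("vid_9", datetime(2013, 10, 29, 20, 0), 155_000),
    ("vid_9", datetime(2013, 10, 30, 8, 0), 210_000),
]

def daily_growth(snapshots):
    """Views gained per day between consecutive snapshots of the same video."""
    rates = []
    last_seen = {}
    for vid, ts, count in sorted(snapshots, key=lambda s: (s[0], s[1])):
        if vid in last_seen:
            prev_ts, prev_count = last_seen[vid]
            days = (ts - prev_ts).total_seconds() / 86_400
            rates.append((vid, ts, (count - prev_count) / days))
        last_seen[vid] = (ts, count)
    return rates

for vid, ts, rate in daily_growth(snapshots):
    print(f"{vid} @ {ts:%Y-%m-%d %H:%M}: {rate:,.0f} views/day")
```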

Massive database

So we ended up building this massive database of video -- and we would provide information, or rather insight, based on that data, to advertisers on how well their campaigns were performing.

Eventually, advertisers started to ask us to just deliver the campaign itself, instead of giving just the insight that they would then have to try to convince various other ad platforms to use in order to get a more effective campaign. So we started to shift a couple of years ago into actual campaign delivery.

Now, we have to do more of a real-time analysis, because as you mentioned, you want to, in real time, figure out the best ways to target the best sites to send that video to, and the best way to tune that campaign in order to get the best performance for the brand.

Gardner: And so faced with these requirements, I assume you did some proofs of concept (POCs). You looked around the marketplace for what’s available and you’ve come up with some infrastructure that is so far meeting your needs.

Meisl: Yes. We started with Hadoop, because we had to build this massive database of video, and we would then aggregate the information in Hadoop and pour that into MySQL.

We quickly got to the point where it would take us so long to load all that information into MySQL that we were just running out of hours in the day. It took us 11 hours to load MySQL, and it was a sharded MySQL cluster that we couldn't actually use while it was being loaded. So you'd have to have two banks of it.

You only have a 12-hour window; otherwise, you've blown your day. That's when we started looking around for alternate solutions for storing this information and making it available to our customers. We elected to use HP Vertica -- this was about four years ago -- because that same 11-hour load took two hours in Vertica. And we're not going to run out of money buying hard drives, because Vertica compresses the data; it has impressive compression.

Now, as we move more into campaign delivery for the brands that we represent, we have to do our measurement in real time. We use Storm, a real-time stream-processing platform, and it writes to Vertica as the events happen.

So we can ask questions of Vertica as events happen. That allows our ad service, for example, to have much more intelligence about what's going on with campaigns that are in flight. It allows us to do much more sophisticated fraud detection. There are all kinds of possibilities that open up only if you have access to the data as soon as it is generated.
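The pattern Meisl describes, writing events into the analytics store as they arrive so that in-flight queries see fresh data, can be sketched roughly as follows. Here sqlite3 stands in for the analytics database (Vertica in the article), and the event shape and batch size are assumptions.

```python
# Conceptual sketch of "write events as they happen, query them as they happen."
# sqlite3 is only a stand-in for the real analytics store.

import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE video_events (
    campaign_id TEXT, site TEXT, event_type TEXT, ts REAL)""")

BATCH_SIZE = 500
buffer = []

def handle_event(campaign_id, site, event_type, ts=None):
    """Called by the stream processor for each incoming event."""
    buffer.append((campaign_id, site, event_type, ts or time.time()))
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    if buffer:
        conn.executemany("INSERT INTO video_events VALUES (?, ?, ?, ?)", buffer)
        conn.commit()
        buffer.clear()

# Simulate a burst of events from the stream.
for i in range(1200):
    handle_event("camp_42", f"site_{i % 3}", "view")
flush()

# The kind of in-flight question an ad server might ask against fresh data.
rows = conn.execute("""SELECT site, COUNT(*) AS views
                       FROM video_events
                       WHERE campaign_id = 'camp_42'
                       GROUP BY site""").fetchall()
print(rows)
```

In production the writing side is a stream processor such as Storm pushing micro-batches into the warehouse, but the batch-and-flush shape is the same.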

Gardner: Clearly if a load takes 11 hours, you're well into the definition of big data. But I'm curious, for you, what constitutes big data? Where does big data begin from medium or non-big data?

Several dimensions

Meisl: There are several dimensions to big data. Obviously, there's the size of it. We process what we receive, maybe half a billion events per day, and we might peak at near a million events a minute. There is quite a bit of lunchtime video viewing in America, but typically in the evening, there is a lot more.

The other aspect of big data is the nature of what's in that data, the unstructured nature, the complexity of it, the unexpectedness of the data. You don't know exactly what you're going to get ahead of time.

For information that’s coming from our instrumented players, we know what that’s going to be, because we wrote the code to make that. But we receive feeds from all kinds of social networks. We know about every video that's ever mentioned on Twitter, videos that are mentioned on Facebook, and other social arenas.

All of that's coming in via all kinds of different formats. It would be very expensive for us to have to fully understand those formats, build schemas for them, and structure it just right.

So we have an open-ended system that goes into Hadoop and can process that in an open-ended way. So to me, big data is really its volume plus the very open-ended, unknown payloads in that data.

Gardner: How do you know you're succeeding here? Clearly, going from 11 hours to two hours is one metric. Are there other metrics of success that you look to -- they could be economic, performance, or concurrent query volumes?

Tell me what you define as a successful analytics platform.

Meisl: At the highest level, it's going to be about revenue and margin. But in order to achieve the revenue and margin goals that we have, obviously we need to have very efficient processes for doing the campaign delivery and the measurement that we do.

As a measurement company, we measure ourselves and watch how long it takes to generate the reports that we need, and how responsive we are to our customers for any ad-hoc queries or special custom reports that they want.

We're continuously looking at how well we optimize delivery of campaigns and we're continuously improving that. We have corporate goals to improve our optimization quarter-over-quarter.

In order to do that, you have to keep coming up with new things to measure and new ways to interpret the data, so you can figure out exactly which video you want to deliver to the right person, at the right time, in the right context.

Looking down the road

Gardner: Chris, we're here at the Big Data Conference for HP Vertica and its community. Looking down the road a bit, what sort of requirements do you think you're going to have later? Are there milestones, or is there a roadmap that you would like to see Vertica and HP follow, in order to make sure that you don't run out of runway again sometime?

Meisl: Obviously, we want HP and Vertica to continue to scale up, so that it is still a cost-effective solution as the volume of data will inexorably rise. It's just going to get bigger and bigger and bigger. There's no going back there.

In order to be able to do the kind of processing that we need to do without having to spend a fortune on server farms, we want Vertica, in particular, to be very efficient at the kinds of queries it needs to do, proficient at loading the data, and good at accommodating the questions we ask of it.

In addition to that, what's particularly interesting about Vertica is its analytic functions. It has a very interesting suite of analytic functions that extends beyond the standard SQL analytic functions, based on time series and pattern matching. This is very important to us, because we do fraud detection, for example, so you want to do pattern matching for that. We do pacing for campaigns, so you want to do time-series analysis for that.
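The two analyses mentioned here, pacing a campaign against its goal over time and flagging suspicious event patterns, can be illustrated outside the database as well. The numbers, thresholds, and burst heuristic below are invented for the example; they are not Visible Measures' production logic.

```python
# Minimal sketch of campaign pacing (a time-series view of delivery vs. goal)
# and a simple pattern-based fraud flag. All numbers are illustrative.

from datetime import date, timedelta

def pacing(goal_views, start, end, actual_by_day):
    """Compare cumulative delivery to a straight-line pace toward the goal."""
    total_days = (end - start).days + 1
    cumulative = 0
    for offset in range(total_days):
        day = start + timedelta(days=offset)
        cumulative += actual_by_day.get(day, 0)
        expected = goal_views * (offset + 1) / total_days
        yield day, cumulative, expected, cumulative / expected

def looks_fraudulent(view_timestamps, window_seconds=10, threshold=50):
    """Flag a viewer that produces an implausible burst of views in a short window."""
    ts = sorted(view_timestamps)
    for i in range(len(ts)):
        j = i
        while j < len(ts) and ts[j] - ts[i] <= window_seconds:
            j += 1
        if j - i >= threshold:
            return True
    return False

start, end = date(2013, 10, 1), date(2013, 10, 10)
actuals = {start + timedelta(days=d): 9_000 + 500 * d for d in range(5)}
for day, got, want, ratio in pacing(100_000, start, end, actuals):
    print(f"{day}: delivered {got:,} vs pace {want:,.0f} ({ratio:.0%})")

print(looks_fraudulent([0.1 * i for i in range(100)]))   # dense burst  -> True
print(looks_fraudulent([60.0 * i for i in range(100)]))  # spread out   -> False
```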

We look forward to HP and Vertica really pushing forward on new analytic capabilities that can be applied to real-time data as it flows into the Vertica platform.
Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.


Tuesday, October 22, 2013

Complex carrier network performance data on HP Vertica yields performance and customer metrics boon for Empirix

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

The next edition of the HP Discover Performance Podcast Series explores how network testing, monitoring, and analytics provider Empirix developed unique and powerful data processing capabilities.

Empirix uses an advanced analytics engine to continuously and proactively evaluate carrier network performance and customer experience metrics -- amid massive data flows -- to automatically identify issues as they emerge.

To learn more about how a combination of large-scale, real-time performance and pervasive data access made the HP Vertica analytics platform stand out to support such demands for Empirix, join Navdeep Alam, Director of Engineering, Analytics and Prediction at Empirix, based in Billerica, Mass.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Gardner: Why do you have such demanding requirements for data processing and analysis?

Alam: What we do is actively and passively monitor networks. When you're in a network as a service provider, you have the opportunity to see the packets within that network, both on the control plane and on the user plane. That means you're looking at signaling data and also user-plane data: what's going on with the behavior, what's going on at the data layer. That's a vast amount of data, especially with mobile, where most people are doing data-heavy things on their devices.

When you're in that network and you're tapping that data, there is a tremendous amount of data -- and there's a tremendous amount of insights about not only what's going on in the network, but what's going on with the subscribers and users of that network.

Empirix is able to collect this data from our probes in the network, as well as being able to look at other data points that might help augment the analysis. Through our analytics platform we're able to analyze that data, correlate it, mediate it, and drive metrics out of that data.

That's a service for our customers, increasing the value they get from that data, so that they can realize a return on investment (ROI) and understand how they can leverage their networks better to improve operations and so forth. They can understand their customers better and begin to analyze, slice and dice, and visualize the data of this complex network.

They can use our platform, as well to do proactive and predictive analysis, so that we can create even better ROI for our customers by telling them what potentially might go wrong and what might be the solution to get around that to avoid a catastrophe.
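As a simplified illustration of what correlating control-plane and user-plane data can mean in practice, the sketch below joins signaling records with traffic records on a session key and rolls them up per subscriber. The field names and record shapes are assumptions, not Empirix's actual data model.

```python
# Simplified sketch: correlate control-plane (signaling) records with user-plane
# (traffic) records on a shared session key, then derive per-subscriber metrics.

control_plane = [
    {"session": "s1", "subscriber": "3395550001", "event": "attach", "result": "ok"},
    {"session": "s2", "subscriber": "3395550002", "event": "attach", "result": "fail"},
    {"session": "s3", "subscriber": "3395550001", "event": "attach", "result": "ok"},
]

user_plane = [
    {"session": "s1", "bytes_down": 2_400_000, "latency_ms": 60},
    {"session": "s3", "bytes_down": 900_000, "latency_ms": 210},
]

def correlate(control_plane, user_plane):
    traffic = {r["session"]: r for r in user_plane}
    per_subscriber = {}
    for sig in control_plane:
        sub = per_subscriber.setdefault(
            sig["subscriber"],
            {"attempts": 0, "failures": 0, "bytes_down": 0, "latencies": []})
        sub["attempts"] += 1
        if sig["result"] != "ok":
            sub["failures"] += 1
        data = traffic.get(sig["session"])
        if data:
            sub["bytes_down"] += data["bytes_down"]
            sub["latencies"].append(data["latency_ms"])
    return per_subscriber

for subscriber, m in correlate(control_plane, user_plane).items():
    avg_latency = sum(m["latencies"]) / len(m["latencies"]) if m["latencies"] else None
    print(subscriber, m["attempts"], m["failures"], m["bytes_down"], avg_latency)
```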

New opportunities

Gardner: It’s interesting that not only is this data being used for understanding the performance on the network itself, but it's giving people business development and marketing information about how people are using it and where the new opportunities might be.

Is that something fairly new? Were you able to do that with data before, or is it the scale and ability to get in there and create analysis in near-real-time that’s allowed for such a broad-based multilevel approach to data and analysis?

Alam: This is something we've gotten into. We definitely tried to do it before with success, but we knew that in order to really tackle mobile and the increasing demands of data, we really had to up the ante.

Our investment in HP Vertica, and how we've introduced it in our new analytics platform, Empirix IntelliSight 1.0, which recently came out, is about leveraging that platform -- not only for scalability and our ability to ingest and process data, but to look at data in its more natural format, both as discrete data and as aggregate data. We allow our customers to view that data ad hoc and analyze that data.

It positioned us very well. Now that we have a central point from which all this data is being processed and analyzed, we now run analytics directly at this data, increasing our data locality and decreasing the data latency. This definitely ups our ante to do things much faster, in near real time.

Gardner: Obviously, the sensors, probes, agents, and the ability to pull in the information from the network needs to reside or be at close proximity to the network, but how are you actually deployed? Where does the infrastructure for doing the data analysis reside? Is it in the networks themselves, or is there a remote site? Maybe you could just lay out the architecture of how this is set up.

Alam: We get installed on site. Obviously, the future could change, but right now we're an on-premise solution. We're right where the data is being generated, where it’s flowing, and because of that we're able to gain access to the data in real-time.

One of the things we learned is that this is a tremendous amount of data. It doesn't make sense for us to just hold it and assume that we will do something interesting with it afterward.

The way we've approached our customers is to say, "What kind of value do you see in this data? What kind of metrics or key performance indicators (KPIs) do you think are valuable in this data?" We then build a framework that defines the value they can gain from the data: what the metrics are and what kind of structure they want to apply to this data. We're not just calculating metrics; we're also applying a model that gives this data some structure.

As the data goes through what we call the Empirix Intelligent Data Mediation and Correlation (IDMC) system -- really an analytics calculator -- it's put into the Vertica system, so that at that point we have meaningful, actionable data that can be used to trigger alarms, to showcase thresholds, and to give customers great insight into what's going on in their network.
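A rough sketch of that "metrics plus thresholds that trigger alarms" idea: a small framework of KPI definitions is evaluated against freshly computed values, and any breach becomes an alarm. The KPI names and thresholds here are invented for illustration.

```python
# Rough sketch: evaluate computed KPI values against a framework of thresholds
# and turn breaches into alarms. KPI names and limits are illustrative only.

from dataclasses import dataclass

@dataclass
class KPI:
    name: str
    warn_above: float
    critical_above: float

KPI_DEFINITIONS = [
    KPI("attach_failure_rate_pct", warn_above=2.0, critical_above=5.0),
    KPI("avg_latency_ms", warn_above=150.0, critical_above=300.0),
]

def evaluate(kpi_values):
    """Yield (severity, kpi_name, value) for every threshold breach."""
    for kpi in KPI_DEFINITIONS:
        value = kpi_values.get(kpi.name)
        if value is None:
            continue
        if value > kpi.critical_above:
            yield "CRITICAL", kpi.name, value
        elif value > kpi.warn_above:
            yield "WARNING", kpi.name, value

latest = {"attach_failure_rate_pct": 3.4, "avg_latency_ms": 320.0}
for severity, name, value in evaluate(latest):
    print(f"{severity}: {name} = {value}")
```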

Growing the business

From that, they can do various things, such as solve problems proactively, reach out to the customers to deal with those issues, or to make better investments with their technology in order to grow their business.

Gardner: How long have you been using Vertica and how did that come to be the choice that you made? Perhaps you could also tell us a little bit about where you see things going in terms of other capabilities that you might need or a roadmap for you?

Alam: We've been using Vertica for a few years, at least three or four, even before I came on-board. And we're using Vertica primarily for its ability to input and read data very quickly. We knew that, given our solutions, we needed to load a lot of data into the system and then read a lot of data out of it fast and to do it at the same time.

At that time, the database systems we used just couldn't meet the demands for the ever-growing data. So we leveraged Vertica there, and it was used more as an operational data store. When I came on board about a year-and-a-half ago, we wanted to evolve our use of Vertica to be not just for data warehousing, but a hybrid, because we knew that in supporting a lot of different types of data, it was very hard for us to structure all of those types of data.

We wanted to create a framework from which we can define measures and metrics and KPIs and store it in a more flat system from which we can apply various models to make sense of that data.

That really presented us a lot of challenges, not only in scalability, but our ability to work and play with data in various ways. Ultimately, we wanted to allow customers to play with this data at will and to get response in seconds, not hours or minutes.

It required us to look at how we could leverage Vertica as an intelligent data-storage system from which we could process data, store it, and then get answers out of that data very, very quickly. Again, we were looking for responses in a second or so.

Now that we've put all of our data in the data basket, so to speak, with Vertica, we wanted to take it to the next level. We have all this data, both looking at the whole data value chain from discrete data to aggregate data all in one place, with conforming dimensions, where the one truth of that data exists in one system.

We want to take it to the next step. Can we increase our analytical capabilities with the data? Can we find the signal in the noise now that we have all this data? Can we proactively find the patterns in the data and what's contributing to a problem, surface that to our customers, and reduce the noise that they're presented with?

Solving problems

Instead of showing them that 50 things are wrong, can we show them that 50 things are wrong, but that these one or two issues are actually impacting their network or their subscribers the most? Can we proactively tell them what the cause might be and how to solve it?

The faster we can load this data, the faster we can retrieve the value out of this data and find that needle in the haystack. That’s where the future resides for us.
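The prioritization described here, surfacing the one or two issues with the biggest impact rather than all 50 alarms, amounts to ranking open issues by how many subscribers and sessions they affect. A minimal sketch, with invented issue data and weights:

```python
# Minimal sketch: rank open issues by estimated impact so that only the top
# one or two are surfaced. Issue data and scoring weights are invented.

open_issues = [
    {"id": "cell-173 congestion", "subscribers_affected": 41_000, "degraded_sessions": 9_500},
    {"id": "DNS timeout pool B", "subscribers_affected": 2_300, "degraded_sessions": 400},
    {"id": "billing API slow", "subscribers_affected": 800, "degraded_sessions": 120},
    # ... imagine roughly 50 of these
]

def impact_score(issue, w_subs=1.0, w_sessions=0.5):
    return w_subs * issue["subscribers_affected"] + w_sessions * issue["degraded_sessions"]

def top_issues(issues, n=2):
    return sorted(issues, key=impact_score, reverse=True)[:n]

for issue in top_issues(open_issues):
    print(issue["id"], "->", f"{impact_score(issue):,.0f}")
```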

Gardner: Clearly, you're creating value and selling insight to the network to your customers, but I know other organizations have also looked at data as a source of revenue in itself. The analysis could be something that you could market. Is there an opportunity with the insight you have in various networks -- maybe in some aggregate fashion -- to create analysis of behavior, network use, or patterns that would then become a revenue source for you, something that people would subscribe to perhaps?

Alam: That's a possibility. Right now, our business has been all about empowering our customers and giving them the ability to leverage that data for their end use. You can imagine a service provider having great insight into its customers and the over-the-top applications that are being used on its network.

Could they then use our analytics and the metadata that we're generating about their network to empower their business systems and their operations to make smarter decisions? Can they change their marketing strategy or even their APIs about how they service customers on their network to take advantage of the data that we are providing them?

The opportunity to grow other business opportunities from this data is tremendous, and it's going to be exciting to see what our customers end up doing with their data.

Gardner: Are there any metrics of success that are particularly important for you? You've mentioned, of course, scale and volume, but things like concurrency, the ability to run queries from different places by different people at the same time, are important. Help me understand some of the other important elements of a good, strong data-analysis platform for you.

Alam: Concurrency is definitely important. For us it's about predictability, or linear scalability. We know that when we reach those types of scenarios -- supporting, let's say, 10 concurrent users or 100 concurrent users, or supporting a greater segmentation of data because we have gone from 10 terabytes to 30 terabytes -- we don't have to change a line of code. We don't have to change how or what we're doing with our data. Linear scalability, especially on commodity hardware, gives us the ability to take our solution and expand it at will, in order to deal with any type of bottleneck.

Obviously, over time, we'll tune it so that we get better performance out of the hardware or virtual hardware that we use. But we know that when we do hit these bottlenecks, and we will, there is a way around that and it doesn't require us to recompile or rebuild something. We just have to add more nodes, whether it’s virtual or hardware.
Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.


Open FAIR certification launched

This guest post comes courtesy of Jim Hietala, The Open Group Chief of Security.

By Jim Hietala

The Open Group today announced the new Open FAIR Certification Program aimed at risk analysts, bringing a much-needed professional certification to the market that is focused on the practice of risk analysis. Both the Risk Taxonomy and Risk Analysis standards, standards of The Open Group, constitute the body of knowledge for the certification program, and they advance the risk analysis profession by defining a standard taxonomy for risk, and by describing the process aspects of a rigorous risk analysis.

We believe that this new risk analyst certification program will bring significant value to risk analysts, and to organizations seeking to hire qualified risk analysts. Adoption of these two risk standards from The Open Group will help produce more effective and useful risk analysis. This program clearly represents the growing need in our industry for professionals who understand risk analysis fundamentals. Furthermore, the mature processes and due diligence The Open Group applies to our standards and certification programs will help make organizations comfortable with the groundbreaking concepts and methods underlying FAIR. This will also help professionals looking to differentiate themselves by demonstrating the ability to take a “business perspective” on risk.

In order to become certified, risk analysts must pass an Open FAIR certification exam. All certification exams are administered through Prometric, Inc. Exam candidates can start the registration process by visiting Prometric’s Open Group Test Sponsor Site www.prometric.com/opengroup.  With 4,000 testing centers in its IT channel, Prometric brings Open FAIR Certification to security professionals worldwide. For more details on the exam requirements visit http://www.opengroup.org/certifications/exams.

Available November 1

Training courses will be delivered through an Open Group accredited channel. The accreditation of Open FAIR training courses will be available from November 1, 2013.

Our thanks to all of the members of the risk certification working group who worked tirelessly over the past 15 months to bring this certification program, along with a new risk analysis standard and a revised risk taxonomy standard, to the market. Our thanks also to the sponsors of the program, whose support is important to building this program. The Open FAIR program sponsors are Architecting the Enterprise, CXOWARE, SNA, and The Unit.

Lastly, if you are involved in risk analysis, we encourage you to consider becoming Open FAIR certified, and to get involved in the risk analysis program at The Open Group. We have plans to develop an advanced level of Open FAIR certification, and we also see a great deal of best practices guidance that is needed by the industry.

For more information on the Open FAIR certification program visit http://www.opengroup.org/certifications/openfair

You may also wish to attend a webcast scheduled for 7th November, 4pm BST that will provide an overview of the Open FAIR certification program, as well as an overview of the two risk standards. You can register here.

This guest post comes courtesy of Jim Hietala, The Open Group Chief of Security. 



Thursday, October 17, 2013

Democratic National Committee leverages big data to turn politics into political science

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

The next edition of the HP Discover Performance Podcast Series focuses on the big-data problem in the realm of politics. We'll learn how the Democratic National Committee (DNC) leveraged big data analytics to better understand and predict voter behavior and alliances in the 2012 U.S. national elections.

To learn more about how the DNC pulled vast amounts of data together to predict and understand voter preferences and positions on the issues, join Chris Wegrzyn, Director of Data Architecture at the DNC, based in Washington, DC.

The discussion, which took place at the recent HP Vertica Big Data Conference in Boston, is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.] 

 Here are some excerpts:
Gardner: Like a lot of organizations, you had different silos of data and information, and you weren't able to do the analysis properly because of the distributed nature of the data and information. What did you do that allowed you to bring all that data together, and then also get the data assembled to bring out better analysis?

Wegrzyn: In 2008, we received a lot of recognition at that time for being a data-driven campaign and making some great leaps in how we improved efficiency by understanding our organization.

Coming out of that, those of us on the inside were saying this was great, but we have only really skimmed the surface of what we can do. We focused on some sets of data, but they're not connected to what people were doing on our website, what people were doing on social media, or what our donors were doing. There were all of these different things, and we weren’t looking at them.

Really, we couldn’t look at them. We didn't have the staff structure, but we also didn't have the technology platform. It’s hard to integrate data and do it in a way that is going to give people reasonable performance. That wasn't available to us in 2008.

So, fast forward to where we were preparing for 2012. We knew that we wanted to be able to look across the organization, rather than at individual isolated things, because we knew that we could be smarter. It's pretty obvious to anybody. It isn’t a competitive secret that, if somebody donates to the campaign, they're probably a good supporter. But unless you have those things brought together, you're not necessarily pushing that information out to people, so that they can understand.

We were looking for a way that we could bring data together quickly and put it directly into the hands of our analysts, and HP Vertica was exactly that kind of solution for us. The speed and the scalability meant that we didn't have to worry about making sure that everything was properly transformed and didn't have to spend all of this time structuring data for performance. We could bring it together and then let our analysts figure it out using SQL, which is very powerful, but pretty simple to learn.
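As an illustration of that "bring the silos together and let analysts ask questions in SQL" idea, here is a toy example. sqlite3 stands in for the analytics platform (Vertica in the article), and the tables, columns, and data are invented for the example.

```python
# Toy example: once formerly separate silos live in one queryable store,
# a single SQL join answers a cross-silo question. sqlite3 is a stand-in.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE voters (voter_id INTEGER, state TEXT);
CREATE TABLE donors (voter_id INTEGER, total_donated REAL);
CREATE TABLE web    (voter_id INTEGER, signups INTEGER);

INSERT INTO voters VALUES (1, 'OH'), (2, 'OH'), (3, 'FL');
INSERT INTO donors VALUES (1, 250.0), (3, 25.0);
INSERT INTO web    VALUES (1, 2), (2, 1);
""")

# One query across the silos: who looks like a strong supporter?
rows = conn.execute("""
    SELECT v.voter_id,
           v.state,
           COALESCE(d.total_donated, 0) AS donated,
           COALESCE(w.signups, 0)       AS signups
    FROM voters v
    LEFT JOIN donors d ON d.voter_id = v.voter_id
    LEFT JOIN web    w ON w.voter_id = v.voter_id
    ORDER BY donated DESC, signups DESC
""").fetchall()

for row in rows:
    print(row)
```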

Better analytic platform

Gardner: Until the fairly recent past, it wasn't practical, both from a cost and technology perspective, to try to get at all the data. But it has gotten to that point now. So when you are looking at all of the different data that you can bring to bear on a national election, in a big country of hundreds of millions of people, what were some of the issues you faced?

Wegrzyn: We hadn’t done it before. We had to figure it out as we were going along. The most important realization that we made was that it wasn't going to be a huge technology effort that was going to make this happen. It was going to be about analysts. That’s a really generic term. Maybe it's data scientists or something, but it's about people who were going to understand the political challenges, understand something about the data, and go in and find answers.

We structured our organization around being analyst-centric. We needed to build those tools and platforms, so that they could start working immediately and not wait on us on the technology side to build the best system. It wasn’t about building the best system, but it was about getting something where we could prototype rapidly.

Nothing that we did was worth doing if we couldn't get something into somebody's hands in a week and then start refining it. But we had to be able to move very, very quickly, because we were just under a constant time-crunch.

Gardner: I would imagine that in the final two months and weeks of an election, things are happening very rapidly. To have a better sense of what the true situation on the ground is gives you an opportunity to best react to it.

It seems that in the past, it was a gut instinct. People were very talented and were paid very good money to be able to try to distill this insight from a perspective of knowledge and experience. What changed when you were able to bring the HP Vertica platform, big data, and real-time analysis to the function of an election?

Wegrzyn: Just about everything. There isn't a part of the campaign that was untouched by us, and in a lot of those places where gut ruled, we were able to bring in some numbers. This came down from the top campaign manager, Jim Messina. Out of the gate, he was saying that we have to put analytics in every part of the organization and we want to measure everything. That gave us the mission and the freedom to go in and start thinking how we could change how this operates.

But the campaign was driven. We tested emails relentlessly. A lot of our program was driven by trying to figure out what works and then quantify that and go out and do more. One of our big successes is the most traditional of the areas of campaigns nowadays, media buying.

More valuable

There have been a bunch of articles that have come up recently talking about what the campaign did, so I'm not giving anything away. We were able to take what we understood about the electorate and who we wanted to communicate with, rather than taking the traditional TV-buying approach, which was: buy this broad demographic band, buy a lot of TV news, and buy a lot of the stuff that's expensive and has high ratings among the big demographics. That's a lot of wasted money.

We were able to know more precisely who the people are that we want to target, which was the biggest insight. Then, we were able to take that and figure out -- not the super creepy "we know exactly what you are watching" level -- but at an aggregate level, what the people we want to target are watching. So we could buy that, rather than buying the traditional stuff. That's like an arbitrage opportunity. It’s cheaper for us, but it's way more valuable.

So we were able to buy the right stuff, because we had this insight into what our electorate was like, and I think it made a big difference in how we bought TV.
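The arbitrage described here comes down to ranking inventory by cost per targeted viewer rather than by raw ratings. A simplified sketch, with invented program data and audience estimates:

```python
# Simplified sketch: rank TV inventory by cost per *targeted* viewer, using an
# aggregate estimate of how much of each program's audience is in the target
# group. Program data and numbers are invented for illustration.

programs = [
    # (program, cost_per_spot_usd, total_viewers, share_of_viewers_in_target)
    ("Prime-time news",   90_000, 4_000_000, 0.08),
    ("Late-night rerun",   6_000,   300_000, 0.35),
    ("Daytime talk show", 15_000,   900_000, 0.22),
]

def cost_per_targeted_viewer(cost, viewers, target_share):
    return cost / (viewers * target_share)

ranked = sorted(programs, key=lambda p: cost_per_targeted_viewer(p[1], p[2], p[3]))

for name, cost, viewers, share in ranked:
    cptv = cost_per_targeted_viewer(cost, viewers, share)
    print(f"{name}: ${cptv:.3f} per targeted viewer")
```

Cheaper, lower-rated inventory with a high concentration of the target audience wins that ranking, which is the arbitrage.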

Gardner: The results of your big data activities are apparent. As I recall, Governor Romney's campaign, at one point, had a larger budget for media, and spent a lot of that. You had a more effective budget with media, and it showed.

Another indication was that on election night, right up until the exit polls were announced, the Republican side didn't seem to know very clearly or accurately what the outcome was going to be. You seemed to have a better sense. So the stakes here are extremely high. What’s going to be the next chapter for the coming elections, in two, and then four years along the cycle?

Wegrzyn: That’s a really interesting question, and obviously it's one that I have had to spend a lot of time thinking about. The way that I think about the campaign in 2012 was one giant fancy office tower. We call it the Obama Campaign. When you have problems or decisions that have to be made, that goes up to the top and then back down. It’s all a very controlled process.

We are tipping that tower on its side now for 2014. Instead of having one big organization, we have to try to do this to 50, 100, maybe hundreds of smaller organizations that are going to have conflicting priorities. But the one thing that they have in common now is they saw what we did on the last campaign and they know that that's the future.

So what we have to do is take that and figure out how we can take this thing that worked very well for this one big organization, one centralized organization, and spread it out to all of these other organizations so that we can empower them.

They're going to have smaller staffs. They're going to have different programs. How do we empower them to use the tools that we used and the innovations that we created to improve their activity? It’s going to be a challenge.

Gardner: It’s interesting, there are parallels between what you're facing as a political organization, with federation, local districts for Congress, races in the state level, and then of course to the national offices as well. This is a parallel to businesses. Many businesses have a large centralized organization and they also have distributed and federated business units, perhaps in other countries for global companies.

Feedback loop

Is there a feedback loop here, whereby one level of success, like you well demonstrated in 2012, leads to more of the federated, on-the-ground, distributed gathering and utilization of data that also then feeds back to the larger organization, so that there's a virtual adoption pattern that will benefit across the ecosystem? Is that something you are expecting?

Wegrzyn: Absolutely. Even within the campaign, once people knew that this tool was available, that they could go into HP Vertica and just answer any question about the campaign's operation, it transformed the way that people were thinking about it. It increased people's interest in applying that to new areas. They were constantly coming at us with questions like, "Hey, can we do this?" We didn't know. We didn’t have enough staff to do that yet.

One of our big advantages is that we've already had a lot of adoption throughout campaigns of some of the data gathering. They understand that we have to gather this data. We don't know what we are going to do with it, but we have them understanding that we have to gather it. It's really great, because now we can start doing smart things with it.

And then they're going to have that immediate reaction like, "Wow, I can go in there now and I can figure out something smart about all of the stuff that I put in and all of the stuff that I have been collecting. Now I want more." So I think we're expecting that it will grow. Sometimes I lose sleep about how that’s going to just grow and grow and grow.

Gardner: We think about that virtuous adoption cycle, more and more types of data, all the data, if possible, being brought to bear. We saw at the Big Data Conference some examples and use cases for the HAVEn approach from HP, which includes Vertica, Hadoop, Autonomy IDOL, Security, and ArcSight types of products and services. Does that strike a chord with you, that you need to get at the data, but now that definition of the data is exploding and you need to somehow come to grips with that?

Wegrzyn: That's something that we only started to dabble in, things like text analysis, what Autonomy can do with that unstructured data, stuff that we only started to touch on during the campaign, because it's hard. We make some use of Hadoop in various parts of our setup.

We're looking to a future, where we bring in more of that unstructured intelligence, that information from social media, from how people are interacting with our staff, with the campaign in trying to do something intelligent with that. Our future is bringing all of those systems, all of those ideas together, and exposing them to that fleet of analysts and everybody who wants it.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.
