Thursday, June 9, 2016

Alation centralizes data knowledge by employing machine learning and crowdsourcing

The next BriefingsDirect Voice of the Customer big-data case study discussion focuses on the Tower of Babel problem for disparate data, and explores how Alation manages multiple data types by employing machine learning and crowdsourcing.

We'll explore how Alation makes data more actionable via such innovative means as combining human experts and technology systems.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn more about how enterprises and small companies alike can access more data for better analytics, please join Stephanie McReynolds, Vice-President of Marketing at Alation in Redwood City, California. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: I've heard of crowdsourcing for many things, and machine learning is more and more prominent with big-data activities, but I haven't necessarily seen them together. How did that come about? How do you, and why do you need to, employ both machine learning and experts in crowdsourcing?

McReynolds: Traditionally, we've looked at data as a technology problem. At least over the last 5-10 years, we’ve been pretty focused on new systems like Hadoop for storing and processing larger volumes of data at a lower cost than databases could traditionally support. But what we’ve overlooked in the focus on technology is the real challenge of how to help organizations use the data that they have to make decisions. If you look at what happens when organizations go to apply data, there's often a gap between the data we have available and what decision-makers are actually using to make their decisions.

There was a study that came out within the last couple of years that showed that about 56 percent of managers have data available to them, but they're not using it. So, there's a human gap there. Data is available, but managers aren't successfully applying data to business decisions, and that’s where real return on investment (ROI) always comes from. Storing the data, that’s just an insurance policy for future use.

The concept of crowdsourcing data, or tapping into experts around the data, gives us an opportunity to bring humans into the equation of establishing trust in data. Machine-learning techniques can be used to find patterns and clean the data. But to really trust data as a foundation for decision making, human experts are needed to add business context and show how data can be used and applied to solving real business problems.

Gardner: Usually, when you're employing people like that, it can be expensive and doesn't scale very well. How do you manage a fit-for-purpose approach to crowdsourcing, where you're doing a service for them in terms of getting the information that they need, and where you want to evaluate that sort of thing? How do you balance that?

Using human experts

McReynolds: The term "crowdsourcing" can be interpreted in many ways. The approach that we’ve taken at Alation is that machine learning actually provides a foundation for tapping into human experts.

We go out and look at all of the log data in an organization -- in particular, what queries are being used to access data in databases or Hadoop file structures. That creates a foundation of knowledge so that the machine can learn to identify what data would be useful to catalog or to enrich with human experts in the organization. That's essentially a way to prioritize how to tap into the number of humans that you have available to help create context around that data.
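To make that log-mining idea concrete, here is a rough Python sketch of the prioritization step: tally how many queries touch each table and how many distinct people run them, then rank tables so human curation effort goes where the most users will benefit. The log format and the table-extraction pattern are illustrative assumptions, not Alation's actual parser.

```python
import re
from collections import defaultdict

# Hypothetical pattern for pulling table names out of SQL text.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)

def rank_tables_for_curation(query_log):
    """query_log: iterable of (user, sql_text) pairs pulled from database logs."""
    query_count = defaultdict(int)
    distinct_users = defaultdict(set)
    for user, sql_text in query_log:
        for table in TABLE_REF.findall(sql_text):
            query_count[table] += 1
            distinct_users[table].add(user)
    # Tables touched by many people, many times, are the first candidates
    # to enrich with human context.
    return sorted(
        query_count,
        key=lambda t: (len(distinct_users[t]), query_count[t]),
        reverse=True,
    )

# Example:
# rank_tables_for_curation([
#     ("amy", "SELECT * FROM sales.orders o JOIN sales.customers c ON o.id = c.id"),
#     ("bo",  "SELECT count(*) FROM sales.orders"),
# ])
# -> ["sales.orders", "sales.customers"]
```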

That’s a great way to partner with machines, to use humans for what they're good for, which is establishing a lot of context and business perspective, and use machines for what they're good for, which is cataloging the raw bits and bytes and showing folks where to add value.
Gardner: What are some of the business trends that are driving your customers to seek you out to accomplish this? What's happening in their environments that requires this unique approach of the best of machine and crowdsourcing and experts?

McReynolds: There are two broader industry trends that have converged and created a space for a company like Alation. The first is just the immense volume and variety of data that we have in our organizations. If we weren't adding additional data storage systems into our enterprises, there wouldn't be good groundwork laid for Alation. But perhaps more interesting is a second trend, around self-service business intelligence (BI).

So as we're increasing the number of systems that we're using to store and access data, we're also putting more weight on typical business users to find value in that data and trying to make that as self-service a process as possible. That’s created this perfect storm for a system like Alation which helps catalog all the data in the organization and make it more accessible for humans to interpret in accurate ways.

Gardner: And we often hear in the big data space the need to scale up to massive amounts, but it appears that Alation is able to scale down. You can apply these benefits to quite small companies. How does that work when you're able to help a very small organization with some typical use cases in that size organization?

McReynolds: Even smaller organizations, or younger organizations, are beginning to drive their business based on data. Take an organization like Square, which is a great brand name in the financial services industry, but it’s not a huge organization in and of itself, or Inflection or Invoice2go, which are also Alation customers.

We have many customers that have data analyst teams that maybe start with five people or 20 people. We also have customers like eBay that have closer to a thousand analysts on staff. What Alation provides to both of those very different sizes of organizations is a centralized place, where all of the information around their data is stored and made accessible.

Even if you're only collaborating with three to five analysts, you need that ability to share your queries, to communicate on which queries addressed which business problems, which tables from your HPE Vertica database were appropriate for that, and maybe what Hive tables on your Hadoop implementation you could easily join to those Vertica tables. That type of conversation is just as relevant in a 5-person analytics team as it is in a 1000-person analytics team.

Gardner: Stephanie, if I understand it correctly, you have a fairly horizontal capability that could apply to almost any company and almost any industry. Is that fair, or is there more specialization or customization that you apply to make it more valuable, given the type of company or type of industry?

Generalized technology

McReynolds: The technology itself is a generalized technology. Our founders come from backgrounds at Google and Apple, companies that have developed very generalized computing platforms to address big problems. So the way the technology is structured is general.

The organizations that are going to get the most value out of an Alation implementation are those that are data-driven organizations that have made a strategic investment to use analytics to make business decisions and incorporate that in the strategic vision for the company.

So even if we're working with very small organizations, they are organizations that make data and the analysis of data a priority. Today, it’s not every organization out there. Not every mom-and-pop shop is going to have an Alation instance in their IT organization.

Gardner: Fair enough. Given that those data-driven organizations have a real benefit to gain by doing this well, they also, as I understand it, want to get as much data involved as possible, regardless of its repository, its type, the silo, the platform, and so forth. What is it that you've had to do to be able to satisfy that need for disparity and variety across these data types? What was the challenge for being able to get to all the types of data that you can then apply your value to?
McReynolds: At Alation, we see the variety of data as a huge asset, rather than a challenge. If you're going to segment the customers in your organization, every event and every interaction with those customers becomes relevant to understanding who that individual is and how you might be able to personalize offerings, marketing campaigns, or product development to those individuals.

That does put some burden on our organization, as a technology organization, to be able to connect to lots of different types of databases, file structures, and places where data sits in an organization.

So we focus on being able to crawl those source systems, whether they're places where data is stored or whether they're BI applications that use that data to execute queries. A third important data source for us that may be a bit hidden in some organizations is all the human information that’s created, the metadata that’s often stored in Wiki pages, business glossaries, or other documents that describe the data that’s being stored in various locations.

We actually crawl all of those sources and provide an easy way for individuals to use that information on data within their daily interactions. Typically, our customers are analysts who are writing SQL queries. All of that context about how to use the data is surfaced to them automatically by Alation within their query-writing interface so that they can save anywhere from 20 percent to 50 percent of the time it takes them to write a new query during their day-to-day jobs.

Gardner: How is your solution architected? Do you take advantage of cloud when appropriate? Are you mostly on-premises, using your own data centers, some combination, and where might that head to in the future?

Agnostic system

McReynolds: We're a young company. We were founded about three years ago and we designed the system to be agnostic as to where you want to run Alation. We have customers who are running Alation in concert with Redshift in the public cloud. We have customers that are financial services organizations that have a lot of personally identifiable information (PII) data and privacy and security concerns, and they are typically running an on-premise Alation instance.

We architected the system to be able to operate in different environments and have an ability to catalog data that is both in the cloud and on-premise at the same time.

The way that we do that from an architectural perspective is that we don’t replicate or store data within Alation systems. We use metadata to point to the location of that data. For any analyst who's going to run a query from our recommendations, that query is getting pushed down to the source systems to run on-premise or on the cloud, wherever that data is stored.
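A minimal sketch of that pointer-based approach, under the assumption of a simple catalog keyed by dataset name: the catalog holds only connection metadata, and a recommended query is dispatched to whichever source system actually owns the data. The entries and the execute_on hook are hypothetical placeholders, not Alation's API.

```python
# Catalog entries hold pointers and connection metadata only -- no copies of the data.
CATALOG = {
    "sales.orders":    {"system": "vertica", "dsn": "vertica://prod-dw:5433/analytics"},
    "web.clickstream": {"system": "hive",    "dsn": "hive://hadoop-edge:10000/default"},
}

def run_recommended_query(dataset, sql, execute_on):
    """Push the query down to whichever system actually holds the data.

    execute_on(system, dsn, sql) is a caller-supplied function that knows how
    to talk to each engine; it stands in for a real driver here.
    """
    entry = CATALOG[dataset]
    return execute_on(entry["system"], entry["dsn"], sql)
```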

Gardner: And how did HPE Vertica come to play in that architecture? Did it play a role in the ability to be agnostic as you describe it?

McReynolds: We use HPE Vertica in one portion of our product that allows us to provide essentially BI on the BI that’s happening. Vertica is used as a fundamental component of our reporting capability called Alation Forensics that is used by IT teams to find out how queries are actually being run on data source systems, which backend database tables are being hit most often, and what that says about the organization and those physical systems.

It gives the IT department insight. Day-to-day, Alation is typically more of a business person’s tool for interacting with data.

Gardner: We've heard from HPE that they expect a lot more of that IT department-specific ops-efficiency role and use case to grow. Do you have any sense of what some of the benefits have been for the IT organization in getting that sort of analysis? What's the ROI?

McReynolds: The benefits of an approach like Alation include getting insight into the behaviors of individuals in the organization. What we’ve seen at some of our larger customers is that they may have dedicated themselves to a data-governance program where they want to document every database and every table in their system, hundreds of millions of data elements.
Using the Alation system, they were able to identify within days the rank-order priority list of what they actually need to document, versus what they thought they had to document. The cost savings comes from taking a very data-driven realistic look at which projects are going to produce value to a majority of the business audience, and which projects maybe we could hold off on or spend our resources more wisely.

One team that we were working with found that about 80 percent of their tables hadn't been used by more than one person in the last two years. In that case, if only one or two people are using those systems, you don't really need to document those systems. That individual or those two individuals probably know what's there. Spend your time documenting the 10 percent of the system that everybody's using and that everyone is going to receive value from.

Where to go next

Gardner: Before we close out, any sense of where Alation could go next? Is there another use case or application for this combination of crowdsourcing and machine learning, tapping into all the disparate data that you can and information including the human and tribal knowledge? Where might you go next in terms of where this is applicable and useful?

McReynolds: If you look at what Alation is doing, it's very similar to what Google did for the Internet in terms of cataloging all of the webpages that were available to individuals and serving them up in meaningful ways. That's a huge vision for Alation, and we're just in the early part of that journey, to be honest. We'll continue to move in that direction of being able to catalog data for an enterprise and make all of the information stored in that organization easily searchable, findable, and usable.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Friday, June 3, 2016

Catbird CTO on why new security models are essential for highly virtualized data centers

The next BriefingsDirect Voice of the Customer discussion explores how increased virtualization across data centers translates into the need for new hybrid-computing approaches to security, compliance, and governance.

Just as next-generation data centers and private clouds are gaining traction, security threats are on the rise -- and attack techniques are becoming more sophisticated.

Are yesterday’s perimeter-based security infrastructure methods up to the task? Or are new approaches needed to gain policy-based control over all virtual assets at all times?

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To explore the future of security for virtual workloads, we're joined by Holland Barry, CTO at Catbird in Scotts Valley, California. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us why it’s a different picture nowadays when we look at data centers and private clouds. Oftentimes, people think similarly about security -- just wrap a firewall around it and you're okay. Why isn’t that the case? What’s new?

Barry: As we've introduced many layers of abstraction into the data center, trying to adapt those physical appliances that don’t move around as fluidly as the workloads they're protecting has become an issue. And as people virtualize more and we go more to this notion of a software-defined data center (SDDC), it has just proven a challenge to keep up, and we know that that layer on the perimeter is probably not sufficient anymore.

Gardner: It also strikes me that it’s a moving target, virtual workloads come and go. You want elasticity. You want to be able to have fit-for-purpose infrastructure, but that's also a challenge when you can’t keep track of things and therefore secure them. 

Barry: That’s absolutely right. The transient nature of workloads themselves makes any type of rigid enforcement from a single device pretty tough to deal with. So you need something that was built to be fluid alongside those dynamic workloads.

Gardner: And I suppose, too, that enterprise architects that are putting more virtualization together across the data center, the SDDC, aren’t always culturally aligned with the security folks. So you have more than just a technology issue here. Tell us what Catbird does that goes beyond just the technology, and perhaps works toward a cultural and organizational benefit?

Greater skill set

Barry: Even just from an interface standpoint or trying to create a tool that can cater to those different administrative silos, you have people who have virtualization expertise, compute expertise, and then different security practice expertise. There are many slim lanes within that security category, and the next generation set of workloads in the hybrid IT environment is going to demand more of a skill set that can span all those domains. 

Gardner: We talk a lot about DevOps and SecOps combining. There's also this need for automation and orchestration. So policy-based seems to be really the only option to keep up with the speed on security.
Barry: That’s exactly right. There has to be an application-centric approach to how you're applying security to your workloads. Ideally that would be something that could be templatized or defined up front. So as new workloads present themselves in the network, there's already a predetermined way that they're going to be secured and that security will take place right up against the edge of that workload.
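As a rough illustration of that template-driven idea, the sketch below maps workload tags to predefined security policies and attaches the matching controls when a new workload appears. The tag names, policy fields, and attach_controls hook are assumptions for illustration, not Catbird's actual policy model.

```python
# Predefined, application-centric policy templates, keyed by a workload tag.
POLICY_TEMPLATES = {
    "web-tier": {"allow_ports": [80, 443],  "ids": True,  "quarantine_on_violation": True},
    "db-tier":  {"allow_ports": [5433],     "ids": True,  "quarantine_on_violation": True},
    "dev":      {"allow_ports": [22, 8080], "ids": False, "quarantine_on_violation": False},
}

def secure_new_workload(workload_tags, attach_controls):
    """Called when a new VM shows up on the virtual network.

    attach_controls(policy) stands in for the enforcement hook that applies
    the controls right at the workload's edge (its vNIC), rather than at a
    distant perimeter device.
    """
    for tag in workload_tags:
        if tag in POLICY_TEMPLATES:
            return attach_controls(POLICY_TEMPLATES[tag])
    raise ValueError("No predefined policy for workload tags: %r" % (workload_tags,))
```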

Gardner: Holland, tell us about Catbird, what you do, how you're deployed, and how you go about solving some of these challenges.

Barry: Catbird was born and raised in virtualized environments. We've been around for a number of years. It was this notion of bringing the perimeter and the control landscape closer to the workload, and that’s via hypervisor integration and also via the virtual data-path integration. So it's having a couple of different vantage points from within the fabric and applying security with a purpose-built solution that can span multiple platforms.

So that hybrid IT environment, which is becoming a reality, may have a little bit of OpenStack, it may have a little bit of VMware. Having that single point of policy definition and enforcement is going to be critical to people adopting and really taking the next leap to put a layer of defense in their data center.

Gardner: How are you deployed? Are you a software appliance yourself, virtualized software?

Barry: Exactly right. Our solutions are comprised of two components, and it’s a very basic hub-and-spoke architecture. We have a policy enforcement point, a virtual machine (VM) appliance that installs out on each hypervisor, and we have a management node that we call the Control Center. That’s another VM, and those two components talk together in a secure manner. 

Gardner: What’s a typical scenario? Where in this type of east-west traffic virtualization environment does security work better, and how does it protect? Are there some examples that demonstrate where the perimeter approach would break down but your model got the task done?

Doing enforcement

Barry: I think that anytime that you need to have the granularity of not only visibility, but enforcement -- I'm going to get a little technical here -- down to the UUID of the vNIC, that smallest unit of measure as it relates to a workload, that’s really where we shine, because that’s where we do our enforcement. 

Gardner: Okay. How about partnerships? Obviously you're working in an environment where there are a lot of different technologies, lots of moving parts. What’s going on with you and HPE in terms of deployment, working with private cloud, operating systems, and then perhaps even moving toward modeling and some of the HPE ArcSight technology?

Barry: We have a number of different integration points inside HPE’s portfolio. We're a Helion-ready certified partner. We just announced our support for the 2.0 Helion OpenStack release.
We're doing a lot of work with the ArcSight team in terms of getting very detailed event feeds and visibility into the virtualized workloads.

And we just announced some work that we are doing with HPE’s HPN team around their software-defined networking (SDN) VAN Controller as well, extending Catbird’s east-west visibility into the physical domain, leveraging the placement of the SDN controller and its command over the switches. So it’s pretty exciting work there.

Gardner: Let’s dig into that a bit, the SDN advances that are going on and how that’s changing how people think about deployment and management of infrastructure and data centers. Doesn’t this really give you some significant boost in the way that you can engage with security, intercept and stop issues before they propagate? What is it about SDN that is good for security?

Barry: As the edges of what have traditionally been rigid network boundaries become fluid as well, knowing the state of the network, knowing the state of the workload, is going to be critical to applying those traditional security controls. So we're really trying to tie all this together -- not only with our integration with Helion, but also utilizing the knowledge that the SDN Controller has of the data path. We can surface indications of compromise and maybe get you to a problem a little bit quicker than traditional methods.

Gardner: I always like to try to show and not just tell. Do you have any examples of organizations that are doing this, what it has done for them, and why it’s a path to even greater future benefits as they further virtualize and go to even larger hybrid environments?

Barry: Absolutely. I can’t name them by name, but one of the largest US telcos is one of our customers. They came to us to solve a problem of that consistency of policy definition and enforcement across those hybrid platforms. So it’s amongst VMware and OpenStack workloads.

That's not only for the application of the security controls and not only for the visibility of the traffic, but also for the evidence and assurance of compliance -- being able to map back to regulatory frameworks and things like that.

Agentless fashion

There are a couple of different use cases in there, but it’s really that notion where I can do it in an agentless fashion, and I think that’s an important thing to differentiate and point out about our solution. You don’t have to install an agent within the workload. We don’t require a presence inside the OS.

We're doing it just outside of the workload, at the hypervisor level. It’s key that we have the specific tailored integrations to the different hypervisor platforms, so we can abstract away the complexity of applying the security controls where you just have a single pane of glass. You define the security policy and it doesn’t matter which platform you're on, it’s going to be able to do it in that agentless fashion.

Gardner: Of course, the march of technology continues, and we're not just dealing with virtualization. We're now talking about containers, micro-services, composable infrastructure. How will your solution, in conjunction with HPE, adapt to that, and is there more of a role as you get closer to the edge, even out into the Internet of Things (IoT), where we're talking about all sorts of more discrete devices really extending the network in all directions?

Barry: As the workload types proliferate and we get fancier about how we virtualize, whether it’s using a container or a virtualization platform, and then the vast amount of IoT devices that are going to present themselves, we're working closely with the HPE team in lockstep as mass adoption of these technologies happens.
We have plans in place to solve it platform by platform, and we believe in taking an approach where we look at each specific problem and ask how we're going to attack it, while keeping that bigger vision of, "We're still going to keep you in that same console, and the method in which you apply the security is going to be the same."

Containers are a great example, something that we know we need to tackle, something that’s getting adopted in a fashion far more than I have ever seen with anything else. That’s a pretty exciting one. But at the end of the day, it’s a way of virtualizing a service or micro-services. We're aware of it, and I think our method of doing the security control application is going to be the one that wins.

Gardner: Pretty hard to secure a perimeter when there really isn’t a perimeter.

Barry: Perimeter is quickly fading, it seems.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Thursday, June 2, 2016

Why business apps design must better cater to consumer habits to improve user experience

The next BriefingsDirect technology innovation thought leadership discussion focuses on new user experience demands for applications, and the impact that self-service and consumer habits are having on the new user experience design.

As more emphasis is placed on user experiences and the application of consumer-like processes in business-to-business (B2B) commerce, a softer side of software seems to be emerging. We'll now explore a new approach to design that emphasizes simple and intuitive process flows.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn more about how business apps design must better cater to consumer habits to improve user experience, we're joined by Michele Sarko, Chief Design Officer at SAP Ariba. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: There seems to be a hand-off between the skills that are new to apps' user interface design versus older skills that had a harder edge from technology-centric requirements. Are we seeing a shift in the way that software is designed, from a user-experience perspective, and how different is it from the past?

Sarko: It’s more about understanding the end users first. It’s more about empathy and universal design. What used to happen was that technology was so new that we as designers were challenging it to do things it didn’t do before. Now, technology is the table stakes from which everything is measured, and designers -- and our users for that matter -- expect it to just work.

The differentiator now is to bring the human element into enterprise products, and that’s why there's a shift happening in software. The softer side of this is happening because we're building these products more for the people who actually use them, and not just for the people who buy them.

Gardner: We've heard from some discussions at the SAP Ariba LIVE Conference recently about the need for greater and more rapid adoption and getting people more deeply into business networks and applications. It seems to me that this user experience and that adoption relationship are quite closely aligned.

Sarko: Yes, they absolutely are, because at the end of the day, it’s about people. When we're selling consumer software or enterprise software or any type of business software, if people don't use it or don’t want to use it, you're not going to have adoption. You don’t want it to become “shelfware,” so to speak. You want to make a good business investment, but you also want your end users to be able to do it effectively. That’s where adoption comes into play and why it's key to our customers as well as our own business.

Intuitive approach

Gardner: Another thing we heard was that people don't read the how-to manuals and they don't watch the videos. They simply want to dive in and be able to work and proceed with apps. There needs to be an intuitive approach to it.

I'm old enough to remember that when new software arrived in the office, we would all get a week of training and we'd sit there for hours of training. But no more training these days. So how do people learn to use new software?

Sarko: First and foremost, we need to build it intuitively, so that you naturally apply the patterns that you have to that software, but we should come about it in a different way, where training is in context, in product.

We're doing new things with overlays to take users through a tour, or step them through a new feature, to give them just the quick highlights of where things are. You see this sort of thing in mobile apps all the time after you install an update. In addition to that, we build in-context questions or answers right there at the point of need, where the user is likely to encounter something new or initially unknown in the product.

So it’s just-in-time and in little snippets. But underpinning all of it, the experience has to be very, very simple, so that you don't have to go through this overarching hurdle to understand it.

Gardner: I suppose, too, that there's an enterprise architectural change afoot. Before, when we had packaged software, the cycles for changing that would be sometimes years, if not more. Nowadays, when we go to cloud and software-as-a-service (SaaS) applications, where there’s multitenancy, and where the developer, the supplier of the software, can change things very rapidly, a whole new opportunity opens up. How does this new cloud architecture model benefit the user experience, as compared to the architecture of packaged software?

Sarko: The software and the capabilities that we're using now are definitely a step forward. With SAP Ariba, we’ve been able to decouple the application and the presentation layer in such a way that we can change the user experience more rapidly, do A/B testing, do a lot of in-product metrics and tracking, and still keep all of the deep underpinnings and the safety and security right there.

So we don't have to spend all of our time building it deep into the underpinnings. We can keep those two things separate, making us able to iterate a lot faster. That's enabling us to go quicker and to understand users’ needs.

Gardner: The drive to include mobile devices with any software and services now plays a larger role. We saw some really interesting demos at the SAP Ariba LIVE conference around the ability to discover and onboard a vendor using a mobile device, in this case a smartphone. How is the drive for mobile-first impacting this?

Sarko: Well, the mobile-first mindset is something that we always employ now. This is the way that we should, and do, design a lot of things, because it puts on a different set of restraints, form factors, and simplicity. On mobile, you only have so much real estate with which to work. Approaching it from that mindset allows us to take the learning that we do on mobile and bring them back to all the other device options that we have.

Design philosophy

Gardner: Tell me a little bit about your philosophy about design. When you look at software that maybe has years of a legacy, the logic has been there for quite some time, but you want to get this early adoption, rapid adoption. You want a mobile-first mentality. How do you approach this from a design philosophy point of view?

Sarko: It has to be somewhat pragmatic, because you can't move the behemoth of the company that you are to something different. The way that I approach it, and that we’re looking at within SAP Ariba, is to consider new ways to improve and new innovations and start there, with the mobile-first mindset, or really by just redesigning aspects of the product.

At the same time, pick the most important aspects or areas of your current product suite and reinvent those. It may take a little more time or it may be on a different technology stack. It may be inconsistent for a while, but the improvements are going to be there and will outweigh that inconsistency. And then as we go, over time, we'll make that process change overall. But you can’t do it all at once. You have to be very pragmatic and judicious about where you start.

Gardner: Of course, as we mentioned earlier, you can adjust as you go. You have more opportunity to fix things or adjust the apps and design.

You also said something interesting at SAP Ariba LIVE, that designers should, “Know your users better than they know themselves.” First, what did you mean by that in more detail; and then secondly, who are the users of SAP Ariba applications and services, and how are they different from users of the past?

Sarko: What I meant by “know the users better than they know themselves” is that we're observing them, we're listening to them, we're drawing patterns across them. The user may know who they are, but they often feel like they may be alone. What we end up seeing is that as a user, you’re never alone. We see countless other users facing the same challenges as you, with the same needs and expectations.

You may just be processing invoices all day, or you may be the IT professional that now has to order all of the equipment for your organization. We start to see you as a person and the issues that you face, but then we start to figure out how we help not only you in your specific need, but we learn from others about new features and requirements that you didn't even think you might need.

So, we're looking in aggregate to find out solutions that would fit many and give it to all rather than just solve it one by one. That's what I mean by, "know your users better than they know themselves."

And then who are the users? There are different personas. Historically, SAP Ariba focused mostly only on the customer, the folks who made the purchasing decisions, who owned the business decisions. I'm trying to help the company understand that there is a shift, that we also have to pay equal attention to the end users, the people who are in the product using it everyday. As a company, SAP Ariba has to focus on the various roles and satisfy both needs in order for it to be successful.

Gardner: It must be difficult to create software for multiple roles. You mentioned the importance of being role-based in this design process. Is it that difficult to create software that has a common underpinning in terms of logic, but then effectively caters to these different roles?

Design patterns

Sarko: The way that we approach it is through building blocks and systems. We have design patterns, which are building blocks, and these little elements then get manifested together to build the experience.

Where the roles come in is what gets shown or not. Different modules may be exposed with those building blocks to one group of people, but not to the other. Based on roles and permissions, we can hide and show what’s needed. That’s how we approach the role-based design and make it right for you.
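A small sketch of that pattern, with made-up module and role names: each building block declares the permission it needs, and what a given user sees is simply the subset their role grants.

```python
# Each module (building block) declares the permission it needs.
MODULES = {
    "guided_buying":   "shop",
    "invoice_queue":   "process_invoices",
    "spend_analytics": "view_reports",
    "catalog_admin":   "manage_catalog",
}

# Roles map to sets of granted permissions.
ROLE_PERMISSIONS = {
    "casual_shopper":      {"shop"},
    "ap_clerk":            {"shop", "process_invoices"},
    "procurement_manager": {"shop", "view_reports", "manage_catalog"},
}

def visible_modules(role):
    """Return only the building blocks this role is allowed to see."""
    granted = ROLE_PERMISSIONS.get(role, set())
    return [name for name, perm in MODULES.items() if perm in granted]

# visible_modules("ap_clerk") -> ["guided_buying", "invoice_queue"]
```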

Gardner: And I suppose too one of the goals for SAP Ariba is to not just have the purchasing people do the purchasing, but have more people, more self-service. Tell me a bit more about self-service and this idea that people are shopping and not necessarily procuring.

Sarko: Yes, because this is really the shift that we're trying to communicate and design for. We come to work every day with our biases from our personal lives, and it really shouldn't be all that different when talking about procurement. I mentioned earlier that this is not really about procurement for end users; it’s about shopping, because that's what you're doing when you buy things, whether you’re buying them for work or for your personal life.

The terminology has to be consistent with what we know from our daily lives and not technical jargon. Bringing those things to bear and making that experience much more consumer-like will enable our customers to be more successful.

Gardner: We've already seen some fruits of these labors and ideas. We saw an example of Guided Buying, a really fresh, clean interface, very similar to a business-to-consumer (B2C) shopping experience. Tell me a little bit about some of the examples we have seen and how far we are along the spectrum to getting to where you want to go.

Sarko: We're very far down the path of building this out. We've been spending the past six months developing and iterating on ideas, and we'll be able to market the first release relatively soon.

And through the process of exploration and working with customers, there have been all kinds of nuances about policy compliance and understanding what’s allowed and what’s not allowed. And not just for the end user, but for the procurement professional, for the buyer in their specific areas, as well as for the procurement folks behind the scenes. All of these roles now are thought of as individual players in an orchestra, because they all have to work together. We're actually quite far along, and I'm really excited to see the product come to market pretty soon.

Gardner: Any other ideas about where we go when we start bringing more reactions to what users are doing in the software? We saw instances where people were procuring things, but then the policy issue would pop-up, the declaration of, "That's not within our rules; you can’t do that."

It seems to me that if we take that a step further, we're going to start bringing in more analysis and say, "Well, you're going down this path, but we have information that could help you analyze and better make a decision." Is that something we should expect soon as well?

Better recommendations

Sarko: Yes, absolutely. We're trying to use the intelligence that we have to make better recommendations for the end users. Then, when the policy compliance comes in, we're not preventing the end user from completing their task. We're just bringing in the policy person at the other end to help alleviate that other approval, so that the users still accomplish what they started to do.

Gardner: We really are on the cusp of an interesting age, where analysis from deep-data access and deep-penetrating business intelligence types of inserts can be made into process. We're at the crossroads of process and intelligence coming together.

Before we sign off, is there anything else we should expect in terms of user experience, enhancements in business applications, particularly in the procure-to-pay process?

Sarko: This is an ongoing evolutionary process. We learn from the users each day with multiple inputs: talking to them, watching analytics, listening to customer support. The product is only going to get better with the feedback that they give us.

Also, our release cycles now have gone from 12 to 18 months down to three months, or even shorter. We're listening, learning, reacting, much more quickly than we have before. I expect that you'll see many more product changes and from all of the feedback, we’ll make it better for everyone.

Gardner: Speaking of feedback, I was very impressed with the Feature Voting that you've instituted, allowing people to look at different requirements for the next iteration of the software and letting them vote for their favorites. Could you add a bit more about how that might impact user experience as well?

Sarko: By looking holistically at all the feedback we get, we start to see trends and patterns of the things we're getting a lot of traction on or a lot of interest in. That helps us prioritize what we call a backlog -- the feature list -- so that based on user input, we attack the areas that are most important to users and work that way.

We listen to the input, every single piece of it. Also, as you heard from last year, we launched Visual Renewal. In the product when you switch versions of the interface, you see a feedback form that you can fill out. We read every piece of that feedback. We're looking for trends about how to fix the product and make enhancements based on users. This is an ongoing process that we'll continue to do: listen, learn, and react.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: SAP Ariba.


Thursday, May 12, 2016

Playtika bets on big data analytics to deliver captivating social gaming experiences

The next BriefingsDirect Voice of the Customer discussion explores how social gaming company Playtika uses big-data analytics to deliver captivating user experiences and engagement.

We'll learn how feedback from massive user action streams can be analyzed in bulk rapidly to improve the features and attractions of online games and so help Playtika react well in an agile market.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn more about leveraging big data in the social casino industry, we're pleased to welcome Jack Gudenkauf, Vice President of Big Data at Playtika in Santa Monica, California. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: I understand that you're part of Caesars Interactive Entertainment and that you have a number of online games. Tell us about Playtika.

Gudenkauf: We have a few free-to-play social casino games. In fact, we're the industry leader. We have maybe 10 games at this point. World Series of Poker, which you've probably heard about, Slotomania, House of Fun, Bingo Blitz, a number of studios combined.

Worldwide, we're about 1,000 employees. As I say, we're the industry leader in this space at this moment. And it's a very challenging space, as you might imagine, just within gaming itself. The amount of data is huge, especially across all of these games. Collecting information about how the users play the game and what they like about the game is really a completely data-driven experience.

If we release a new feature, we get feedback. Of course, it’s social gaming as well. If we find out that they don't like the feature, we have to rev the game pretty quickly. It's not like the old days, where you go away for a year or so, and come out with something that you hope people like -- Halo, or something like that. It's more about the users driving the experience and what they enjoy.

So we'll try something with some content or something else and see if they like this feature or functionality. If the data comes back immediately that, as they do the slot spin in a new version of the game, they're clearly not playing, we literally change the game.

In fact, in the Bingo Blitz game, we will revise the game as often as once a week, if you can imagine that. So we have to be pretty agile. The data completely drives the user experience as well. Do they like this, do they not like this, shall we make this game change?

Data-driven environment

It’s a complete data-driven environment. That's what brought me there. I came from Twitter, where we used very big data, as you might imagine, with Hewlett Packard Enterprise (HPE) Vertica and Hadoop and such, but it was more about volume there. Here it’s about variety, velocity, and changing game events across all of our studios.

You can imagine the amount of data that we have to crunch through, do analytics on, and then get user feedback. The whole intention is to get feedback sooner so that we can change the game as rapidly as possible, so that users are happy with the game.
So it’s completely user-driven as far as kind of the experience and what they enjoy, which is fun and makes it challenging as well.

Gardner: So being a data scientist in this particular organization gives you a pretty important place at a major table. It's not something to think about at the end of the month when we run some reports. This is essential and integral to the success of the company?

Gudenkauf: Of course, we do analyze the data for daily, monthly, and general key performance indicators (KPIs), daily active users or monthly active users, those types of things. But you're absolutely right. With the game events themselves, we need to process the data as quickly as possible and do the analysis. So analytics is a huge part of our processing.

We actually have a game economy as well, which is kind of fascinating. If you think of it in terms of the US economy, you can only have so much money in the economy without having inflation and deflation. Imagine if I won all the money and nobody else could have money to play with. It’s kind of game over for us, because they can’t play the game anymore. So we have to manage that quite well.

Of course, with the user experience and what they enjoy and the free to play, in particular, the demand is pretty high. It’s like with apps that you pay for. The 99-cent apps are the ones that people think the most about.

When somebody is spending a dollar, it's very important to them. You want the experience to be a great experience for them. So the data-driven aspects of that and doing the analysis and analytics of it, and feeding that back to the game is extremely important to us. The velocity and the variety of games and different features that we have and processing that as fast as possible is quite a challenge.

Gardner: Now, games like poker, slots, or bingo, these are games that have been around for decades, if not hundreds of years, and they've had a new life online in the past 15 years, which is the Dark Ages of online gaming. What's new and different about games now, even though the game is essentially quite familiar to people? What's new and different about a social casino game?

Social aspect

Gudenkauf: I've thought about that quite a bit. A lot of it has to do with the social aspect. Now, you can play bingo, not just with your friends at the local club, but you can play with people around the world.

You can share items and gifts, and if you are running low on money, maybe you can borrow some from your friends. And you can chat with them. The social aspect just opened up all kinds of avenues.

In our case, with our games in the studios, because they're familiar, they stand the test of time. Take something like a bingo or slots, as opposed to some new game that people don't really understand. They may like it. They may only like it for a while. It’s like playing Scrabble or Monopoly with your family. It's a game that's just very familiar and something you enjoy playing.

But, with the online and the social aspect of it, I explain it to other people as imagine Carmen Sandiego meets bingo. You can have experiences where you're playing bingo, you go on this journey to Egypt, and you're collecting items and exploring Egypt, trying to get to another thing. We can take it to places that you wouldn't normally take a traditional kind of board game and in a more social aspect.

Gardner: So this really appeals to what's conceived of as entertainment in multiple ways for an individual. Again, as you established, the analysis and feedback loops are really important.

I understand why doing great data analysis is so important to this particular use case. Tell us a little bit about how you pull that off. What sort of data architecture do you have? What sort of requirements do you have? What are the biggest problems you have to overcome to achieve your goals?

Gudenkauf: If you think about the traditional way of consuming data and getting it into a reporting system, you have an extract. You're going to bring in data from somewhere, and of course, in our case it’s from mobile devices, the web, from playing on Facebook. You have information about how much money did you spend, and user behavior. Did they like it?

So you extract that data as usual, and then you transform it. You reshape it and change it around a little bit to put it in a format to get it into a data warehouse like Vertica.

Once you get it into HPE Vertica, you have the extract, transform, and the load (ETL), the traditional model. You load it into Vertica and then you do your analysis there, where you can do SQL, JOINs, and analytics over it.

A new industry term that I'm coining is what we call Parallelized Streaming Transformation Loader (PSTL) instead of ETL. This is about ingesting data as fast as possible, processing it, and making analytics available through the entire data pipeline, instead of just in the data warehouse.

Real-time streaming

Imagine, instead of the extract, we're taking real-time streaming data. We're reading, in our case, off a Kafka queue. Kafka is very robust and has been used by LinkedIn and Twitter. So it’s pretty substantial and scalable.

We read the messages in parallel as they're streaming in from all the game studios, certain amounts of data here and there, depending on how much we do with the particular studio. With Bingo Blitz, in our case, we consume a lot more user behavior than say some of the other studios.

But we ingest all the data. We need to get it in in real-time streaming. So we read it in in parallel. That’s the parallel part and the streaming part. But then we take it from the streaming, and instead of extracting, it's being fed into us.

Then we do parallel transformations in Spark and our Hadoop cluster. Think of it as bringing in a bunch of JSON event data and putting it into an in-memory table that’s distributed in Spark.
Then, we do parallel transformations, meaning we can restructure the data, do transforms from uppercase to lowercase, whatever we need to do. But it's done in parallel across the cluster as well. Where traditionally there was a single monolithic app running, we can now run the transformations independently of the extract and the load.

We have so much data that we need to also do the transformations in parallel. We do that in what are called Resilient Distributed Datasets (RDDs). It’s kind of a mouthful, but think of it as just a bunch of slices of data across a bunch of computers and your nodes, and then doing transforms on that in parallel. Then, something that has been a dream of mine is how to get all that data in parallel at the same time into HPE Vertica.

HPE Vertica does a great job of massively parallel processing (MPP), and all that means is running the query and pulling data off of different nodes in the cluster. Then, maybe you're grouping by this and summing that and doing an average.

But, to date they hadn't had something that I tried to do when I was at Twitter, but managed to pull off now, which is to load the data in parallel. While the data is in memory in Spark and distributed datasets, we use the Vertica Hash function that will tell us exactly where the data will land when we write it to a Vertica node.

We can say, User A, if I were to write this to Vertica, I know that it’s going to go on this machine. User B will go to the next machine. It just distributes the load, but we, a priori, hash the data into buckets, so that we know, when we actually write the data, that it goes to this node. Then, Vertica doesn’t have to move it. Usually you write it to one node and it says, "No, you really belong over here," and so it asks you to move it and shuffle, like a traditional MapReduce.
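Putting those pieces together, here is a minimal PySpark sketch of the PSTL flow described above: stream JSON game events from Kafka, transform them in parallel, pre-partition rows by a hash of the user ID so each partition lines up with a known target node, and load each micro-batch into Vertica. The topic, schema, broker, and JDBC write path are illustrative assumptions, and it uses Spark's later structured streaming API for brevity; the production pipeline described here relies on a custom TCP extension of Vertica's Copy command rather than a plain JDBC write, and on Vertica's own hash function rather than Spark's.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, lower, hash as spark_hash
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("pstl-sketch").getOrCreate()

# Assumed shape of a game event; the real studios emit richer, varied JSON.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("game", StringType()),
    StructField("event", StringType()),
    StructField("ts", LongType()),
])

NUM_TARGET_NODES = 3  # assumption: number of nodes in the target Vertica cluster

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "game-events")                # placeholder topic
    .load()
    # Parse the JSON payload: this is the parallel "transform" step.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select(
        col("e.user_id").alias("user_id"),
        lower(col("e.game")).alias("game"),
        col("e.event").alias("event"),
        col("e.ts").alias("ts"),
    )
    # Pre-bucket rows by a hash of the user ID so each partition lines up
    # with one target node -- the "hash before you write" idea. Spark's
    # murmur hash stands in for Vertica's segmentation hash here.
    .repartition(NUM_TARGET_NODES, spark_hash(col("user_id")))
)

def load_batch(batch_df, batch_id):
    # Each micro-batch is loaded straight from memory; a plain JDBC append is
    # shown only for illustration (requires the Vertica JDBC driver).
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:vertica://vertica-host:5433/analytics")  # placeholder
        .option("dbtable", "game_events")
        .option("user", "dbadmin")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(load_batch).start()
```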

Working with Vertica

So we created something in conjunction with the Vertica developers. We announced it. That part of it is kind of a TCP server aspect that we extend in the Copy command that exists in Vertica itself. We literally go from streaming in parallel and reading into in-memory data structures, to doing the transformations, and then writing directly from memory into our Vertica data warehouse.

That allows us to get the data in as fast as possible from streaming right to the right. We don’t have to hit a disk along the way and we can do analytics in Vertica sooner. We can also do analytics in Hadoop clusters for older data and do machine learning on that. We can do all kinds of things based on historical user behavior.

If we're doing a sale or something like that, we can see how well it's resonating compared to the past. What we're doing is pushing the envelope to get the analytics as close as we can to the actual game itself.

As I said, traditionally, you do the analytics, get the feedback, change the game, release it in a week, etc. We're going to try to push that all the way up to be as near real time as we can. Basically, the PSTL pipeline allows us to do that, do analytics, and tighten that loop down so that we can get the user behavior to the user as fast as possible.
Once you have it in as fast as you can, reshaping it while it’s in memory, which of course is faster, and taking advantage of doing the parallel transformations at the same time, and in the parallel loading as well, it’s just a way more optimized solution.

Gardner: It’s intriguing. It sounds as if you're able, with a common architecture, to do multiple types of analysis readily but without having to reshuffle the deck chairs each time. Is that fair?

Gudenkauf: That's exactly right. That’s the beauty of this model and why I'm putting up more prescriptive guidance around it. It changes the paradigm of the traditional way of processing data.

We announced some benchmarking. Last year at the HPE Big Data Conference, Facebook stole the show with 36 terabytes an hour on 270 machines. With our model, you could do it with about 80 machines. So it scales very well. Some people say, "We're not Twitter or Facebook scale, but the speed at which we want to consume the data and make it available for analytics is extremely important to us."

The less busy the machines are, the more you can do with them. So does it need to scale like that? No, we are not processing as much data, but the volume, velocity, and variety is a big deal for us. We do need to process the volume, and we do have a lot of events. The volume is not insignificant. We're talking about billions of events, mind you. We're not on the sheer scale of say Twitter or Facebook, but the solution will work for both, in both scenarios.

Gardner: So, Jack, with this capability for analysis as close to real time as possible, at the volume and variety that you're able to accomplish, while this is a great opportunity for you to react in a gaming environment, you're also pushing the envelope on what analysis and reaction can happen to almost any human behavior at scale. In this case, it happens to be gaming, but there are probably other applications for this. Have you thought about that, or are there other places you can take it within an interactive entertainment environment?

All kinds of solutions

Gudenkauf: I can imagine all kinds of solutions for it. In fact, I've had a number of people come up to me and say, "We're doing this Chicago Stock Exchange, and we have a massive amount of streaming-in data. This is a perfect solution for that."

I've had other people come in to talk to me about other aspects and other games as well that are not social casino genre, but they have the same problem. So it's the traditional problem of how to ingest data, massage it, load it, and then have analytics through that entire process. It’s applicable really in any scenario. That’s one of the reasons I'm so excited about the PSTL model, because it just scales extremely well along the way.

Gardner: Let’s relate this back to this particular application, which is highly entertaining games that react, and maybe even start pushing the envelope into anticipating what people will want in a game. What’s the next step for making these types of games engaging? I'm even starting to toy with the concept of artificial intelligence (AI), where people wouldn’t know that it’s a game. They might not even know the difference between the game and other social participants. Are we getting anywhere close to that?

Gudenkauf: You're thinking extremely clearly on the spectrum in analytics in general. Before, it was just general reporting in the feedback loop, but you're absolutely right. As you can see, it’s enabled through our model of prescriptive analytics. Looking at historical data and doing machine learning, we can make better determinations of games and game behavior that will drive the game based on historical knowledge or incoming data that’s more predictive analytics.

Then, as you say, maybe even into the future, beyond predictive and prescriptive analytics, we can almost change as rapidly as possible. We know the user behavior before the user knows the behavior. That will be a great world, and I'm sure we would be extremely successful to get to that final spectrum. But just doing the prescriptive analytics alone, so that the user is happy with the game, and we can get that back to them as quickly as possible, that’s big in and of itself.

Gardner: So maybe a new game some day will be to pass the Turing Test, you against our analysis capabilities?

Gudenkauf: Yeah, that would be pretty cool. Maybe eventually it will tie into the whole virtual reality. It’s kind of happening based on the information behaviors immediately. That will be neat.

Gardner: Very exciting world coming our way, right? We're only scratching the surface. I guess I have run out of questions because my mind is reeling at some of these possibilities.

One last area though. For a platform like HPE Vertica, what would you like to see them do intrinsic to the product? We have the announcement recently about the next version of Vertica, but what might be on your list, a wish-list if you will, for what should be in the product to allow this sort of thing to happen even more readily?

Influencing the product

Gudenkauf: That’s one of the reasons we go to conferences. It’s one of the few conferences where you can get to the actual developers or professional services and influence the product itself.

One of the reasons why I like to be on the leading edge or bleeding edge is so that we can affect product development and what they are working on. I've been fortunate enough to be able to work with developers and people internal to HPE Vertica for quite a while now. I just love the product and I want to see it be successful. With the adoption and their greater openness to working with open source like Spark and MapReduce, the whole ecosystem works well together, as opposed to opposing each other, which I think is what most people expect. It’s a very collaborative, cooperative environment, especially through our pipeline.

I really like the fact that when I talk about things like Kafka and the PSTL, and that Spark is a core part of our architecture, now we're having conversation, and lots of them, to help Vertica and influence them to invest more in Spark and the interaction between Vertica data warehouse, Spark, and that eco-system from Kafka.

From the work that we did with Vertica over the last year, reading streaming data from Kafka into Spark and then into Vertica, they said that reading real-time streaming data from Kafka directly into HPE Vertica would be a great add-on, and they announced it. Ben Vandiver and the developers announced it.

I really want to be in a place, and this affords us to be in that place, to influence where they are going, because it benefits all of us and the entire community. It's being able to give them prescriptive guidance as well from the customer perspective, because this is what we're doing in the real world, of course. They want to make us happy, and we will make them happy.

Our investments have been in things like Kafka streaming and Spark and how does Spark SQL work with Vertica and VSQL. They don’t necessarily have to compete. There is a world for both. So coexisting, influencing that, and having them be receptive to it is amazing. A lot of companies aren’t very receptive to taking the feedback from us as consumers and baking that into offerings.
One of the things in our model for loading the data as fast as possible in parallel is that we pre-hash the data. If you just take user IDs, for instance, and you hash on those IDs, so that you can put this user on this node and that user on that node, you get an even distribution of data. That capability wasn’t exposed in Vertica. I've been asking for it since the Twitter days, for years.

So we wrote our own version of it. I managed to have the Vertica developers, which is a rare and great opportunity, review what we had done. They said, "Yes, that’s spot on. That’s exactly the implementation." And I said, "You know what would be even better?"

"I've been asking for this for years, and I know you have lots of other customers. Why don’t you just make it available for everybody to use? Then I don’t have to use mine, and everybody else can benefit from it as well." They announced in 2015 that they're going to make it available. So being able to influence things like that just helps the whole ecosystem.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.
