We'll also delve into how the right balance between open-source and commercial IT products helps in creating a big-data capability, and we'll further explore how converged infrastructure solutions are hastening the path to big-data business value and cloud deployment options.
Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Read a full transcript or download a copy.
To learn more about how big data can be harnessed for analysis benefits in healthcare and retail, please join me in welcoming our guests, Dennis Faucher, Enterprise Architect at Rolta AdvizeX, and Raajan Narayanan, Data Scientist at Rolta AdvizeX. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: Dennis, what makes big data so beneficial and so impactful for the healthcare and retail sectors?
Faucher: We're finding that the most successful healthcare and retail organizations are now making real-time decisions based upon the data that's coming in every second to their organization.
Gardner: So it's more, faster, and deeper, but is there anything specific about healthcare, for example? What are some top trends that are driving that?
Two sides of healthcare
Faucher: You have two sides of healthcare, even if it's a not-for-profit organization. Of course, they're looking for better care for their patients. In the research arms of hospitals, the research arms of pharmaceutical companies, and even on the payer side, the insurance companies, there is a lot of research being done into better healthcare for the patient, both to increase people's health, as well as to reduce long-term costs. So you have that side, which is better health for patients.
The flip side, which is somewhat related, is how to provide customers with new services and new healthcare, which can be very, very expensive. How can they do that in a cost-effective manner?
So it's either accessing research more cost-effectively or looking at their entire pipeline with big data to reduce cost, whether it's providing care or creating new drugs for their patients.
Gardner: And, of course, retail is such a dynamic industry right now. Things are changing very rapidly. They're probably interested in knowing what's going on as soon as possible, maybe even starting to get proactive in terms of what they can anticipate in order to solve their issues.
Faucher: There are two sides to retail as well. One is the traditional question: How can I replenish my outlets in real time? How can I get product to the shelf before it runs out? Then there's also the traditional side of cross-sell and up-sell: What am I selling in a shopping cart, and how do I get the best mix of products within that cart to maximize my profitability for each customer?
Those are the types of decisions our customers in retail have been making for the last 30 to 50 years, but now they have even more data to help them with that. It's not just the typical sales data that they're getting from the registers or from online; now we can go into social media as well and run sentiment analysis to see what products customers are really interested in, to help with stocking those shelves, either the virtual shelves or the physical shelves.
The second side, besides just merchandising and that market-basket analysis, is new channels for consumers. What are the new channels? If I'm a traditional brick-and-mortar retailer, what new channels do I want to get into to expand my customer base beyond the person who can physically walk in, across many, many channels?
There are so many channels now that retailers can sell to. There is, of course, their online store, but there may be some unique channels, like Twitter and Facebook adding a "buy" button. Maybe they can place products within a virtual environment, within a game, for customers to buy. There are many different areas to add channels for purchase, and to find out in real time what people are buying, where they're buying, and what they're likely to buy. Big data really helps with those areas in retail.
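Market-basket analysis, mentioned above, often starts with simple co-occurrence counting: which products show up in the same cart? The Python sketch below is a minimal illustration under stated assumptions. The baskets, product names, and function names are hypothetical, and production systems usually rank pairs with measures such as support, confidence, and lift rather than raw counts.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("milk", "bread") and ("bread", "milk") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def suggest_cross_sell(baskets, product, top_n=3):
    """Rank other products by how often they share a basket with `product`."""
    scores = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical sample data: each inner list is one shopping cart.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    ["eggs", "milk"],
]
suggestions = suggest_cross_sell(baskets, "bread")
```

Raw counts favor popular items; that is one reason lift-style measures are common once the basic counts are in place.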
Gardner: Raajan, there are clearly some compelling reasons for looking at just these two specific vertical industries to get better data and be more data-driven. The desire must be there, and the cost efficiencies are more compelling than they were just a few years ago. What’s the hurdle? What prevents them from reaching this goal of being proactive, and from getting to the insights that Dennis just described?
Narayanan: One of the main challenges that organizations have is using their current infrastructure for analytics. The three Vs of data (velocity, variety, and volume) pose a few challenges for organizations: How much data can I store, where do I store it, and do I have the current infrastructure to do that?
In addition, there are lots of analytics tools out there. The ecosystem is growing by the day. There are a few hundred offerings, and they are all excellent platforms to use. So choosing what kind of analytics I need for a given purpose is the bigger challenge. Identifying the right tool and the right platform to serve my organization's needs would be one of the challenges.
The third challenge is having the workforce or the expertise to build these analytics and to address these challenges from an analytical standpoint. This is one of the key challenges that organizations have.
Gardner: Dennis, as an enterprise architect at Rolta AdvizeX, you must work with clients who come at this data issue compartmentalized. Perhaps marketing did it one way; R&D did it another; supply chain and internal business operations may have done it a different way. But it seems to me that we need to find more of a general, comprehensive approach to big data analytics that would apply to all of those organizations.
Is there some of that going on, where people are looking not just for a one-off solution different for each facet of their company, but perhaps something more comprehensive, particularly as we think about more volume coming with the Internet of Things (IoT) and more data coming in through more mobile use? How do we get people to think about big-data infrastructure, rather than big-data applications?
Faucher: There are so many solutions around data analytics, business intelligence (BI), big data, and data warehousing. Many of them work, but unfortunately our customers have many of them, and they have created silos of information where they really aren’t getting the benefits that they had hoped for.
What we're doing with customers from an enterprise architecture standpoint is looking at the organization holistically. We have a process called Advizer, where we work with a company, look at everything they're doing, and set a roadmap for the next three years to meet their short-term and long-term goals.
And what we find when we do our interviews with the business people and the IT people at companies is that their goals as an organization are pretty clear, because they've been set by the head of the organization, either the CEO or the chief scientist, or the chief medical director in healthcare. They have very clear goals, but IT is not aligned to those goals and it’s not aligned holistically.
There could be skunkworks projects bringing up some big-data initiatives. There could be some corporate-sponsored big data, but they're just not organized. All it takes is for us to get the business owners and the IT owners in a room for a few hours to a few days, where we can all agree on a single path that meets all needs, simplifies their big-data initiatives, and also gets them to value much faster.
That’s been very helpful to our customers, to have an organization like Rolta AdvizeX come in as an impartial third party and facilitate the coming together of business and IT. Often, in as little as a month, we have the three-year strategy that they need to realize the benefits of big data for their organization.
Gardner: Dennis, please take a moment to tell us a little bit more about AdvizeX and Rolta.
Faucher: Rolta AdvizeX is an international systems integrator. Our US headquarters is in Independence, Ohio, just outside of Cleveland. Our international headquarters is in Mumbai, India.
As a systems integrator, we lead with our consultants and our technologists to build solutions for our customers. We don’t lead with products. We develop solutions and strategy for our customers.
There are four areas where we find our customers get the greatest value from Rolta AdvizeX. At the highest level are our advisory services, which I mentioned, which set a three-year roadmap for areas like big data, mobility, or cloud.
The second area is the application side. We have very strong application people at every level for Microsoft, SAP, and Oracle. We've been helping customers for years in those areas.
The third of the four areas is infrastructure. As our customers look to simplify and automate their private cloud, as well as to move to public cloud and software as a service (SaaS), the question is how they integrate all of that, automate it, and make sure they're meeting compliance requirements.
The fourth area, which has provided a lot of value for our customers, is managed services. How do I expand my IT organization to a 7x24 organization when I'm really not allowed to hire more staff? What if I could have some external resources taking my organization from a single shift to three shifts, managing my IT 7x24, making sure it’s secure, making sure it’s patched, and making sure it’s reliable?
Those are the four major areas that we deliver as a systems integrator for our customers.
Gardner: Raajan, we've heard from Dennis about how to look at this from an enterprise architecture perspective, taking the bigger picture into account, but what about data scientists? I hear frequently in big data discussions that companies, in this case in healthcare and retail, need to bring that data scientist function into their organizations more fully. This isn't to put down the data analysts or business analysts. What is it about being a data scientist that is now so important? Why, at this point, would you want to have data scientists in your organization?
Narayanan: One of the key functions of a data scientist is to look at data proactively. Traditionally, a data analyst's job is reflective: they look back at transactional data. Bringing in a data scientist or a data-science function can help you build predictive models on existing data. You need a lot of statistical modeling and other statistical tools to get there.
This function has been in organizations for a while, but it’s more formalized these days. You need a data scientist in an organization to perform more of the predictive functions than the traditional reporting functions.
Gardner: So, we've established that big data is important. It’s huge for certain verticals, healthcare and retail among them. Organizations want to get to it fast. They should be thinking generally, for the long term. They should be thinking about larger volumes and more velocity, and they need to start thinking as data scientists in order to get out in front of trends rather than be reactive to them.
So with that, Dennis, what’s the role of open source when one is thinking about that architecture and that platform? As a systems integrator and as enterprise architect, what do you see as the relationship between going to open source and taking advantage of that, which many organizations I know are doing, but also looking at how to get the best results quickly for the best overall value? Where does the rubber hit the road best with open source versus commercial?
Faucher: That’s an excellent question and one that many of our customers have been grappling with, as there are so many fantastic open-source, big-data platforms out there that were written by Yahoo, Facebook, and Google for their own use, yet released as open source for anyone to use.
I see a bit of an analogy to Linux back in 1993, when it really started to hit the market. Linux was a free alternative to Unix. Customers were embracing it rapidly, trying to figure out how it could fit in, because Linux had a much different cost model than proprietary Unix.
We're seeing that in the open-source, big-data tools as well. Customers have embraced open-source big-data tools rapidly. These tools are free, but just like Linux back then, the tools are coming out without established support organizations. Red Hat emerged to support the Linux open-source world and say that they would help support you, answer your phone calls, and hold your hand if you needed help.
Now we're seeing who the corporate sponsors of some of these open-source big-data tools are going to be, for customers who may not have thousands of engineers on staff to support open source. Open-source tools definitely have their place. They're very good for storing reams and reams of data (terabytes, petabytes, and more) and for searching it in a batch manner, not in real time, as I was speaking about before.
Some of our customers are looking for real-time analytics, not just batch. In batch, you ask a question and get the answer back eventually, which is what many of the open-source, big-data tools are really meant for: How can I store a lot of data inexpensively that I may need access to at some point?
We're seeing that our customers have this mix of open-source, big-data tools, as well as commercial big-data tools.
I recently participated in a customer panel where some of the largest dot-coms talked about what they're doing with open source versus commercial tools. They were saying that the open-source tools were where they might store their data lake, but they were using commercial tools to access that data in real time.
They were saying that if you need real-time access, you need a big-data tool that takes in data in parallel and also retrieves it in a parallel manner, and the best tools to do that are still in the commercial realm. So they have both open source for storage and closed source for retrieval to get the real-time answers that they need to run their business.
Gardner: And are there any particular platforms on the commercial side that you're working with, particularly on that streaming, real-time, at volume, at scale equation?
Faucher: What we see with the partners that we work with is that HP Vertica is the king of parallel query. It’s extremely fast at getting data in and getting data out. It was also built on a columnar architecture, which stores data by column rather than row by row like a traditional relational database. It was really meant to handle those unexpected queries. Who knows what the query is going to be? Whatever it is, we'll be able to respond to it.
Another very popular platform has been SAP HANA, mostly for our SAP customers who need an in-memory columnar database for real-time data access. Raajan works with these tools on a daily basis and can probably provide more detail on that, as well as some of the customer examples that we've had.
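To make the row-versus-column distinction concrete, here is a toy Python model. It only illustrates the storage idea and is not how Vertica or SAP HANA is implemented; the table, column names, and functions are hypothetical. In a real column store, the benefit comes from reading fewer bytes from disk or memory and from compressing similar values stored side by side.

```python
# Hypothetical sales table, first modeled the way a row store lays it out:
# each record's fields are kept together.
rows = [
    {"store": "A", "sku": "123", "qty": 4, "price": 2.50},
    {"store": "B", "sku": "456", "qty": 1, "price": 9.99},
    {"store": "A", "sku": "456", "qty": 2, "price": 9.99},
]


def to_columns(records):
    """Pivot row records into one list per column, as a column store lays them out."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_qty_row_layout(records):
    # On disk, a row store would read every full record to get at "qty".
    return sum(record["qty"] for record in records)


def total_qty_column_layout(columns):
    # A column store reads only the "qty" values, which sit next to each other.
    return sum(columns["qty"])


columns = to_columns(rows)
row_total = total_qty_row_layout(rows)
col_total = total_qty_column_layout(columns)
```

Both layouts return the same answer; the difference a columnar engine exploits is how much data must be touched to produce it, which matters most for ad hoc aggregate queries over wide tables.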
Gardner: Raajan, please, if you have some insight into what’s working in these verticals and any examples of how organizations are getting their big data payoff, I'd be very curious to hear that.
Narayanan: One of the biggest challenges is to discover the data in the shortest amount of time. By discovery, I mean how fast I can get meaningful insights once I get data into the systems.
Works two ways
It works two ways. One is to get the data into the system, aggregate it into your current environment, transform it so that data is harmonious across all the data sources that provide it, and then also to provide analytics over that.
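The ingest, aggregate, and transform steps described here can be sketched in Python as mapping each source onto one common schema before any analytics run. This is a minimal sketch under assumptions: the two feed layouts, their field names, and the helper functions are all hypothetical.

```python
from datetime import datetime

# Two hypothetical feeds describing sales with different field names and types.
pos_feed = [{"StoreID": "A", "SKU": "123", "Units": "4", "TS": "2015-06-01 09:15"}]
web_feed = [{"channel": "web", "product": "123", "quantity": 2, "timestamp": "2015-06-01T10:02:00"}]


def from_pos(rec):
    """Map a (hypothetical) point-of-sale record onto the common schema."""
    return {
        "channel": "store:" + rec["StoreID"],
        "sku": rec["SKU"],
        "qty": int(rec["Units"]),  # this feed exports quantities as strings
        "ts": datetime.strptime(rec["TS"], "%Y-%m-%d %H:%M"),
    }


def from_web(rec):
    """Map a (hypothetical) online-order record onto the common schema."""
    return {
        "channel": rec["channel"],
        "sku": rec["product"],
        "qty": rec["quantity"],
        "ts": datetime.fromisoformat(rec["timestamp"]),
    }


def harmonize(pos, web):
    """Transform both feeds into one list of records sharing the same fields and types."""
    return [from_pos(r) for r in pos] + [from_web(r) for r in web]


sales = harmonize(pos_feed, web_feed)

# Once harmonized, aggregation across sources is straightforward.
units_by_sku = {}
for s in sales:
    units_by_sku[s["sku"]] = units_by_sku.get(s["sku"], 0) + s["qty"]
```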
In a traditional sense, I'll collect tons and tons of data, and it consumes reams and reams of storage. Do I need all that data? That's the question that has to be answered. Data discovery is becoming a science as we speak. When I get the data, I need to see whether it's useful, and if so, how to process it.
The systems Dennis alluded to, Vertica and SAP HANA, enable that data discovery right from the get-go. When I get data in, I can just write simple queries. I don't need a new form of analytic expertise. I can use traditional SQL to query this data. Once I've done that, if I find the data useful, I can send it into storage and do more robust analytics over it, which can be predictive or reporting in nature.
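The discovery workflow described here (query freshly landed data with plain SQL, then keep only what proves useful) can be sketched with Python's built-in `sqlite3` module. SQLite is only a stand-in for an analytic engine and says nothing about Vertica or SAP HANA specifically; the table names and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for an analytic SQL engine (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (source TEXT, sku TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [("pos", "123", 4), ("web", "123", 2), ("web", "456", 0), ("pos", "456", 1)],
)

# Ad hoc discovery query: profile volume and empty rows per source with ordinary SQL.
profile = conn.execute(
    """
    SELECT source,
           COUNT(*) AS events,
           SUM(qty) AS units,
           SUM(CASE WHEN qty = 0 THEN 1 ELSE 0 END) AS empty_rows
    FROM raw_events
    GROUP BY source
    ORDER BY source
    """
).fetchall()

# If the data looks useful, persist a cleaned subset for deeper analytics.
conn.execute("CREATE TABLE curated_sales AS SELECT sku, qty FROM raw_events WHERE qty > 0")
kept = conn.execute("SELECT COUNT(*) FROM curated_sales").fetchone()[0]
```

The point of the sketch is the sequence, profiling before persisting, rather than the engine; the same queries would need dialect adjustments on a production platform.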
A few customers see a lot of value in data discovery. Bringing in Hadoop as a data lake is fantastic, and these platforms play very well with the Hadoop technologies out there.
Once you get data into these platforms, they provide analytic capabilities that go above and beyond what a lot of the open-source platforms provide. I'm not saying that open-source platforms don’t perform these functions, but there are lots of tools out there that you need to line up in sequence to perform what Vertica or SAP HANA will do. The use cases are pretty different, but nevertheless, these platforms enable a lot of these functions.
Gardner: Raajan, earlier in our discussion you mentioned the importance of skills and being able to hire enough people to do the job. Is that also an issue in making a decision between an open-source and a commercial approach?
Narayanan: Absolutely. With open source, there are a lot of code bases out there that need to be learned. So there is a learning curve within organizations.
Traditionally, organizations rely more on the reporting function, so they have a lot of SQL skills within the organization. Retraining that staff is something an organization would have to think about. Even staffing for new technologies is something an organization would have to plan for in the future. So it’s something organizations have to build into their roadmap for big-data growth.
Gardner: Dennis, let's come back to the speed and value of getting your big-data apparatus up and running, perhaps thinking about it holistically across multiple departments in your organization, and anticipating even larger scale over time, which necessitates a path to growth. Tell us a little bit about what's going on in the market with converged infrastructure, where we're looking at very tight integration between hardware and software, between servers that are supporting workloads, usually virtualized, as well as storage, also usually virtualized.
For big data, the storage equation is not trivial. It’s an integral part of being able to deliver those performance requirements and key performance indicators (KPIs). Tell us a bit about why converged infrastructure makes sense and where you're seeing it deployed?
Faucher: What we're seeing with our customers in 2015 is that they have three options for where to run their applications. They have what we call best-of-breed, which is what they've done forever. They buy some servers from someone, some storage from someone else, some networking from someone else, and some software from someone else. They put it together, and it’s very time-consuming to implement it and support it.
They also have the option of going converged, which means buying the entire stack (the server, the storage, and the networking) from a single vendor, which will integrate it at the factory, load their software for them, show up, and plug it in, and they're in production in less than 30 days.
The third option, of course, is going to cloud, whether that’s infrastructure as a service (IaaS) or SaaS, which can also provide quick time to value.
For most of our customers now, there are certain workloads that they are just not ready to run in IaaS or SaaS, either because of cost, security, or compliance reasons. For those workloads that they have decided are not ready for SaaS, IaaS, or platform as a service (PaaS) yet, they need to put something in their own data center. About 90 percent of the time, they're going with converged.
Besides the fact that it’s faster to implement and easier to support, our customers’ data centers are getting so much bigger and more complex that they just cannot maintain all of the moving parts: thousands of virtual machines, hundreds of servers, all the patching that needs to happen, and keeping track of interoperability between server A, network B, and storage C. Converged infrastructure takes all of that away from them and pushes it to the vendor they bought it from.
Now they can just focus on their applications and their users, which is what they always wanted to focus on, rather than on the infrastructure and keeping it running.
So converged infrastructure has really taken off very, very quickly with our customers. I would say even faster than I would have expected. So it's either converged -- they're buying servers and storage and networking from one company, which both pre-installs it at a factory and maintains it long-term -- or hyper-converged, where all of the server and storage and networking is actually done in software on industry-standard hardware.
For private cloud, a large majority of our customers are going with converged for the pieces that are not going to public cloud.
Gardner: So 90 percent; that’s pretty impressive. If that’s the rate of adoption for converged, I'm curious what sort of adoption you're seeing on the hyper-converged side, where it’s, as you say, software-defined throughout?
Looking at hyper-converged
Faucher: It’s interesting. All of our customers are looking at hyper-converged right now to figure out where it fits for them. With hyper-converged, I'm just running industry-standard servers and virtualizing my servers, storage, and networking on them, so the question is where it fits. It definitely has a much lower entry point. So they'll look at it and ask, "Is that right for my tier-1 data center? Maybe I need something that starts bigger and scales bigger in my tier-1 data center."
Hyper-converged may be a better fit for tier-2 data centers, or possibly in remote locations. Maybe in doctor's offices or my remote retail branches, they go with hyper-converged, which is a smaller unit, but also very easy to support, which is great for those remote locations.
You also have to keep in mind that hyper-converged, although very easy to procure and deploy, grows only in blocks of one size. Say a block can run 200 virtual machines; when I add capacity, I have to add 200 at a time, rather than at a smaller granularity.
So it’s important to make the correct decision. We spend a lot of time with our customers helping them figure out the right strategy. If we've decided that converged is right, is it converged or hyper-converged for the application? As I said, it typically breaks down like this: for tier-1 data centers it’s converged, but for tier-2 data centers or remote locations it’s more likely hyper-converged.
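The 200-VM block example above is simple arithmetic, but it shows why growth granularity matters. A small Python sketch, assuming each block hosts exactly 200 VMs (the illustrative figure from the conversation, not a product specification) and ignoring failover headroom; the function names are hypothetical.

```python
import math


def blocks_needed(vm_demand, vms_per_block=200):
    """Fixed-size blocks required to host `vm_demand` VMs (200 per block is illustrative)."""
    return math.ceil(vm_demand / vms_per_block)


def stranded_capacity(vm_demand, vms_per_block=200):
    """VM slots paid for but idle because capacity is added a whole block at a time."""
    return blocks_needed(vm_demand, vms_per_block) * vms_per_block - vm_demand


# Hypothetical demand levels, in virtual machines: (blocks, idle slots) for each.
plan = {demand: (blocks_needed(demand), stranded_capacity(demand)) for demand in (150, 200, 210, 450)}
```

Growing from 200 to 210 VMs, for instance, means buying a second full block and leaving 190 slots idle, which is the trade-off to weigh against the low entry point.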
Gardner: Again, putting on your enterprise architect hat, given that we have many times unpredictable loads on that volume and even velocity for big data, is there an added value, a benefit, of going converged and perhaps ultimately hyper-converged in terms of adapting to demand or being fit for purpose, trying to anticipate growth, but not have to put too much capital upfront and perhaps miss where the hockey puck is going to be type of thinking?
What is it about converged and hyper-converged that allow us to adapt to the IoT trend in healthcare, in retail, where traditional architecture, traditional siloed approaches would maybe handicap us?
Faucher: For some of these workloads, we just don’t know how they're going to scale or how quickly. We see that specifically with new applications. Maybe we're trying a new channel, possibly a new retail channel, and we don’t know how it’s going to scale. Of course, we don’t want to fail by not scaling high enough and turning our customers away.
But some of the vendors that provide cloud, hyper-converged and converged, have come up with some great solutions for rapid scalability. A successful solution for our customers has been something called flexible capacity. That’s where you've decided to go private cloud instead of public for some good reasons, but you wish that your private cloud could scale as rapidly as the public cloud, and also that your payments for your private cloud could scale just like a public cloud could.
Typically, when customers purchase for a private cloud, they're doing a traditional capital expense. So they just spend the money when they have it, and maybe in three or five years they spend more. Or they do a lease payment and they have a certain lease payment every month.
With flexible capacity, I can have more installed in my private cloud than I'm paying for. Let’s say there's 100 percent installed, but I'm only paying for 80 percent. That way, if there's an unexpected demand for whatever reason, I can turn on another 5, 10, 15, or 20 percent immediately without having to issue a PO first, which might take 60 days in my organization, then place the order, wait 30 days for more to show up, and then meet the demand.
Now I can have more on site than I'm paying for, and when I need it I just turn it on and I pay a bill, just like I would if I were running in the public cloud. That’s what is called flexible capacity.
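The billing logic behind flexible capacity can be sketched as "pay for the committed share, or for actual use if it is higher, up to what is installed." This Python sketch is a hypothetical model: real contract terms (metering interval, minimum commitment, pricing) vary by vendor, and only the 100/80 percent split comes from the example above.

```python
def flexible_capacity_bill(installed_units, committed_pct, demand_units, unit_price):
    """
    Hypothetical flexible-capacity bill: hardware for `installed_units` sits on site,
    the customer always pays for `committed_pct` of it, and usage above that
    commitment (up to what is installed) is billed as it is used.
    """
    if demand_units > installed_units:
        # Beyond installed capacity, more hardware has to be ordered the usual way.
        raise ValueError("demand exceeds installed capacity")
    committed_units = installed_units * committed_pct / 100
    billable_units = max(committed_units, demand_units)
    return billable_units * unit_price


# Hypothetical months: normal load below the commitment, then a spike above it.
normal_month = flexible_capacity_bill(100, 80, 70, 10)
spike_month = flexible_capacity_bill(100, 80, 95, 10)
```

In the spike month, the extra 15 units are billed without a purchase order or delivery wait because the hardware was already on site.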
Another option is the ability to do cloud bursting. Let’s say I'm okay with public cloud for certain application workloads -- IaaS, for example -- but what I found is that I have a very efficient private cloud and I can actually run much more cost-effectively in my private cloud than I can in public, but I'm okay with public cloud in certain situations.
Well, if a burst comes, I can actually extend my application beyond private to public to take on this new workload. Then, I can place an order to expand my private cloud and wait for the new backing equipment to show up. That takes maybe 30 days. When it shows up, I set it up, I expand my on-site capability, and then I just turn off the public cloud.
The most expensive use of public cloud many times is just turning it on and never turning it off. It’s really most cost-effective for short-term utilization, whether it’s new applications or development or disaster recovery (DR). Those are the most cost-effective uses of public cloud.
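The bursting pattern described above reduces to a placement rule: fill the private cloud first, overflow to public, and shrink the overflow to zero once on-site capacity grows. A minimal Python sketch, with capacity in hypothetical generic units; real bursting also involves networking, data placement, and orchestration that this ignores.

```python
def place_workload(demand, private_capacity):
    """Split demand between the private cloud and a public-cloud burst."""
    private = min(demand, private_capacity)
    public_burst = demand - private
    return private, public_burst


# Hypothetical timeline: demand spikes, then the private cloud is expanded
# once the new equipment arrives, and the public-cloud burst is turned off.
timeline = [
    ("before spike", 800, 1000),
    ("spike", 1300, 1000),
    ("equipment installed", 1300, 1500),
]
results = [(label, *place_workload(demand, capacity)) for label, demand, capacity in timeline]
```

Public capacity is used only during the window between the spike and the expansion, which matches the point that public cloud is most cost-effective for short-term use.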
Gardner: Raajan, as a data scientist, you're probably more concerned with what the systems are doing and how they are doing it, but is there a benefit from your perspective of going with converged infrastructure or hyper-converged infrastructure solutions? Whether it’s bursting or reacting to a market demand within your organization, what is it about converged infrastructure that’s attractive for you as a data scientist?
Narayanan: One of the biggest challenges would be to have a system that allows an organization to go to market soonest. With a big-data platform, there are lots of moving parts, including the network. In a traditional Hadoop deployment, there are typically three copies of the data, and you need to spread them across various systems so that you have high availability. Organizations engaging in big data are looking at high availability as one of the key requirements, which means that any time a node goes down, you need to have the data available for analysis and query.
From a data scientist's standpoint, stability, or the availability of data, is a key requirement. When data scientists build their models and analytic views, they churn through tons and tons of data, and that requires tremendous system horsepower and also network capabilities that pull data from various sources.
With converged infrastructure, you get that advantage. Everything is in a single box, and it is very scalable. For a data scientist, it’s like a dream come true for analytic needs.
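The three-copies point can be illustrated by checking whether data stays readable when nodes fail. The Python sketch below places each block on three distinct nodes chosen at random. That is a simplification: HDFS defaults to three replicas but uses a rack-aware placement policy, and the node and block names here are hypothetical.

```python
import random
from itertools import combinations


def place_replicas(block_ids, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes (random placement, a simplification)."""
    rng = random.Random(seed)
    return {block: rng.sample(nodes, replication) for block in block_ids}


def readable_after_failures(placement, failed_nodes):
    """True if every block still has at least one replica on a node that is up."""
    failed = set(failed_nodes)
    return all(any(node not in failed for node in replicas) for replicas in placement.values())


# Hypothetical six-node cluster holding 100 blocks.
nodes = [f"node{i}" for i in range(6)]
placement = place_replicas(range(100), nodes)

# With three copies on distinct nodes, any two simultaneous node failures leave a live replica.
survives_any_two = all(readable_after_failures(placement, pair) for pair in combinations(nodes, 2))
```

Three or more simultaneous failures may make some blocks unavailable, which is why placement across racks, not just nodes, matters in practice.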
Gardner: I'm afraid we are coming up towards the end of our time. Let’s look at metrics of success. How do you know you are doing this well? Do you have any examples, Dennis or Raajan, of organizations that have thought about the platform, the right relationship between commercial and open source, that have examined their options on deployment models, including converged and hyper-converged, and what is it that they get back? How would you know that you are doing this right? Any thoughts about these business or technology metrics of success?
Faucher: I have a quick one that I see all the time. Our customers today measure how long it takes to get a new business application out the door. Almost every one of our customers has a measurement around that. How quickly can we get a business application out the door and functional, so that we can act upon it?
Most of the time it takes three to six months, yet they really want to get these new applications out the door in a week, with constant improvement to their applications to help their patients or their customers, or to get into new channels.
What we're finding is they already have a metric that says, today it takes us three months to get a new application out the door. Let’s change that. Let’s really look at the way we are doing things -- people, process and IT end-to-end -- typically where they are helped through something like an Advizer, and let’s look at all the pieces of the process, look at it all from an ITIL standpoint or an ITSM standpoint and ask how can we improve the process.
And then let’s implement the solution and measure it. Let’s have constant improvement to take that three months down to one month, and down to possibly one week, if it’s a standardized enough application.
So for me, from a business standpoint, it’s the fastest time to value for new applications and new research: How much faster can I get those out the door than I do today?
Narayanan: From a technical standpoint, Dana, it’s how fast I can aggregate data. There are tons of data sources out there. The biggest challenge is to integrate all of them in the shortest amount of time and make sure that value is realized as soon as possible. Any platform that allows for that would definitely serve the purpose for analytic needs.
Gardner: Listening to you both, it almost sounds as if you're taking what you can do with big-data analytics and applying it to how you do big-data analytics. Is there some of that going on?
Faucher: Absolutely. It’s interesting, when we go out and meet with customers, when we do workshops and gather data from our customers, even when we do Advizers and we capture data from our customers, we use that. We take all identifying customer information out of it, but we use that to help our customers by saying that of the 2,000 customers that we do business with every year, this is what we are seeing. With these other customers, this is where we have seen them be successful, and we use that data to be able to help our customers be more successful faster.
Sponsor: HP.