Monday, January 8, 2018

How a large Missouri medical center developed an agile healthcare infrastructure security strategy

Healthcare provider organizations are among the most challenging environments to develop and implement comprehensive and agile security infrastructures.

These healthcare providers typically operate sprawling campuses with large ecosystems of practitioners, suppliers, and patient-facing facilities. They also operate under stringent compliance requirements, with data privacy as a top priority.

At the same time, large hospitals and their extended communities are seeking to become more focused on patient outcomes as they deliver ease of use, the best applications, and up-to-date data analysis to their staffs and physicians.

The next BriefingsDirect security insights discussion examines how a large Missouri medical center developed a comprehensive healthcare infrastructure security strategy from the edge to the data center -- and everything in between.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

To learn how healthcare security can become more standardized and proactive with unified management and lower total costs, BriefingsDirect sat down with Phillip Yarbro, Network and Systems Engineer at Saint Francis Healthcare System in Cape Girardeau, Missouri. The discussion was moderated by Dana Gardner, principal analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: When it comes to security nowadays, Phil, there's a lot less chunking it out -- focusing on just devices or networks separately, or on data centers alone. It seems that security needs to be deployed holistically -- or at least strategically -- with standardized solutions, focused on across-the-board levels of coverage.

Tell us how you’ve been able to elevate security to that strategic level at Saint Francis Healthcare System. 

Yarbro: As a healthcare organization, we have a wide variety of systems -- from the electronic medical record (EMR) we currently use, to our 10-plus legacy EMRs, our home health system, and payroll and time-and-attendance. Like you said, that's a wide variety of systems to keep up to date with antivirus protection and to keep secure, especially with them being virtualized. All of those systems require different exclusions.

With our previous EMR, it was really hard to get those exclusions working and to minimize false positives. Over the past several years, security demands have increased. There are a lot more PCs and servers in the environment. There are a lot more threats taking place in healthcare systems, some targeting protected health information (PHI) or financial data, and we needed a solution that would protect a wide variety of endpoints; something that we could keep up-to-date extremely easily, and that would cover a wide variety of systems and devices.

Gardner: It seems like they’re adding more risk to this all the time, so it’s not just a matter of patching and keeping up. You need to be proactive, whenever possible.

Yarbro: Yes, being proactive is definitely key. Some of the features that we like about our latest systems are that you can control applications, and we’re looking at doing that to keep our systems even more secure, rather than just focusing on real-time threats, and things like that.

Gardner: Before we learn more about your security journey, tell us about Saint Francis Healthcare System, the size of organization and also the size of your IT department.

Yarbro: Saint Francis is between St. Louis and Memphis. It’s the largest hospital between the two cities. It’s a medium-sized hospital with 308 beds. We have a Level III neonatal intensive care unit (NICU) and a Level III trauma center. We see and treat more than 700,000 people within a five-state area.

With all of those beds, we have about 3,000 total staff, including referring physicians, contractors, and so on. The IT help desk, infrastructure, and networking teams amount to about 30 people who support the entire infrastructure.

Gardner: Tell us about your IT infrastructure. To what degree are you using thin clients and virtual desktop infrastructure (VDI)? How many servers? Perhaps a rundown of your infrastructure in total?

Yarbro: We have about 2,500 desktops, all of which are Microsoft Windows desktops. Currently, they are all supplied by our organization, but we are looking at implementing a bring-your-own-device (BYOD) policy soon. Most of our servers are virtualized now. We do have a few physical ones left, but we have around 550 to 600 servers.

Of those servers, we support about 60 Epic servers and close to 75 Citrix servers. On the VDI side, we are using VMware Horizon View, and we are supporting about 2,100 virtual desktop sessions.

Gardner: Data center-level security is obviously very important for you. This isn’t just dealing with the edge and devices.

Virtual growth

Yarbro: Correct, yes. As technology increases, we’re utilizing our virtual desktops more and more. The data center virtualization security is going to be a lot more important going forward because that number is just going to keep growing.

Gardner: Let’s go back to your security journey. Over the past several years, requirements have gone up, scale has gone up, complexities have gone up. What did you look for when you wanted to get more of that strategic-level security approach? Tell us about your process for picking and choosing the right solutions.

Yarbro: One lesson we learned from our previous suppliers was that we wanted a new security solution that wouldn't subject us to scan storms. Our previous system couldn't spread out its virus scans, so whenever staff came in, in the mornings and evenings, users were hit by latency from the scans. Our virtual servers all scanned at the same time, and whenever those scans kicked off, our network dragged to a halt.

We were looking for a new solution that didn’t have a huge impact on our virtual environment. We have a wide variety of systems and applications. Epic is our main EMR, but we also have 10 legacy EMRs, a picture archiving and communication system (PACS), rehab, home health, payroll, as well as time and attendance apps. There are a wide variety of systems that all have different exclusions and require different security processes. So we were hoping that our new solution would minimize false positives.
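To picture how exclusion management like this can be centralized, here is a minimal Python sketch. The application names and paths are hypothetical placeholders, not Saint Francis's actual exclusions or any vendor's policy format; the point is simply keeping per-application exclusion lists in one place and merging them into a single, consistent policy.

```python
# Hypothetical per-application antivirus exclusion lists (paths are placeholders,
# not real vendor-recommended exclusions). Keeping them in one structure makes it
# easier to review and push a single, consistent policy.
EXCLUSIONS = {
    "epic_emr":    [r"D:\Epic\Cache", r"D:\Epic\Logs"],
    "pacs":        [r"E:\PACS\ImageStore"],
    "home_health": [r"C:\HomeHealth\Spool"],
    "payroll":     [r"C:\Payroll\Batch"],
}

def build_policy(app_names):
    """Merge the exclusion lists for the apps installed on a given server class."""
    merged = []
    for app in app_names:
        merged.extend(EXCLUSIONS.get(app, []))
    # De-duplicate while preserving order so the resulting policy stays readable.
    seen = set()
    return [p for p in merged if not (p in seen or seen.add(p))]

if __name__ == "__main__":
    # Example: a server that hosts the EMR and the PACS viewer.
    print(build_policy(["epic_emr", "pacs"]))
```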

Since we are a healthcare organization, there is PHI and there is sensitive financial data. We needed a solution that was Health Insurance Portability and Accountability Act (HIPAA)-compliant as well as Payment Card Industry Data Security Standard (PCI DSS)-compliant. We wanted a system that complemented those requirements well and made it easy to manage everything.

With our previous products, Trend Micro in conjunction with Malwarebytes, we were working in two consoles. A lot of the time it was hard to get the exclusions to apply down to the devices, and when it came time to upgrade the clients, we had to spend time upgrading them twice. It didn't always work right. It was a very disruptive, do-it-yourself operation, requiring a lot of resources on the back end. We were just looking for something that was much easier to manage.

Defend and prevent attacks

Gardner: Were any of the recent security breaches or malware infections something that tripped you up? I know that ransomware attacks have been on people’s minds lately.

Yarbro: With the WannaCry and Petya attacks, we actually received a proactive e-mail from Bitdefender saying that we were protected. With the most recent one, Bad Rabbit, the e-mail came in the next day, and Bitdefender had already said that we were good for that one as well. It's been a great peace-of-mind benefit for our leadership here, knowing that we weren't affected and were already protected whenever such news made its way to them in the morning.

Gardner: You mentioned Bitdefender. Tell me about how you switched, when, and what’s that gotten for you at Saint Francis?

Yarbro: After we evaluated Bitdefender, we worked really closely with their architects to make sure that we followed best practices and had everything set up, because we wanted to get our current solutions out of there as fast as possible.

For a lot of our systems we have test servers and test computers. We were able to push Bitdefender out to those devices within minutes of having the console set up. Once we had the exclusion lists in place and were able to test against them, we made sure that Bitdefender didn't catch or flag anything.

We were able to deploy Bitdefender on 2,200 PCs, all of our virtual desktops and VDI, and roughly 425 servers between May and July with minimal downtime -- the only downtime we had was simply to reboot the servers after we uninstalled our previous antivirus software.

We recently upgraded the remaining 150 or so servers, which we don’t have test systems for. They were all of our critical servers that couldn’t go down, such as our backup systems. We were able to push Bitdefender out to all of those within a week, again, without any downtime, and straight from the console.

Gardner: Tell us about that management capability. It’s good to have one screen, of course, but depth and breadth are also important. Has there been any qualitative improvement, in addition to the consolidation improvement?

Yarbro: Yes. Within the Bitdefender console, we have different policies in place for our various servers, and we can get very granular with them. For the systems that take up a lot of resources, we have scans set to run maybe every other day instead of every day, and you can also block off servers.
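To make the "spread out the scans" idea concrete, here is a minimal sketch that staggers scan start times across a group of virtual servers so they never all fire at once -- the scan-storm problem described earlier. It assumes nothing about GravityZone's actual scheduler; the host names and the scan window are made up.

```python
from datetime import datetime, timedelta

# Hypothetical list of virtual servers; in practice this would come from inventory.
SERVERS = [f"vm-{i:03d}" for i in range(1, 21)]

def staggered_schedule(servers, window_start, window_hours=6, every_other_day=True):
    """Spread scan start times evenly across a maintenance window so that
    scans never all fire at the same moment (avoiding a scan storm)."""
    step = timedelta(hours=window_hours) / max(len(servers), 1)
    schedule = {}
    for i, name in enumerate(servers):
        start = window_start + i * step
        schedule[name] = {
            "start": start.strftime("%H:%M"),
            "frequency": "every 2 days" if every_other_day else "daily",
        }
    return schedule

if __name__ == "__main__":
    for host, cfg in staggered_schedule(SERVERS, datetime(2018, 1, 8, 22, 0)).items():
        print(host, cfg)
```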

Bitdefender also has a firewall option that we are looking at implementing soon, where you can group servers together and apply the same firewall rules to them, and things like that. It just helps give us great visibility and makes sure our servers and data center are protected and secured.

Gardner: You mentioned that some of the ransomware attacks recently didn’t cause you difficulty. Are there any other measurements that you use in order to qualify or quantify how good your security is? What did you find improved with your use of Bitdefender GravityZone?

Yarbro: It reduced our time to add new exclusions to our policies. That used to take us about 60 minutes, because we had to log in to both consoles, make the change, and make sure it got pushed out. That's down to five minutes for us. So that's a huge time savings.

From the security administration side, by going into the console and making sure that everything is still reporting, that everything still looks good, making sure there haven’t been any viruses on any machines -- that process went down from 2.5 to three hours a week to less than 15 minutes.

GravityZone has a good reporting setup. I actually have a schedule set every morning to give me the malware activity and phishing activity from the day before. I don’t even have to go into the console to look at all that data. I get a nice e-mail in the morning and I can just visually see what happened.
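The scheduled summary Yarbro describes is built into the console, but for illustration, here is a minimal standard-library Python sketch of the same shape of report: read a day's events from a (hypothetical) CSV export and e-mail a summary. The file name, columns, and mail settings are all assumptions.

```python
import csv
import smtplib
from email.message import EmailMessage

# Hypothetical input: a CSV export of yesterday's malware/phishing events.
# (The real console e-mails the report itself; this only shows the general shape.)
EVENTS_CSV = "daily_events.csv"  # columns assumed: endpoint,type,threat
SMTP_HOST, MAIL_FROM, MAIL_TO = "smtp.example.org", "av-reports@example.org", "secops@example.org"

def summarize(path):
    """Count events by type (e.g., malware vs. phishing)."""
    counts = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            counts[row["type"]] = counts.get(row["type"], 0) + 1
    return counts

def send_summary(counts):
    """Send a plain-text summary of yesterday's activity."""
    msg = EmailMessage()
    msg["Subject"] = "Daily malware/phishing activity"
    msg["From"], msg["To"] = MAIL_FROM, MAIL_TO
    msg.set_content("\n".join(f"{k}: {v}" for k, v in sorted(counts.items())) or "No events.")
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    send_summary(summarize(EVENTS_CSV))
```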

At the end of the month we also have a report set up that tells us the 10 endpoints with the most malware detections, so we can be proactive and go out and re-educate staff if it keeps happening with a certain person. Not only has it saved us security administration time, it also helps with security-related trouble calls. I would say those have probably dropped at least 10 percent to 15 percent since we rolled out Bitdefender hospital-wide.

Gardner: Of course, you also want to make sure your end-users are seeing improvement. How about the performance degradation and false positives? Have you heard back from the field? Or maybe not, and that’s the proof?

User-friendly performance

Yarbro: You said it best right there. We haven't heard anything from end-users; they don't even know it's there. With this type of rollout, no news is good news. They didn't even notice the transition, except for an increase in performance, and the false positives haven't been there.

We have our exclusion policy set, and it really hasn’t given us any headaches. It has helped our physicians quite a bit because they need uninterrupted access to medical information. They used to have to call whenever our endpoints lost their exclusion list and their software was getting flagged. It was very frustrating for them. They must be able to get into our EMR systems and log that information as quickly as possible. With Bitdefender, they haven’t had to call IT or anything like that, and it’s just helped them greatly.

Gardner: Back to our high-level discussion about going strategic with security, do you feel that using GravityZone and other Bitdefender technologies and solutions have been able to help you elevate your security to being comprehensive, deep, and something that’s more holistic?

Multilayered, speedier security

Yarbro: Yes, definitely. We did not have this level of control with our old systems. First of all, we didn’t have antivirus on all of our servers because it impacted them so negatively. Some of our more critical servers didn’t even have protection.

Just having 100 percent coverage across our entire environment has made us a lot more secure. The extra features that Bitdefender offers -- not just the antivirus piece but also the application blocking, device control, and firewall rules control -- add another level of security that we didn't even dream about with our old solutions.

Gardner: How about the network in the data center? Is that something you've been able to apply policies and rules to better, in ways that you couldn't before?

Yarbro: Yes, Bitdefender now has an option to offload scanning to a security server. We decided at first not to go with that approach because when we installed Bitdefender on our VDI endpoints, we didn't see any increase in CPU or memory utilization across any of our hosts, which is a complete 180 from what we had before.

But for some of our other servers, servers in our DMZ, we are thinking about using the security server approach to offload all of the scanning. It will further increase performance across our virtualized server environment.

Gardner: From an economic standpoint, that also gives you more runway, so to speak, in terms of having to upgrade the hardware. You are going to get more bang for your buck in your infrastructure investments.

Yarbro: Yes, exactly. And with that security-server approach, it's beneficial that if there's ever an upgrade for software or patches, once one server has checked a file, any other server or desktop that checks in already has that result. It doesn't have to send that file back or check it again -- it already knows. So it just speeds things up, almost exponentially, on those other devices.
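The "it already knows" behavior is essentially a shared cache of scan verdicts keyed by file content, so a file checked once doesn't get rescanned on the next machine. A minimal sketch of that general idea follows; it is not Bitdefender's implementation.

```python
import hashlib

# Shared verdict cache, keyed by file content hash. In a product this would live on
# the central security server; here it is just an in-memory dict for illustration.
VERDICT_CACHE = {}

def file_hash(path):
    """Hash the file in 1 MB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def full_scan(path):
    """Stand-in for an expensive signature/heuristic scan."""
    return "clean"

def scan(path):
    digest = file_hash(path)
    if digest in VERDICT_CACHE:       # another machine already checked this content
        return VERDICT_CACHE[digest]  # no rescan needed
    verdict = full_scan(path)
    VERDICT_CACHE[digest] = verdict
    return verdict
```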

Gardner: Just a more intelligent way to go about it, I would think.

Yarbro: Yes.

Gardner: Have you been looking to some of the other Bitdefender technologies? Where do you go next in terms of expanding your horizon on security?

One single pane of secure glass

Yarbro: The extra Bitdefender components we're testing right now are device control and the firewall -- being able to make sure that only devices we allow can be hooked up, say via USB ports. That's critical in our environment. We don't want someone to come in here with a flash drive and install or upload a virus or anything along those lines.
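Device control of this kind ultimately comes down to an allow-list check against device identifiers. A minimal sketch follows, with illustrative vendor and product IDs; it is not the actual Bitdefender device-control module.

```python
# Hypothetical allow-list of USB devices by (vendor_id, product_id); the IDs below
# are purely illustrative examples.
ALLOWED_USB = {
    ("0x0781", "0x5583"),  # example: an approved encrypted flash drive
    ("0x046d", "0xc52b"),  # example: an approved keyboard/mouse receiver
}

def usb_device_permitted(vendor_id, product_id):
    """Return True only for devices on the allow-list; everything else is blocked."""
    return (vendor_id.lower(), product_id.lower()) in ALLOWED_USB

# Example: an unknown flash drive is rejected.
assert not usb_device_permitted("0x1234", "0xabcd")
```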

The application and website blacklisting is also something that’s coming in the near future. We want to make sure that no malware, if it happens, can get past. We are also looking to consolidate two more management systems into just our Bitdefender console. That would be for encryption and patch management.

Bitdefender can do encryption as well, so we can just roll our current third-party software into Bitdefender. It will give us one pane of glass to manage all of these security features. For patch management, we are using two different systems: one for servers, one for Windows endpoints. If we can consolidate all of that into Bitdefender, because those policies are already in there, it would be a lot easier to manage and would make us a lot more secure.

Gardner: Anything in terms of advice for others who are transitioning off of other security solutions? What would you advise people to do as they are going about a change from one security infrastructure to another?

Slow and steady saves the servers

Yarbro: That's a good question. Make sure that you have all of your exclusion lists set properly. Bitdefender already has Windows, VMware, and Citrix best practices built into its console policies.

You only have to worry about your own applications, as long as you structure it properly from the beginning. Bitdefender's engineers helped us with that quite a bit. Just go slow and steady. From May to July last year we were able to do 425 servers. We probably could have done more than that, but we didn't want to risk breaking something. Luckily, we didn't push it to the more critical servers right away, because we did change a few of our policy settings that probably would have broken some of them and had us down for a while if we had rolled everything out at once.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Bitdefender.


Tuesday, November 21, 2017

Inside story on HPC's role in the Bridges Research Project at Pittsburgh Supercomputing Center

The next BriefingsDirect Voice of the Customer high-performance computing (HPC) success story interview examines how Pittsburgh Supercomputing Center (PSC) has developed a research computing capability, Bridges, and how that's providing new levels of analytics, insights, and efficiencies.

We'll now learn how advances in IT infrastructure and memory-driven architectures are combining to meet the new requirements for artificial intelligence (AI), big data analytics, and deep machine learning.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.

Here to describe the inside story on building AI Bridges are Dr. Nick Nystrom, Interim Director of Research, and Paola Buitrago, Director of AI and Big Data, both at Pittsburgh Supercomputing Center. The discussion is moderated by Dana Gardner, principal analyst at Interarbor Solutions.

Here are some excerpts:


Gardner: Let's begin with what makes Bridges unique. What is it about Bridges that is possible now that wasn't possible a year or two ago?

Nystrom: Bridges allows people who have never used HPC before to use it for the first time. These are people in business, social sciences, different kinds of biology and other physical sciences, and people who are applying machine learning to traditional fields. They're using the same languages and frameworks that they've been using on their laptops and now that is scaling up to a supercomputer. They are bringing big data and AI together in ways that they just haven't done before.

Gardner: It almost sounds like the democratization of HPC. Is that one way to think about it?

Nystrom: It very much is. We have users who are applying tools like R and Python and scaling them up to very large memory -- up to 12 terabytes of random access memory (RAM) -- and that enables them to gain answers to problems they've never been able to answer before.
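The point about scaling familiar tools is that the same script runs unchanged; only the node's memory ceiling changes. A minimal pandas sketch follows (the file name is a placeholder): code a researcher might run on a laptop, which on a 3 TB or 12 TB node can simply ingest a far larger table.

```python
import pandas as pd

# Placeholder input: a table far larger than laptop RAM but small relative to a
# 3 TB or 12 TB node. The code is identical to what a user would run locally.
df = pd.read_csv("large_genomics_table.csv")

print(f"rows: {len(df):,}")
print(f"in-memory size: {df.memory_usage(deep=True).sum() / 1e9:.1f} GB")
print(df.describe())
```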

Gardner: There is a user experience aspect, but I have to imagine there are also underlying infrastructure improvements that also contribute to user democratization.

Nystrom: Yes, democratization comes from two things. First, we stay closely in touch with the user community, and we look at this opportunity from their perspective. What are the applications that they need to run? What do they need to do? From there, we began to work with hardware vendors to understand what we had to build, and what we came up with is a very heterogeneous system.

We have three tiers of nodes, with memory ranging from 128 gigabytes to 3 terabytes to 12 terabytes of RAM. That's all coupled on the same very-high-performance fabric. We were the first installation in the world with the Intel Omni-Path interconnect, and we designed that in a custom topology that we developed at PSC expressly to make big data available as a service to all of the compute nodes with equally high bandwidth and low latency, and to let these new things become possible.
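One way to picture those three tiers is as a simple routing decision on requested memory. The sketch below is illustrative only; the tier labels and cutoffs are assumptions, not PSC's actual scheduler configuration.

```python
# Illustrative node tiers (GB of RAM per node), loosely following the 128 GB /
# 3 TB / 12 TB classes described above. Not an actual scheduler policy.
TIERS = [
    ("regular-memory", 128),
    ("large-memory", 3_000),
    ("extreme-memory", 12_000),
]

def pick_tier(required_gb):
    """Return the smallest node class whose RAM satisfies the job's request."""
    for name, capacity in TIERS:
        if required_gb <= capacity:
            return name
    raise ValueError(f"No single node holds {required_gb} GB; shard the job instead.")

print(pick_tier(96))     # -> regular-memory
print(pick_tier(2_000))  # -> large-memory
print(pick_tier(9_000))  # -> extreme-memory
```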

Gardner: What other big data analytics benefits have you gained from this platform?

Buitrago: A platform like Bridges enables what was not available before. There's a use case that was recently described by Tuomas Sandholm [Professor and Director of the Electronic Marketplaces Lab at Carnegie Mellon University]. It involves strategic machine learning using Bridges HPC to play and win at Heads-Up, No-Limit Texas Hold'em poker as a capabilities benchmark.

This is a perfect example of something that could not have been done without a supercomputer. A supercomputer enables massive and complex models that can actually give an accurate answer.

Right now, we are collecting a lot of data. There's a convergence of having great capabilities right in the compute and storage -- and also having the big data to answer really important questions. Having a system like Bridges allows us to, for example, analyze all that there is on the Internet, and put the right pieces together to answer big societal or healthcare-related questions.


Gardner: The Bridges platform has been operating for some months now. Tell us some other examples or use cases that demonstrate its potential.

Dissecting disease through data

Nystrom: Paola mentioned use cases for healthcare. One example is a National Institutes of Health (NIH) Center of Excellence in the Big Data to Knowledge program called the Center for Causal Discovery.

They are using Bridges to combine very large data sets -- genomics, lung-imaging data, and brain magnetic resonance imaging (MRI) data -- to come up with real cause-and-effect relationships among those very large data sets. That was never possible before because the algorithms did not scale. Such scaling is now possible thanks to very large memory architectures and because the data is available.

At CMU and the University of Pittsburgh, we have those resources now and people are making discoveries that will improve health. There are many others. One of these is on the Common Crawl data set, which is a very large web-scale data set that Paola has been working with.

Buitrago: Common Crawl is a data set that collects all the information on the Internet. The data is currently available on the Amazon Web Services (AWS) cloud in S3. They host these data sets for free. But, if you want to actually analyze the data, to search or create any index, you have to use their computing capabilities, which is a good option. However, given the scale and the size of the data, this is something that requires a huge investment.

So we are working on actually offering the same data set, putting it together with the computing capabilities of Bridges. This would allow the academic community at large to do such things as build natural language processing models, or better analyze the data -- and they can do it fast, and they can do it free of charge. So that's an important example of what we are doing and how we want to support big data as a whole.
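For readers curious what working with Common Crawl looks like at small scale, here is a hedged Python sketch using boto3 and warcio with anonymous S3 access to the public "commoncrawl" bucket. The WARC key shown is a placeholder; real paths come from the crawl's published path listings.

```python
import boto3
from botocore import UNSIGNED
from botocore.client import Config
from warcio.archiveiterator import ArchiveIterator

# Common Crawl is hosted in the public 'commoncrawl' S3 bucket; the key below is a
# placeholder -- actual WARC paths come from the crawl's warc.paths.gz listings.
BUCKET = "commoncrawl"
KEY = "crawl-data/CC-MAIN-2017-47/segments/.../warc/....warc.gz"  # placeholder

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
obj = s3.get_object(Bucket=BUCKET, Key=KEY)

# Stream the (gzipped) WARC file and print the first few captured page URLs.
for i, record in enumerate(ArchiveIterator(obj["Body"])):
    if record.rec_type == "response":
        print(record.rec_headers.get_header("WARC-Target-URI"))
    if i >= 100:
        break
```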


Gardner: So far we've spoken about technical requirements in HPC, but economics plays a role here. Many times in the evolution of technology, we've seen that as things become commercially available off the shelf, they can be deployed in new ways that just weren't economically feasible before. Is there an economics story here with Bridges?

Low-cost access to research

Nystrom: Yes, with Bridges we have designed the system to be extremely cost-effective. That's part of why we designed the interconnect topology the way we did. It was the most cost-effective way to build that for the size of data analytics we had to do on Bridges. That is a win that has been emulated in other places.

So, what we offer is available to research communities at no charge -- and that's for anyone doing open research. It's also available to the industrial sector at essentially a very attractive rate because it’s a cost-recovery rate. So, we do work with the private sector. We are looking to do even more of that in future.

Also, the future systems we are looking at will leverage lots of developing technologies. We're always looking at the best available technology for performance, for price, and then architecting that into a solution that will serve research.

Gardner: We've heard a lot recently from Hewlett Packard Enterprise (HPE) about their advances in large-scale memory processing and memory-driven architectures. How does that fit into your plans?

Nystrom: Large, memory-intensive architectures are a cornerstone of Bridges. We're doing a tremendous amount of large-scale genome sequence assembly on Bridges. That's individual genomes, and it’s also metagenomes with important applications such as looking at the gut microbiome of diabetic patients versus normal patients -- and understanding how the different bacteria are affected by and may affect the progression of diabetes. That has tremendous medical implications. We’ve been following memory technology for a very long time, and we’ve also been following various kinds of accelerators for AI and deep learning.

Gardner: Can you tell us about the underlying platforms that support Bridges that are currently commercially available? What might be coming next in terms of HPE Gen10 servers, for example, or with other HPE advances in the efficiency and cost reduction in storage? What are you using now and what do you expect to be using in the future?

Ever-expanding memory, storage

Nystrom: First of all, I think the acquisition of SGI by HPE was very strategic. Prior to Bridges, we had a system called Blacklight, which was the world's largest shared-memory resource. It's what taught us how productive shared memory can be for new communities in terms of human productivity. We can't scale smart humans, and so that's essential.

In terms of storage, there are tremendous opportunities now for integrating storage-class memory, increasing degrees of flash solid-state drives (SSDs), and other stages. We’ve always architected our own storage systems, but now we are working with HPE to think about what we might do for our next round of this.

Gardner: For those out there listening and reading this information, if they hadn’t thought that HPC and big data analytics had a role in their businesses, why should they think otherwise?

Nystrom: From my perspective, AI is permeating all aspects of computing. The way we see AI as important in an HPC machine is that it is being applied to applications that were traditionally HPC only -- things like weather and protein folding. Those were apps that people used to run on just big iron.

Now, they are integrating AI to help them find rare events, to do longer-term simulations in less time. And they’ll be doing this across other industries as well. These will be enterprise workloads where AI has a key impact. It won’t necessarily turn companies into AI companies, but they will use AI as an empowering tool to make what they already do, better.

Gardner: An example, Nick?

Nystrom: A good example of the way AI is permeating other fields is what people are doing at the Institute for Precision Medicine, [a joint effort between the University of Pittsburgh and the University of Pittsburgh Medical Center], and the Carnegie Mellon University Machine Learning and Computational Biology Departments.

They are working together on a project called Big Data for Better Health. Their objective is to apply state of the art machine learning techniques, including deep learning, to integrated genomic patient medical records, imaging data, and other things, and to really move toward realizing true personalized medicine.

Gardner: We’ve also heard a lot recently about hybrid IT. Traditionally HPC required an on-premises approach. Now, to what degree does HPC-as-a-service make sense in order to take advantage of various cloud models?


Nystrom: That’s a very good question. One of the things that Bridges makes available through the democratizing of HPC is big data-as-a-service and HPC-as-a-service. And it does that in many cases by what we call gateways. These are web portals for specific domains.

At the Center for Causal Discovery, which I mentioned, they have the Causal Web. It’s a portal, it can run in any browser, and it lets people who are not experts with supercomputers access Bridges without even knowing they are doing it. They run applications with a supercomputer as the back-end.

Another example is Galaxy Project and Community Hub, which are primarily for bioinformatic workflows, but also other things. The main Galaxy instance is hosted elsewhere, but people can run very large memory genome assemblies on Bridges transparently -- again without even knowing. They don’t have to log in, they don’t have to understand Linux; they just run it through a web browser, and they can use HPC-as-a-service. It becomes very cloud-like at that point.

Super-cloud supercomputing

Buitrago: Depending on the use case, an environment like the cloud can make sense. HPC can be used for an initial stage, if you want to explore different AI models, for example. You can fine-tune your AI and benefit from having the data close. You can reduce the time to start by having a supercomputer available for only a week or two. You find the right parameters, you get the model, and then when you are actually generating inferences you can go to the cloud and scale there. It supports high peaks in user demand. So, cloud and traditional HPC are complementary among different use cases, for what's called for in different environments and across different solutions.

Gardner: Before we sign off, a quick look to the future. Bridges has been here for over a year; now let's look a year out. What do you expect to come next?

Nystrom: Bridges has been a great success. It's very heavily subscribed, fully subscribed, in fact. It seems to work; people like it. So we are looking to build on that. We're looking to extend that to a much more powerful engine where we’ve taken all of the lessons we've learned improving Bridges. We’d like to extend that by orders of magnitude, to deliver a lot more capability -- and that would be across both the research community and industry.

Gardner: And using cloud models, what should we look for in the future when it comes to a richer portfolio of big data-as-a-service offerings?

Buitrago: We are currently working on a project to make data more available to the general public and to researchers. We are trying to democratize data and let people do searches and inquiries and processing that they wouldn’t be able to do without us.

We are integrating big data sets that go from web crawls to genomic data. We want to offer them paired with the tools to properly process them. And we want to provide this to people who haven’t done this in the past, so they can explore their questions and try to answer them. That’s something we are really interested in and we look forward to moving into a production stage.


Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.


Monday, November 20, 2017

How UBC gained TCO advantage via flash for its EduCloud cloud storage service

The next BriefingsDirect cloud efficiency case study explores how a storage-as-a-service offering in a university setting gains performance and lower total cost benefits by a move to all-flash storage.

We’ll now learn how the University of British Columbia (UBC) has modernized its EduCloud storage service and attained both efficiency as well as better service levels for its diverse user base.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or  download a copy.

Here to help us explore new breeds of SaaS solutions is Brent Dunington, System Architect at UBC in Vancouver. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: How is satisfying the storage demands at a large and diverse university setting a challenge? Is there something about your users and the diverse nature of their needs that provides you with a complex requirements list? 

Dunington: A university setting isn't much different from any other business. The demands are the same. UBC has about 65,000 students and about 15,000 staff. The students these days are younger kids; they all have iPhones and iPads, and they just want to push buttons and get instant results and instant gratification. And that boils down to the services that we offer.

We have to be able to offer those services, because as most people know, there are choices -- and they can go somewhere else and choose those other products.

Our team is a rather small team. There are 15 members in our team, so we have to be agile, we have to be able to automate things, and we need tools that can work and fulfill those needs. So it's just like any other business, even though it’s a university setting.


Gardner: Can you give us a sense of the scale that describes your storage requirements?

Dunington: We do SaaS, and we also do infrastructure-as-a-service (IaaS). EduCloud is a self-service IaaS product that we deliver to UBC, but we also deliver it to 25 other higher-education institutions in the Province of British Columbia.

We have been doing IaaS for five years, and we have been very, very successful. So more people are looking to us for guidance.

Because we are not just delivering to UBC, we have to be up and running and always able to deliver, because each school has different requirements. At different times of the year -- registration, exam times -- these services have to be up. You can't be down during an exam and have 600 students unable to take the tests they have been studying for. It impacts their lives, and we want to make sure that we are there and can provide the services they need.

Gardner: In order to maintain your service levels within those peak times, do you in your IaaS and storage services employ hybrid-cloud capabilities so that you can burst? Or are you doing this all through your own data center and your own private cloud?

On-Campus Cloud

Dunington: We do it all on-campus. British Columbia has a law that says all the data has to stay in Canada. It's a data-sovereignty law; the data can't leave the country's borders.

That's why EduCloud has been so successful, in my opinion -- because of that option. They can just go and put things in the private cloud.

The public cloud providers are providing more services in Canada: Amazon Web Services (AWS) and Microsoft Azure cloud are putting data centers in Canada, which is good and it gives people an option. Our team’s goal is to provide the services, whether it's a hybrid model or all on-campus. We just want to be able to fulfill those needs.

Gardner: It sounds like the best of all worlds. You are able to give that elasticity benefit, a lot of instant service requirements met for your consumers. But you are starting to use cloud pay-as-you-go types of models and get the benefit of the public cloud model -- but with the security, control and manageability of the private clouds.

What decisions have you made about your storage underpinnings, the infrastructure that supports your SaaS cloud?

Dunington: We have a large storage footprint. For our site, it’s about 12 petabytes of storage. We realized that we weren’t meeting the needs with spinning disks. One of the problems was that we had runaway virtual workloads that would cause problems, and they would impact other services. We needed some mechanism to fix that.

We went through the whole request for proposal (RFP) process, and all the IT infrastructure vendors responded, but we did have some guidelines that we wanted to follow. One of the things we did was present our problems and make sure the vendors understood what those problems were and what they were being asked to solve.

And there were some minimum requirements. We have a backup vendor of choice that the solution needed to integrate with. And quality of service is a big thing. We wanted to make sure that we had the ability to attain quality-of-service levels and control those runaway virtual machines in our footprint.

Gardner: You gained more than just flash benefits when you got to flash storage, right?

Streamlined, safe, flash storage

Dunington: Yes, for sure. With an entire data center full of spinning disks, it gets to the point where the disks start to manage you; you are no longer managing the disks. And with teams out there changing drives and moving volumes around, it becomes unwieldy. I mean, the power, the footprint, and all of that starts to grow.

Also, Vancouver is in a seismic zone; we are right up against the Pacific plate, and it's a very active area. Heaven forbid anything happens, but one of the requirements we had was to move the data center into the interior of the province. So that's what we did.

When we brought this new data center online, one of the decisions the team made was to move to an all-flash storage environment. We wanted to be sure that it made financial sense, because it's publicly funded, and that it also improved the user experience across the province.

Gardner: As you were going about your decision-making process, you had choices, what made you choose what you did? What were the deciding factors?

Dunington: There were a lot of deciding factors. There's the technology: being able to meet the performance requirements and manage that performance. One of the things was to lock down runaway virtual machines and to put performance tiers on others.

But it’s not just the technology; it's also the business part, too. The financial part had to make sense. When you are buying any storage platform, you are also buying the support team and the sales team that come with it.

Our team believes that technology is a certain piece of the pie, and the rest of it is relationship. If that relationship part doesn't work, it doesn’t matter how well the technology part works -- the whole thing is going to break down.

Because software is software, hardware is hardware -- it breaks, it has problems, there are limitations. And when you have to call someone, you have to depend on him or her. Even though you bought the best technology and got the best price -- if it doesn't work, it doesn’t work, and you need someone to call.

So those service and support issues were all wrapped up into the decision.


We chose the Hewlett Packard Enterprise (HPE) 3PAR all-flash storage platform, and we have been very happy with it. We knew the HPE team well; they came and worked with us on the server blade infrastructure, and the team knew how to support all of it.

We also use the HPE OneView product for provisioning, and the storage integrated into all of that. It also supports the performance optimization tool (IT Operations Management for HPE OneView) that lets us set those values, because one of the things in EduCloud is that customers choose their own storage tier, and we mark the price on it. So basically all we do is present that new tier as a new datastore within VMware, and customers just move their workloads across non-disruptively. It has worked really well.

The 3PAR storage piece also integrates with VMware vRealize Operations Manager. We offer that to all our clients as a portal so they can see how everything is working and do their own diagnostics. That's the one goal we have with EduCloud: it has to be self-service. We let the customers do it; that's what they want.

Gardner: Not that long ago people had the idea that flash was always more expensive and that they would use it for just certain use-cases rather than pervasively. You have been talking in terms of a total cost of ownership reduction. So how does that work? How does the economics of this over a period of time, taking everything into consideration, benefit you all?

Economic sense at scale

Dunington: Our IT team and our management team are really good with that part. They were able to break it all down, and they found that this model would work at scale. I don’t know the numbers per se, but it made economic sense.

Spinning disks will still have a place in the data center. I don't know if a year from now an all-flash data center will make sense, because there are some records that people throw in and never touch. But right now, with the numbers as we worked them out, it makes sense, because we are using the standard bronze, silver, and gold tiers, and with those tiers the economics work.

The 3PAR solution also has dedupe functionality and the compression that they just released. We are hoping to see how well that trends. Compression has only been around for a short period of time, so I can’t really say, but the dedupe has done really well for us.

Gardner: The technology overcomes some of the other baseline economic costs and issues, for sure.

We have talked about the technology and performance requirements. Have you been able to qualify how, from a user experience, this has been a benefit?

Dunington: The best benchmark is the adoption rate. People are using it, we can see that everything is ramping up, and we are not getting help desk tickets. No one is complaining about the price or the availability. Our operational team isn't complaining about it being harder to manage or that the backups aren't working. That makes me happy.

The big picture

Gardner: Brent, maybe a word of advice to other organizations that are thinking about a similar move to private cloud SaaS. Now that you have done this, what might you advise them to do as they prepare for or evaluate a similar activity?

Dunington: Look at the full picture; look at the total cost of ownership. There's buying the hardware, and there's also supporting the hardware. Make sure that you understand your requirements and what your customers are looking for before you go out and buy. Not everybody needs that speed, not everybody needs that performance, but it is the future and things will move there. We will see in a couple of years how it went.
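Dunington's "full picture" advice reduces to straightforward arithmetic once the inputs are gathered. The sketch below uses entirely hypothetical numbers, only to show the shape of a total-cost-of-ownership comparison: acquisition plus power, support, and operational effort over the life of the gear.

```python
# All figures are hypothetical placeholders -- the point is the structure of a
# total-cost-of-ownership comparison, not any vendor's real pricing.
def tco(acquisition, annual_power, annual_support, annual_ops_hours, hourly_rate, years=5):
    """Acquisition cost plus recurring power, support, and staff time over N years."""
    recurring = annual_power + annual_support + annual_ops_hours * hourly_rate
    return acquisition + years * recurring

spinning = tco(acquisition=400_000, annual_power=60_000, annual_support=40_000,
               annual_ops_hours=800, hourly_rate=75)
all_flash = tco(acquisition=700_000, annual_power=20_000, annual_support=45_000,
                annual_ops_hours=250, hourly_rate=75)

print(f"5-year TCO, spinning disk: ${spinning:,.0f}")
print(f"5-year TCO, all-flash:     ${all_flash:,.0f}")
```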

Look at the big picture, step back. It’s just not the new shiny toy, and you might have to take a stepped approach into buying, but for us it worked. I mean, it’s a solid platform, our team sleeps well at night, and I think our customers are really happy with it.

Gardner: This might be a little bit of a pun in the education field, but do your homework and you will benefit.


Dunington: Yes, for sure.

Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or  download a copy. Sponsor: Hewlett Packard Enterprise.
