Thursday, September 12, 2019

The venerable history of IT systems management meets the new era of AIOps-fueled automation over hybrid and multicloud complexity


The next edition of the BriefingsDirect Voice of the Innovator podcast series explores the latest developments in hybrid IT management.

IT operators have for decades been playing catch-up to managing their systems amid successive waves of heterogeneity, complexity, and changing deployment models. IT management technologies and methods have evolved right along with the challenge, culminating in the capability to optimize and automate workloads to exacting performance and cost requirements.

But now automation is about to get an AIOps boost from new machine learning (ML) and artificial intelligence (AI) capabilities -- just as multicloud and edge computing deployments become more common and more demanding.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

Stay with us as we explore the past, present, and future of IT management innovation with a 30-year veteran of IT management, Doug de Werd, Senior Product Manager for Infrastructure Management at Hewlett Packard Enterprise (HPE). The interview is conducted by Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Management in enterprise IT has for me been about taking heterogeneity and taming it, bringing varied and dynamic systems to a place where people can operate over more, using less. And that’s been a 30-year journey.

Yet heterogeneity these days, Doug, includes so much more than it used to. We’re not just talking about platforms and frameworks -- we’re talking about hybrid cloud, multicloud, and many software-as-a-service (SaaS) applications. It includes working securely across organizational boundaries with partners and integrating business processes in ways that have never happened before.

With all of that new complexity, with an emphasis on intelligent automation, where do you see IT management going next?

Managing management 

de Werd: Heterogeneity is known by another term, and that’s chaos. In trying to move from the traditional silos and tools to more agile, flexible things, IT management is all about your applications -- human resources and finance, for example -- that run the core of your business. There’s also software development and other internal things. The models for those can be very different and trying to do that in a single manner is difficult because you have widely varying endpoints.

Gardner: Sounds like we are now about managing the management.

de Werd: Exactly. Trying to figure out how to do that in an efficient and economically feasible way is a big challenge.

Gardner: I have been watching the IT management space for 20-plus years and every time you think you get to the point where you have managed everything that needs to be managed -- something new comes along. It’s a continuous journey and process.

But now we are bringing intelligence and automation to the problem. Will we ever get to the point where management becomes subsumed or invisible?

de Werd: You can automate tasks, but you can’t automate people. And you can’t automate internal politics and budgets and things like that. What you do is automate to provide flexibility.
But it’s not just the technology, it’s the economics and it’s the people. By putting that all together, it becomes a balancing act to make sure you have the right people in the right places in the right organizations. You can automate, but it’s still within the context of that broader picture.

Gardner: When it comes to IT management, you need a common framework. For HPE, HPE OneView has been core. Where does HPE OneView go from here? How should people think about the technology of management that also helps with those political and economic issues?


de Werd: HPE OneView is just an outstanding core infrastructure management solution, but it’s kind of like a car. You can have a great engine, but you still have to have all the other pieces.

And so part of what we are trying to do with HPE OneView, and we have been very successful, is extending that capability out into the other tools that people use. This can mean more traditional tools, such as through our Microsoft and VMware partnerships, exposing and bringing HPE OneView functionality into those environments.

But it also has a lot to do with DevOps and the continuous integration development types of things with Docker, Chef, and Puppet -- the whole slew of at least 30 partners we have.

That integration allows the confidence of using HPE OneView as a core engine. All those other pieces can still be customized to do what you need to do -- yet you still have that underlying core foundation of HPE OneView.
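
To make that integration concrete, here is a minimal sketch of the kind of script a partner tool or pipeline might use to pull inventory from HPE OneView's REST interface. The appliance address, credentials, API version, and endpoint paths shown are illustrative assumptions; consult the OneView REST API reference for the exact calls in your release.

```python
# Minimal sketch: pulling server inventory from an HPE OneView appliance so
# that partner tools (Chef, Puppet, CI pipelines) can act on it.
# The endpoint paths, header names, and API version below are assumptions
# for illustration -- verify them against your OneView release.
import requests

ONEVIEW = "https://oneview.example.com"   # hypothetical appliance address
HEADERS = {"X-API-Version": "1200", "Content-Type": "application/json"}

def login(username: str, password: str) -> str:
    """Create a session and return its auth token (assumed response field)."""
    resp = requests.post(f"{ONEVIEW}/rest/login-sessions",
                         json={"userName": username, "password": password},
                         headers=HEADERS, verify=False)
    resp.raise_for_status()
    return resp.json()["sessionID"]

def list_server_hardware(token: str) -> list:
    """Return the appliance's view of managed server hardware."""
    resp = requests.get(f"{ONEVIEW}/rest/server-hardware",
                        headers={**HEADERS, "Auth": token}, verify=False)
    resp.raise_for_status()
    return resp.json().get("members", [])

if __name__ == "__main__":
    token = login("administrator", "secret")      # placeholder credentials
    for server in list_server_hardware(token):
        print(server.get("name"), server.get("powerState"))
```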

Gardner: And now with HPE increasingly going to an as-a-service orientation across many products, how does management-as-a-service work?

Creativity in the cloud 

de Werd: It’s an interesting question, because part of management in the traditional sense -- where you have a data center full of servers with fault management or break/fix, such as hard-drive failure detection -- is that you want to be close, you want to have that notification immediately.

As you start going up in the cloud with deployments, you have connectivity issues, you have latency issues, so it becomes a little bit trickier. When you move up the stack, where the software can be more flexible, you can do more coordination. Then the cloud makes a lot of sense.
Management in the cloud can mean a lot of things. If it’s the infrastructure, you tend to want to be closer to the infrastructure, but not exclusively. So, there’s a lot of room for creativity.

Gardner: Speaking of creativity, how do you see people innovating both within HPE and within your installed base of users? How do people innovate with management now that it’s both on- and off-premises? It seems to me that there is an awful lot you could do with management beyond red-light, green-light monitoring and seeking out optimization and efficiency goals. Where is the innovation happening now with IT management?

de Werd: The foundation of it begins with automation, because if you can automate you become repeatable, consistent, and reliable, and those are all good in your data center.
You can free up your IT staff to do other things. The truth is if you can do that reliably, you can spend more time innovating and looking at your problems from a different angle. You gain the confidence that the automation is giving you.

Automation drives creativity in a lot of different ways. You can be faster to market, have quicker releases, those types of things. I think automation is the key.

Gardner: Any examples? I know sometimes you can’t name customers, but can you think of instances where people are innovating with management in ways that would illustrate its potential?

Automation innovation 

de Werd: There’s a large biotech genome sequencing company, an IT group that is very innovative. They can change their configuration on the fly based on what they want to do. They can flex their capacity up and down based on a task -- how much compute and storage they need. They have a very flexible way of doing that. They have it all automated, all scripted. They can turn on a dime, even as a very large IT organization.


And they have had some pretty impressive ways of repurposing their IT. Today we are doing X and tonight we are doing Y. They can repurpose that literally in minutes -- versus days for traditional tasks.
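
As an illustration of that style of scripted repurposing, the sketch below reassigns nodes between a daytime and a nighttime workload based on queue depth. The pools, thresholds, and the apply_profile() hook are hypothetical stand-ins for whatever provisioning mechanism (server profiles, Chef recipes, and so on) a given IT group actually uses.

```python
# Illustrative sketch of scripted repurposing: move just enough idle nodes
# from the daytime pool to tonight's workload to cover the pending jobs.
from dataclasses import dataclass, field

@dataclass
class Pool:
    name: str
    nodes: list = field(default_factory=list)

def apply_profile(node: str, workload: str) -> None:
    # Placeholder for the real provisioning call (e.g., applying a server profile).
    print(f"reconfiguring {node} for {workload}")

def rebalance(src: Pool, dst: Pool, pending_jobs: int, jobs_per_node: int = 10) -> None:
    """Move enough nodes from src to dst to cover the pending jobs."""
    needed = min(len(src.nodes), -(-pending_jobs // jobs_per_node))  # ceiling division
    for _ in range(needed):
        node = src.nodes.pop()
        apply_profile(node, dst.name)
        dst.nodes.append(node)

day = Pool("sequencing", [f"node{i:02d}" for i in range(8)])
night = Pool("analytics")
rebalance(day, night, pending_jobs=35)   # flex capacity toward tonight's workload
print(len(day.nodes), "nodes left on sequencing;", len(night.nodes), "moved to analytics")
```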

Gardner: Are your customers also innovating in ways that allow them to get a common view across the entire lifecycle of IT? I’m thinking from requirements, through development, deployment, test, and continuous redeployment.

de Werd: Yes, they can string all of these processes together using different partner tools, yet at the core they use HPE OneView and HPE Synergy underneath the covers to provide that real, raw engine.

By using the HPE partner ecosystem integrated with HPE OneView, they have that visibility. Then they can get into things like Docker Swarm. It may not be HPE OneView providing that total visibility. At the hardware and infrastructure level it is, but because we are feeding into upper-level and broader applications, they can see what’s going on and determine how to adjust to meet the needs across the entire business process.

Gardner: In terms of HPE Synergy and composability, what’s the relationship between composability and IT management? Are people making the whole greater than the sum of the parts with those?

de Werd: They are trying to. I think there is still a learning curve. Traditional IT has been around a long time. It just takes a while to change the mentality, skills sets, and internal politics. It takes a while to get to that point of saying, “Yeah, this is a good way to go.”

But once they dip their toes into the water and see the benefits -- the power, flexibility, and ease of it -- they are like, “Wow, this is really good.” One step leads to the next and pretty soon they are well on their way on their composable journey.

Gardner: We now see more intelligence brought to management products. I am thinking about how HPE InfoSight is being extended across more storage and server products.
We used to access log feeds from different IT products and servers. Then we had agents and agent-less analysis for IT management. But now we have intelligence as a service, if you will, and new levels of insight. How will HPE OneView evolve with this new level of increasingly pervasive intelligence?

de Werd: HPE InfoSight is a great example. You see it being used in multiple ways, things like taking the human element out, things like customer advisories coming out and saying, “Such-and-such product has a problem,” and how that affects other products.

If you are sitting there looking at 1,000 or 5,000 servers in your data center, you’re wondering, “How am I affected by this?” There are still a lot of manual spreadsheets out there, and you may find yourself poring over a list.


Today, you have the capability of getting an [intelligent alert] that says, “These are the ones that are affected. Here is what you should do. Do you want us to go fix it right now?” That’s just an example of what you can do.
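
A small script along these lines shows how such an advisory check can replace the manual spreadsheet pass described above: it cross-references an advisory's affected models against a local inventory file. The CSV layout, field names, and the naive firmware comparison are assumptions for illustration, not any particular HPE data format.

```python
# Sketch: flag which servers in a local inventory fall under a product
# advisory. The inventory.csv columns (name, model, firmware) are assumed.
import csv

def affected_servers(inventory_csv: str, affected_models: set, max_firmware: str):
    """Yield (name, model, firmware) rows matching the advisory."""
    with open(inventory_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            # Naive string comparison of firmware versions -- fine for illustration only.
            if row["model"] in affected_models and row["firmware"] <= max_firmware:
                yield row["name"], row["model"], row["firmware"]

if __name__ == "__main__":
    advisory_models = {"ProLiant DL380 Gen10", "ProLiant DL360 Gen10"}  # example values
    for name, model, fw in affected_servers("inventory.csv", advisory_models, "2.30"):
        print(f"{name}: {model} at firmware {fw} -- apply the advisory fix")
```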

It makes you more efficient. You begin to understand how you are using your resources, where your utilization is, and how you can then optimize that. Depending on how flexible you want to be, you can design your systems to respond to those inputs and automatically flex [deployments] to the places that you want to be.

This leads to autonomous computing. We are not quite there yet, but we are certainly going in that direction. You will be able to respond to different compute, storage, and network requirements and adjust on the fly. There will also be self-healing and self-morphing into a continuous optimization model.

Gardner: And, of course, that is a big challenge these days … hybrid cloud, hybrid IT, and deploying across on-premises cloud, public cloud, and multicloud models. People know where they want to go with that, but they don’t know how to get there.

How does modern IT management help them achieve what you’ve described across an increasingly hybrid environment?

Manage from the cloud down 

de Werd: They need to understand what their goals are first. Just running virtual machines (VMs) in the cloud isn’t really where they want to be. That was the initial thing. There are economic considerations involved in the cloud, CAPEX and OPEX arguments.

Simply moving your infrastructure from on-premises up into the cloud isn’t going to get you where you really need to be. You need to look at it from a cloud-native-application perspective, where you are using microservices, containers, and cloud-enabled programming languages -- your Javas and .NETs and all the other stateless types of things -- all of which give you new flexibility to flex performance-wise.

From the management side, you have to look at different ways to do your development and different ways to do delivery. That’s where the management comes in. To do DevOps and exploit the DevOps tools, you have to flip the way you are thinking -- to go from the cloud down.


Cloud application development on-premises, that’s one of the great things about containers and cloud-native, stateless types of applications. There are no hardware dependencies, so you can develop the apps and services on-premises, and then run them in the cloud, run them on-premises, and/or use your hybrid cloud vendor’s capabilities to burst up into a cloud if you need it. That’s the joy of having those types of applications. They can run anywhere. They are not dependent on anything -- on any particular underlying operating system.
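
For illustration, here is a minimal stateless service in the spirit described above: it keeps nothing on the node, so the same code can run in a container on-premises, in a public cloud, or burst between the two. The port and the toy response are arbitrary choices.

```python
# Minimal stateless service: all context comes from the request and the
# environment, nothing is stored locally, so any replica anywhere can answer.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json, os

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({
            "path": self.path,
            "region": os.environ.get("REGION", "unspecified"),
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```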

But you have to shift and get into that development mode. And the automation helps you get there, and then helps you respond quickly once you do.

Gardner: Now that hybrid deployment continuum extends to the edge. There will be increasing data analytics, measurement, and making deployment changes dynamically from that analysis at the edge.

It seems to me that the way you have designed and architected HPE IT management is ready-made for such extensibility out to the edge. You could have systems run there that can integrate as needed, when appropriate, with a core cloud. Tell me how management as you have architected it over the years helps manage the edge, too.

de Werd: Businesses need to move their processing further out to the edge, and gain the instant response, instant gratification. You can’t wait to have an input that arrives at the edge analyzed by sending it all the way back to a data source or all the way up to a cloud. You want to have the processing further and further toward the edge so you can get that instantaneous response that customers are coming to expect.

But again, being able to automate how to do that, and having the flexibility to respond to differing workloads and moving those toward the edge, I think, is key to getting there.

Gardner: And Doug, for you, personally, do you have some takeaways from your years of experience about innovation and how to make innovation a part of your daily routine?

de Werd: One of the big impacts on the team that I work with is in our quality assurance (QA) testing. It’s a very complex thing to test various configurations; that’s a lot of work. In the old days, we had to manually reconfigure things. Now, as we use an Agile development process, testing is a continuous part of it.

We can now respond very quickly and keep up with the Agile process. It used to be that testing was always the tail-end and the longest thing. Development testing took forever. Now because we can automate that, it just makes that part of the process easier, and it has taken a lot of stress off of the teams. We are now much quicker and nimbler in responses, and it keeps people happy, too.
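
Here is a sketch of that kind of continuous configuration testing, using pytest parametrization so every supported combination is exercised automatically on each change. The configuration matrix and the validate_config() stub are hypothetical placeholders for a real test harness.

```python
# Sketch: exercise every firmware/fabric/OS combination on each commit
# instead of reconfiguring test rigs by hand.
import itertools
import pytest

FIRMWARE = ["1.10", "2.30"]
FABRICS = ["ethernet", "fibre-channel"]
OS_IMAGES = ["rhel8", "esxi7"]

def validate_config(firmware: str, fabric: str, os_image: str) -> bool:
    """Stand-in for deploying the combination and running smoke checks."""
    return True  # a real harness would provision and probe the configuration

@pytest.mark.parametrize("firmware,fabric,os_image",
                         itertools.product(FIRMWARE, FABRICS, OS_IMAGES))
def test_configuration(firmware, fabric, os_image):
    assert validate_config(firmware, fabric, os_image)
```
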
Gardner: As we close out, looking to the future, where do you see management going, particularly how to innovate using management techniques, tools, and processes? Where is the next big green light coming from?

Set higher goals 

de Werd: First, get your house in order in terms of taking advantage of the automation available today. Really think about how not to just use the technology as the end-state. It’s more of a means to get to where you want to be.

Define where your organization wants to be. Where you want to be can have a lot of different aspects; it could be about how the culture evolves, or what you want your customers’ experience to be. Look beyond just, “I want this or that feature.” 


Then, design your full IT and development processes. Get to that goal, rather than just saying, “Oh, I have 100 VMs running on a server, isn’t that great?” Well, if it’s not achieving the ultimate goal of what you want, it’s just a technology feat. Don’t use technology just for technology’s sake. Use it to get to the larger goals, and define those goals, and how you are going to get there.

Thursday, September 5, 2019

How the Catalyst UK program seeds the next generations of HPC, AI, and supercomputing

The next BriefingsDirect Voice of the Customer discussion explores a program to expand the variety of CPUs that support supercomputer and artificial intelligence (AI)-intensive workloads.

The Catalyst program in the UK is seeding the advancement of the ARM CPU architecture for high performance computing (HPC) as well as establishing a vibrant software ecosystem around it.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

Stay with us to learn about unlocking new choices and innovation for the next generations of supercomputing with Dr. Eng Lim Goh, Vice President and Chief Technology Officer for HPC and AI at Hewlett Packard Enterprise (HPE), and Professor Mark Parsons, Director of the Edinburgh Parallel Computing Centre (EPCC) at the University of Edinburgh. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Mark, why is there a need now for more variety of CPU architectures for such use cases as HPC, AI, and supercomputing?

Parsons: In some ways this discussion is a bit odd because we have had huge variety over the years in supercomputing with regard to processors. It’s really only the last five to eight years that we’ve ended up with the majority of supercomputers being built from the Intel x86 architecture.

It’s always good in supercomputing to be on the leading edge of technology and getting more variety in the processor is really important. It is interesting to seek different processor designs for better performance for AI or supercomputing workloads. We want the best type of processors for what we want to do today.

Gardner: What is the Catalyst program? Why did it come about? And how does it help address those issues?

Parsons: The Catalyst UK program is jointly funded by a number of large companies and three universities: The University of Bristol, the University of Leicester, and the University of Edinburgh. It is UK-focused because Arm Holdings is based in the UK, and there is a long history in the UK of exploring new processor technologies.

Through Catalyst, each of the three universities hosts a 4,000-core ARM processor-based system. We are running them as services. At my university, for example, we now have a number of my staff using this system. But we also have external academics using it, and we are gradually opening it up to other users.

Catalyst for change in processors 

We want as many people as possible to understand how difficult it will be to port their code to ARM. Or, rather -- as we will explore in this podcast -- how easy it is.

You only learn by breaking stuff, right? And so, we are going to learn which bits of the software tool chain, for example, need some work. [Such porting is necessary] because ARM predominantly sat in the mobile phone world until recently. The supercomputing and AI world is a different space for the ARM processor to be operating in.

Gardner: Eng Lim, why is this program of interest to HPE? How will it help create new opportunity and performance benchmarks for such uses as AI?

Goh: Mark makes a number of very strong points. First and foremost, we are very keen as a company to broaden the reach of HPC among our customers. If you look at our customer base, a large portion of them come from the commercial HPC sites, the retailers, banks, and across the financial industry. Letting them reach new types of HPC is important and a variety of offerings makes it easier for them.

The second thing is the recent reemergence of more AI applications, which also broadens the user base. There is also a need for greater specialization in certain areas of processor capabilities. We believe in this case, the ARM processor -- given the fact that it enables different companies to build innovative variations of the processor -- will provide a rich set of new options in the area of AI.

Gardner: What is it, Mark, about the ARM architecture and specifically the Marvell ThunderX2 ARM processor that is so attractive for these types of AI workloads?

Expanding memory for the future 

Parsons: It’s absolutely the case that all numerical computing -- AI, supercomputing, and desktop technical computing -- is controlled by memory bandwidth. This is about getting data to the processor so the processor core can act on it.

What we see in the ThunderX2 now, as well as in future iterations of this processor, is strong memory bandwidth capability. What people don’t realize is that a vast amount of the time, processor cores are just waiting for data. The faster you get the data to the processor, the more compute you are going to get out of that processor. That’s one particular area where the ARM architecture is very strong.
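
A rough way to see this effect for yourself is a STREAM-style "triad" kernel, which does very little arithmetic per byte moved, so its runtime is set almost entirely by how quickly data reaches the cores. This NumPy version is only indicative, not a calibrated benchmark, and the array size is an arbitrary choice.

```python
# Rough, indicative memory-bandwidth probe: a triad-style kernel a = b + s*c.
import time
import numpy as np

N = 20_000_000                       # ~160 MB per float64 array
b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty_like(b)
s = 3.0

start = time.perf_counter()
np.multiply(c, s, out=a)             # a = s * c
np.add(a, b, out=a)                  # a = b + s * c
elapsed = time.perf_counter() - start

# Approximate traffic: read b and c, write a (the two-step kernel moves somewhat more).
bytes_moved = 3 * N * 8
print(f"effective bandwidth ~ {bytes_moved / elapsed / 1e9:.1f} GB/s")
```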

Goh: Indeed, memory bandwidth is the key. Not only in supercomputing applications, but especially in machine learning (ML) where the machine is in the early phases of learning, before it does a prediction or makes an inference.
It has to go through the process of learning, and this learning is a highly data-intensive process. You have to consume massive amounts of historical data and examples in order to tune itself into a model that can make good predictions. So, memory bandwidth is of utmost importance in the training phase of ML systems.

And related to this is the fact that the ARM processor’s core intellectual property is available to many companies to innovate around. More companies therefore recognize they can leverage that intellectual property and build high-memory bandwidth innovations around it. They can come up with a new processor. Such an ability to allow different companies to innovate is very valuable.

Gardner: Eng Lim, does this fit in with the larger HPE drive toward memory-intensive computing in general? Does the ARM processor fit into a larger HPE strategy?

Goh: Absolutely. The ARM processor together with the other processors provide choice and options for HPE’s strategy of being edge-centric, cloud-enabled, and data-driven.

Across that strategy, the commonality is data movement. And as such, the ARM processor allowing different companies to come in to innovate will produce processors that meet the needs of all these various kinds of sectors. We see that as highly valuable and it supports our strategy.

Gardner: Mark, Arm Holdings controls the intellectual property, but there is a budding ecosystem both on the processor design as well as the software that can take advantage of it. Tell us about that ecosystem and why the Catalyst UK program is facilitating a more vibrant ecosystem.

The design-to-build ecosystem 

Parsons: The whole Arm story is very, very interesting. This company grew out of home computing about 30 to 40 years ago. The interesting thing is the way that they are an intellectual property company, at the end of the day. Arm Holdings itself doesn’t make processors. It designs processors and sells those designs to other people to make.

So, we’ve had this wonderful ecosystem of different companies making their own ARM processors or making them for other people. With the wide variety of different ARM processors in mobile phones, for example, there is no surprise that it’s the most common processor in the world today.

Now, people think that x86 processors rule the roost, but actually they don’t. The most common processor you will find is an ARM processor. As a result, there is a whole load of development tools that come both from ARM and also within the developer community that support people who want to develop code for the processors.

In the context of Catalyst UK, in talking to Arm, it’s quite clear that many of their tools are designed to meet their predominant market today, the mobile phone market. As they move into the higher-end computing space, we may well find places in the programs where the compiler isn’t optimized, certain libraries are difficult to compile, and things like that. And this is what excites me about the Catalyst program. We are getting to play with leading-edge technology and show that it is easy to use all sorts of interesting stuff with it.
Gardner: And while the ARM CPU is being purpose-focused for high-intensity workloads, we are seeing more applications being brought in, too. How does the porting process of moving apps from x86 to ARM work? How easy or difficult is it? How does the Catalyst UK program help?

Parsons: All three of the universities are porting various applications that they commonly use. At EPCC, we run the national HPC service for the UK, called ARCHER. We have run national [supercomputing] services since 1994, but as part of the ARCHER service, we decided for the first time to offer many of the common scientific applications as modules.

You can just ask for the module that you want to use. Because we saw users compiling their own copies of code, we had multiple copies, some of them identically compiled, others not compiled particularly well.

So, we have a model of offering about 40 codes on ARCHER as precompiled where we are trying to keep them up to date and we patch them, etc. We have 100 staff at EPCC that look after code. I have asked those staff to get an account on the Catalyst system, take that code across and spend an afternoon trying to compile. We already know for some that they just compile and run. Others may have some problems, and it’s those that we’re passing on to ARM and HPE, saying, “Look, this is what we found out.”

The important thing is that we found there are very few programs [with such problems]. Most code is simply recompiling very, very smoothly.
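
A porting pass like that can itself be automated. The sketch below loops over a list of codes, attempts a build on the ARM system, and records which ones need attention; the code names, build commands, and log locations are hypothetical examples rather than EPCC's actual harness.

```python
# Sketch: automate a "does it recompile on ARM?" sweep over a list of codes.
import subprocess
from pathlib import Path

CODES = {
    "code-a": ["make", "-j", "8"],
    "code-b": ["cmake", "--build", "build"],
}

def try_build(name: str, cmd: list) -> bool:
    """Run the build command in the code's source tree and capture its log."""
    result = subprocess.run(cmd, cwd=Path("src") / name,
                            capture_output=True, text=True)
    Path(f"{name}-arm-build.log").write_text(result.stdout + result.stderr)
    return result.returncode == 0

if __name__ == "__main__":
    failures = [name for name, cmd in CODES.items() if not try_build(name, cmd)]
    print("needs attention on ARM:", failures or "none")
```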

Gardner: How does HPE support that effort, both in terms of its corporate support but also with the IT systems themselves?

ARM’s reach 

Goh: We are very keen about the work that Mark and the Catalyst program are doing. As Mark mentioned, the ARM processor came more from the edge-centric side of our strategy. In mobile phones, for example.


Now we are very keen to see how far these ARM systems can go. Already we have shipped to the US Department of Energy at Sandia National Laboratories a large ARM processor-based supercomputer called Astra. These efforts are ongoing in the area of HPC applications. We are very keen to see how this processor and the compilers for it work with various HPC applications in the UK and the US.

Gardner: And as we look to the larger addressable market, with the edge and AI being such high-growth markets, it strikes me that supercomputing -- something that has been around for decades -- is not fully mature. We are entering a whole new era of innovation.

Mark, do you see supercomputing as in its heyday, sunset years, or perhaps even in its infancy?

Parsons: I absolutely think that supercomputing is still in its infancy. There are so many bits in the world around us that we have never even considered trying to model, simulate, or understand on supercomputers. It’s strange because quite often people think that supercomputing has solved everything -- and it really hasn’t. I will give you a direct example of that.

A few years ago, a European project I was running won an award for simulating the highest accuracy of water flowing through a piece of porous rock. It took over a day on the whole of the national service [to run the simulation]. We won a prize for this, and we only simulated 1 cubic centimeter of rock.

People think supercomputers can solve massive problems -- and they can, but the universe and the world are complex. We’ve only scratched the surface of modeling and simulation.

This is an interesting moment in time for AI and supercomputing. For a lot of data analytics, we have at our fingertips for the very first time very, very large amounts of data. It’s very rich data from multiple sources, and supercomputers are getting much better at handling these large data sources.

The reason the whole AI story is really hot now, and lots of people are involved, is not actually about the AI itself. It’s about our ability to move data around and use our data to train AI algorithms. The link directly into supercomputing is because in our world we are good at moving large amounts of data around. The synergy now between supercomputing and AI is not to do with supercomputing or AI – it is to do with the data.

Gardner: Eng Lim, how do you see the evolution of supercomputing? Do you agree with Mark that we are only scratching the surface?

Top-down and bottom-up data crunching 

Goh: Yes, absolutely, and it’s an early scratch. It’s still very early. I will give you an example.

Solving games is important for developing methods and strategies for cyber defense. Take the most recent game in which machines are beating the best human players, the game of Go. It is much more complex than chess in terms of the number of potential combinations. The number of combinations is actually 10^171 if you comprehensively went through all the different combinations of that game.
Do you know how big that number is? Well, if we took all the computers in the world together -- all the supercomputers, all of the computers in the data centers of the Internet companies -- and ran them for 100 years, all you could do is on the order of 10^30, which is still very far from 10^171. So, you can see just by this one game example alone that we are very early in that scratch.
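
A quick back-of-the-envelope calculation shows the scale gap. The figures below are deliberately round assumptions -- a generous 10^21 operations per second for all the world's machines combined -- used only to illustrate the order of magnitude.

```python
# Back-of-the-envelope check of the scale gap, using assumed round figures.
SECONDS_PER_YEAR = 3.15e7
total_ops = 1e21 * SECONDS_PER_YEAR * 100         # assumed ops/sec, run for 100 years
print(f"total operations ~ 10^{len(str(int(total_ops))) - 1}")   # roughly 10^30
print(f"fraction of Go's game tree explored ~ 10^{30 - 171}")    # about 10^-141
```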

A second group of examples relates to new ways that supercomputers are being used. From ML to AI, there is now a new class of applications changing how supercomputers are used. Traditionally, most supercomputers have been used for simulation. That’s what I call top-down modeling. You create your model out of physics equations or formulas and then you run that model on a supercomputer to try and make predictions.

The new way of making predictions uses the ML approach. You do not begin with physics. You begin with a blank model and you keep feeding it data, the outcomes of history and past examples. You keep feeding data into the model, which is written in such a way that for each new piece of data that is fed, a new prediction is made. If the accuracy is not high, you keep tuning the model. Over time -- with thousands, hundreds of thousand, and even millions of examples -- the model gets tuned to make good predictions. I call this the bottom-up approach.
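
To make the bottom-up approach concrete, here is a toy training loop in that spirit: start from a blank model, feed it examples one at a time, and nudge the parameters whenever the prediction misses. The linear model and learning rate are illustrative; real ML workloads do the same thing with vastly more data and parameters, which is where memory bandwidth comes in.

```python
# Toy "bottom-up" learning: tune a blank model from a stream of examples.
import random

w, b = 0.0, 0.0                   # the "blank model"
lr = 0.01                         # how strongly each example tunes the model

def predict(x):
    return w * x + b

for _ in range(10_000):                          # keep feeding it data
    x = random.uniform(-1, 1)
    y_true = 3.0 * x + 0.5                       # the hidden relationship in the data
    error = predict(x) - y_true
    w -= lr * error * x                          # tune the model a little
    b -= lr * error

print(f"learned w={w:.2f}, b={b:.2f} (target 3.0 and 0.5)")
```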

Now we have people applying both approaches. Supercomputers used traditionally in a top-down simulation are also employing the bottom-up ML approach. They can work in tandem to make better and faster predictions.

Supercomputers are therefore now being employed for a new class of applications in combination with the traditional or gold-standard simulations.

Gardner: Mark, are we also seeing a democratization of supercomputing? Can we extend these applications and uses? Is what’s happening now decreasing the cost, increasing the value, and therefore opening these systems up to more types of uses and more problem-solving?

Cloud clears the way for easy access 

Parsons: Cloud computing is having a big impact on everything that we do, to be quite honest. We have all of our photos in the cloud, our music in the cloud, et cetera. That’s why EPCC last year got rid of its file server. All our data running the actual organization is in the cloud.

The cloud model is great inasmuch as it allows people who don’t want to operate and run a large system 100 percent of the time the ability to access these technologies in ways they have never been able to do before.

The other side of that is that there are fantastic software frameworks now that didn’t exist even five years ago for doing AI. There is so much open source for doing simulations.

It doesn’t mean that an organization like EPCC, which is a supercomputing center, will stop hosting large systems. We are still great aggregators of demand. We will still have the largest computers. But it does mean that, for the first time through the various cloud providers, any company, any small research group and university, has access to the right level of resources that they need in a cost-effective way.

Gardner: Eng Lim, do you have anything more to offer on the value and economics of HPC? Does paying based on use rather than a capital expenditure change the game?

More choices, more innovation 

Goh: Oh, great question. There are some applications and institutions with processes that work very well with a cloud, and there are some applications and processes that don’t. That’s part of the reason why you embrace both. And, in fact, we at HPE embrace the cloud, and we also build on-premises solutions for our customers, like the one at the Catalyst UK program.

We also have something that is a mix of the two. We call that HPE GreenLake, which is the ability for us to acquire the system the customer needs, but the customer pays per use. It is a software-defined experience with consumption-based economics.

These are some of the options we put together to allow choice for our customers, because there is a variation of needs and processes. Some are more CAPEX-oriented in a way they acquire resources and others are more OPEX-oriented.

Gardner: Do you have examples of where some of the fruits of Catalyst, and some of the benefits of the ecosystem approach, have led to applications, use cases, and demonstrated innovation?

Parsons: What we are trying to do is show how easy ARM is to use. We have taken some really powerful, important code that runs every day on our big national services and have simply moved them across to ARM. Users don’t really understand or don’t need to understand they are running on a different system. It’s that boring.

We have picked up one or two problems with code that probably exist in the x86 version, but because you are running a new processor, it exposes it more, and we are fixing that. But in general -- and this is absolutely the wrong message for an interview -- we are proceeding in a very boring way. The reason I say that is, it’s really important that this is boring, because if we don’t show this is easy, people won’t put ARM on their next procurement list. They will think that it’s too difficult, that it’s going to be too much trouble to move codes across.

One of the aims of Catalyst, and I am joking, is definitely to be boring. And I think at this point in time we are succeeding.

More interestingly, though, another aim of Catalyst is about storage. The ARM systems around the world today still tend to do storage on x86. The storage will be running on Lustre or BeeGFS servers, all sitting on x86 boxes.

We have made a decision to do everything on ARM, if we can. At the moment, we are looking at different storage software on ARM servers. We are looking at Ceph, at Lustre, at BeeGFS, because unless you have the ecosystem running on ARM as well, people won’t think it’s as pervasive a solution as x86, or Power, or whatever.

The benefit of being boring 

Goh: Yes, in this case boring is good. Seamless movement of code across different platforms is the key. It’s very important for an ecosystem to be successful. It needs to be easy to develop code for, and it needs to be easy to port. And those things are just as important with our commercial HPC systems for the broader HPC customer base.

In addition to customers writing their own code and compiling it well and easily to ARM, we also want to make it easy for the independent software vendors (ISVs) to join and strengthen this ecosystem.

Parsons: That is one of the key things we intend to do over the next six months. We have good relationships, as does HPE, with many of the big and small ISVs. We want to get them on a new kind of system, let them compile their code, and get some help to do it. It’s really important that we end up with ISV code on ARM, all running successfully.

Gardner: If we are in a necessary, boring period, what will happen when we get to a more exciting stage? Where do you see this potentially going? What are some of the use cases using supercomputers to impact business, commerce, public services, and public health?

Goh: It’s not necessarily boring, but it is brilliantly done. There will be richer choices coming to supercomputing. That’s the key. Supercomputing and HPC need to reach a broader customer base. That’s the goal of our HPC team within HPE.

Over the years, we have increased our reach to the commercial side, such as the financial industry and retailers. Now there is a new opportunity coming with the bottom-up approach of using HPC. Instead of building models out of physics, we train the models with example data. This is a new way of using HPC. We will reach out to even more users.
So, the success of our supercomputing industry is getting more users, with high diversity, to come on board.

Gardner: Mark, what are some of the exciting outcomes you anticipate?

Parsons: As we get more experience with ARM it will become a serious player. If you look around the world today, in Japan, for example, they have a big new ARM-based supercomputer that’s going to be similar to the ThunderX2 when it’s launched.


I predict in the next three or four years we are going to see some very significant supercomputers up at the X2 level, built from ARM processors. Based on what I hear, the next generations of these processors will produce a really exciting time.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.
