Thursday, October 9, 2014

ITSM adoption forces a streamlined IT operations culture at Desjardins, paves the way to cloud

Our next innovation case study interview highlights how Desjardins Group in Montréal is improving their IT operations through an advanced IT services management (ITSM) approach.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn more, BriefingsDirect sat down with Trung Quach, ITSM Manager at Desjardins in Québec, at the recent HP Discover conference in Las Vegas. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: First, tell us a little bit about your organization. You have a large network of credit unions.

Quach: It’s more like cooperative banking. We are around 50,000 people across Québec, and we've started moving into both Canada and the US.
Gardner: Tell us a little bit about your IT organization, the size, how many people, how many datacenters? What sort of IT organization do you have?

Quach: We're around 2,500 and counting. We're mainly based in Montréal and Lévis, which is near Québec City. Most of them are in Montréal, but some technical people are in Lévis. 

Gardner: Tell us about your role. What are you doing there as ITSM manager?

The ITIL process

Quach: I joined Desjardins last year in the ITSM leader position. The role is about the process, the ITIL process, and everything that's involved with the tool, as well as supporting those overall processes.

Gardner: Tell us why ITSM has become important to you. What were some of the challenges, some of the requirements? What was the environment you were in that required you to adopt better ITSM principles?

Quach: A couple of years ago, when they merged 10-plus silos of IT into one big group, Desjardins needed to centralize the process, put best practice in place, to be more efficient and competitive -- and to give a higher value to the business.

Gardner: What, in particular, were issues that cropped up as a result of that decentralization? Was this poor performance, too much cost, too many manual processes, all of the above?

Quach: We had a lot of manual processes, and a lot of tools. To be able to measure the performance of a team, you need to use the same process and the same tools, and then measure yourself on it. You need to optimize the way you do it, so that you can provide better IT services.

Gardner: What have been some of the results of your movement toward ITSM? What sort of benefits have you realized as a result?

Quach: We had many of them. Some were financial, but the most important thing, I think, is the quality of our services and the availability of those services. One indicator is a 30 percent reduction in major incidents over the last two years.

Gardner: What is it about your use of ITSM that has led to that significant reduction in incidents? How does that translate?

Quach: We put our new problem-management approach to work, along with the problem processes. When we open tickets, we can take care of the incidents in a coordinated way at an enterprise level. So the impact is everywhere. We can now advise the lines of business, follow up on the incident, and close it rapidly. We follow up on any problems, and then we fix the real issues so that they don’t come back.

Gardner: Have you used this to translate back to any applications development, or custom development in your organization? Or is this more on the operations side strictly?

Better support

Quach: We started all of this on the operations side. But last year we started on the development side, too. They're slowly becoming involved in our process, and that will soon get better, so we can better support the full IT lifecycle.

Gardner: Tell us about HP Discover. What's of interest to you? Have you been looking at what HP has been doing with their tools? What's of most importance to you in terms of what they do with their technology?

Quach: I can tell you how important it is for us. Last year we didn't go to HP Discover. This year, around eight people from my team and the architecture team are here. That shows you how important it is.

Now we spread out. A lot of my team members went to explore tools and everything else that HP has to offer, and HP has a lot to offer. We went to learn about the cloud, as well as big data. It all works together. That’s why it was important for us to come here. ITSM is the main reason we're here, but I want to make sure that everything works together, because the IT processes touch everything.
Gardner: I've talked to a number of organizations, Trung, and they've mentioned that before they feel comfortable moving into more cloud activities, and before they feel comfortable adopting big data, analytics platforms, they want to make sure they have everything else in order. So ITSM is an important step for them to then go to larger, more complex undertakings. Is that your philosophy as well?

Quach: Yes. There are two ways to do this. You use that technology to force yourself to be disciplined, or you discipline yourself. ITSM is one way to do it. You force yourself to work in a certain manner, a streamlined manner, and then you can go to the cloud. It's easier that way.

Gardner: Then, of course, you also have standardization in culture, in organization, not just technology, but the people and the process, and that can be very powerful.

Quach: If you asked me about cloud (and I have done this with another company), in a 30-minute interview about cloud, I would use 29 minutes to talk not about technology but about people and processes.

Gardner: How about the future of IT? Any thoughts about the big picture of where technology is going? Even as we face larger data volumes, perhaps more complexity, and mobile applications, what are your thoughts about how we solve some of those issues in the big picture?

Time to market

Quach: IT is increasingly going to be challenged to meet the speed demanded for improved time to market. To do that, you need processes, technology, and of course, people. The client, the business, is going to ask us to be faster. That’s why we'll need to go to the cloud. But to go to the cloud, we first need to master our IT services. Otherwise, it would be like not going to the cloud at all: we wouldn't have that agility, and we would not be competitive.
Gardner: Looking back, now that you have gone through an ITSM advancement, for those who are just beginning, what are some thoughts that you could share with them?

Quach: In an ITSM project, it's very hard to manage change. I'm talking about the people change, not the change-management technology process. Most of the time, you put the tool in place and say that everybody has to work with it. If I were to do it again, I would bring in more people to understand the latest ITSM science and processes, and explain why, in five or 10 years, it's really going to help us.

After that, we'd put the project in place, but we'd follow people up and train them every year. ITSM is a never-ending story. You always have to be close to your clients. Even if they are IT, they are your clients or partners. You need to coach them, to make sure they understand why they're doing this. Sometimes it takes a bit longer to get it right at the beginning, but it’s all worth it in the end.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: HP.

Tuesday, October 7, 2014

MIT Media Lab computing director details the virtues of cloud for agility and disaster recovery

The next BriefingsDirect innovator case study interview focuses on the MIT Media Lab in Cambridge, Mass., and how they're exploring the use of cloud and hybrid cloud to gain such benefits as IT speed, agility, and robust three-tier disaster recovery (DR).

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn more about how the MIT Media Lab is exploiting cloud computing, we’re joined by Michail Bletsas, research scientist and Director of Computing at the MIT Media Lab. The discussion, at the recent VMworld 2014 Conference in San Francisco, is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Tell us about the MIT Media Lab and how it manages its own compute requirements.

Bletsas: The organization is one of the many independent research labs within MIT. MIT is organized in departments, which do the academic teaching, and research labs, which carry out the research.
The Media Lab is a unique place within MIT. We deviate from the normal academic research lab in the sense that a lot of our funding comes from member companies, and it comes in a non-direct fashion. Companies become members of the lab, and then we get the freedom to do whatever we think is best.

We try to explore the future. We try to look at what our digital life will look like 10 years out, or more. We're not an applied research lab in the sense that we're not looking at what's going to happen two or three years from now. We're not looking at short-term future products. We're looking at major changes 15 years out.

I run the group that takes care of the computing infrastructure for the lab and, unlike a normal IT department, we're heavy on computing. We use computers as our medium. The Media Lab is all about human expression, which is the reason for the name, and computers are one of the main means of expression right now. We have many more devices than other departments. We run a pretty complex network and a very dynamic environment.

Major piece

A lot has changed in our environment in recent years. I've been there for almost 20 years. We started with very exotic stuff. These days, we still build exotic stuff, but we're using commodity components. VMware is a major piece of this strategy for us, because it allows more efficient utilization of our resources and helps us control the server proliferation that we, like everybody else, have experienced.

We normally have about 350 people in the lab, distributed among staff, faculty members, graduate students, and undergraduate students, as well as affiliates from the various member companies. There is usually a one-to-five correspondence between virtual machines (VMs), physical computers, and devices, but there are at least 5 to 10 IPs per person on our network. You can imagine that having a platform that allows us to easily deploy resources in a very dynamic and quick fashion is very important to us.

We run a relatively small operation for the scope of our domain. What's very important to us is to have tools that allow us to perform advanced functions with a relatively short learning curve. We don’t like long learning curves, because we just don’t have the resources and we do too many things.

You are going to see functionality in our group that is usually only present in groups that are 10 times our size. Each person has to do too many things, and we like to focus on technologies that allow us to perform very advanced functions with little learning. I think we've been pretty successful with that.

Gardner: How have you created a data center that’s responsive, but also protects your property?

Bletsas: Unlike most people, we tend to have our resources concentrated close to us. We really need to interact with our infrastructure on a much shorter cycle than the average operation. We've been fortunate enough that we have multiple, small data centers concentrated close to where our researchers are. Having something on the other side of the city, the state, or the country doesn’t really work in an environment that’s as dynamic as we are.

We also have to support a much larger community that consists of our alumni and collaborators. If you look at our user database right now, it’s on the order of 3,500 people, as opposed to 350. It’s very dynamic, in that it changes from month to month. The important attribute of an environment like this is that we can’t have too many restrictions. We don’t have an approved list of equipment like you see in a normal corporate IT environment.

Our modus operandi is that if you bring it to us, we’ll make it work. If you need to use a specific piece of equipment in your research, we’ll try to figure out how to integrate it into your workflow and into what we have in there. We don’t tell people what to use. We just help them use whatever they bring to us.

In that respect, we need a flexible virtualization platform that doesn’t impose too many restrictions on what operating systems you use or what the configuration of the VMs is. That’s why we find that solutions like general public cloud are applicable to only a small part of our research. Pretty much every VM that we run is different from the one next to it.

Flexibility is very important to us. Having a robust platform is very, very important, because you have too many parameters changing and very little control of what's going on. Most importantly, we need a very solid, consistent management interface to that. For us, that’s one of the main benefits of the VMware vSphere environment that we’re on.

Public or hybrid

Gardner: What about taking advantage of cloud, public cloud, and hybrid cloud to some degree, perhaps for disaster recovery (DR) or for backup failover. What's the rationale, even in your unique situation, for using a public or hybrid cloud?

Bletsas: Right now we use a three-tiered hybrid cloud. MIT has a very large campus, with extensive digital infrastructure running our operations across the board. We also have facilities that are either all the way across campus or across the river in a large co-location facility in downtown Boston, and we take advantage of that for first-level DR.

A solution like vCloud Air allows us to plan for a real disaster scenario, where something truly catastrophic happens on campus. We use it to keep certain critical databases, including all the access tools around them, in a location farther away.

It’s a second level for us. We have our own VMware infrastructure and then we can migrate loads to our central organization. They're a much larger organization that takes care of all the administrative computing and general infrastructure at MIT at their own data centers across campus. We can also go a few states away to vCloud Air [and migrate our workloads there in an emergency].

So it’s a very seamless transition using the same tools. The important attribute here is that if you have an operation that small, with 10 people dealing with such a complex set of resources, you can't do that unless you have a consistent user interface that allows you to migrate those workloads using tools you already know and are familiar with.

We couldn’t do it with another solution, because the learning curve would be too steep. We know that remote events are remote, until they happen, and sometimes they do. This gives us, with minimal effort, the ability to deal with that eventuality without having to invest too much in learning a whole set of new tools and a whole set of new APIs to be able to migrate.

We use public cloud services also. We use spot instances if we need a high compute load and for very specialized projects. But usually we don’t put persistent loads or critical loads on resources over which we don’t have much control. We like to exert as much control as possible.

Gardner: It sounds like you're essentially taking metadata and configuration data, the things that will be important to spin back up an operation should there be some unfortunate occurrence, and putting that into that public cloud, the vCloud Air public cloud. Perhaps it's DR-as-a-service, but only a slice of DR, not the entire data. Is that correct?

Small set of databases

Bletsas: Yes. Not the entire organization. We run our operations out of a small set of databases that drive a lot of our websites and a lot of our internal systems. They drive our CRM operation and our events management. There is a lot of knowledge embedded in those databases.

It's lucky for us, because we're not such a big operation. We're relatively small, so you can include everything, including all the methods and the programs that you need to access and manipulate that data within a small set of VMs. You don’t normally use them out of those VMs, but you can keep them packaged in a way that in a DR scenario, you can easily get access to them.

Fortunately, we've been doing that for a very long time because we started having them as complete containers. As the systems scaled out, we tended to migrate certain functions, but we kept the basic functionality together just in case we have to recover from something.

In the older days, we didn’t have that multi-tiered cloud in place. All we had was backups in remote data centers. If something happened, you had to go in there, find some unused hardware that was similar to what you had, restore your backup, and so on.

Now, because most of MIT's administrative systems run under VMware virtualization, finding that capacity is a very simple proposition in a data center across campus. With vCloud Air, we can find that capacity in a data center across the state or somewhere else.

Gardner: For organizations that are intrigued by this tiered approach to DR, did you decide which part of those tiers would go in which place? Did you do that manually? Is there a part of the management infrastructure in the VMware suite that allowed you to do that? How did you slice and dice the tiers for this proposition of vCloud Air holding a certain part of the data?

Bletsas: We are fortunate enough to have a very good, intimate knowledge of our environment. We know where each piece lies. That’s the benefit of running a small organization. We occasionally use vSphere’s monitoring infrastructure. Sometimes it reveals to us certain usage patterns that we were not aware of. That’s one of the main benefits that we found there.

We realized that certain databases were used more than we thought. Just looking at those access patterns told us, "Look, maybe you should replicate this. It doesn’t cost much to replicate it across campus, and then maybe you should look into pushing it even further out."

It's a combination of visibility and good dashboards that reveal patterns of activity you might not be aware of, even in a relatively small environment like ours.

Gardner: At VMworld 2014, there was quite a bit of news, particularly in the vCloud Air arena. What intrigues you?

Standard building blocks

Bletsas: We like the move toward standardization of building blocks. That’s a good thing overall, because it allows you to scale out relatively quickly with a minor investment in learning a new system. That’s the most important trend out there for us. As I've said, we're a small operation. We need to standardize as much as possible, while at the same time, expanding the spectrum of services. So how do you do that? It’s not a very clear proposition.

The other thing that is of great interest to us is network virtualization. MIT is in a very peculiar situation compared to the rest of the world, in the sense that we have no shortage of IP addresses. Unlike most corporations where they expose a very small sliver of their systems to the outside world and everything happens on the back-end, our systems are mostly exposed out there to the public internet.

We don’t run very extensive firewalls. We're a knowledge dissemination and distribution organization, and we don’t have many things to hide. We operate in a different way than most corporations, and that shows in networking too. Our network looks nothing like what you see in the corporate world. The ability to move whole sets of IPs around our domain, which is rather large and over which we have full control, is very important for us.

It allows for much faster DR. Right now we can do DR across town using the same IPs, because our domain of control is large enough. That is very powerful, because you can do very quick and simple DR without having to reprogram IP addresses, DNS servers, load balancers, and things like that.

The other important trend is storage virtualization and storage tiering, and you see that with all the vendors down in the exhibit space. Again, it allows you to match the application profile much more easily to the resources you have. For a rather small group like ours, which can't afford to keep all of its disk storage on very high-end systems, having a little bit of expensive flash storage and then a lot of cheap storage is the way to go.

The layers that have been recently added to VMware, both on the network side and the storage side help us achieve that in a very cost-efficient way.

For us, experimentation is the most important thing. Spinning up a large number of VMs to do a specific experiment is very valuable, and being able to commandeer resources across campus and across data centers is a necessary requirement for an environment like this. What we get out of that is flexibility, agility, and speed of operations.
In the older days, you had to go and procure hardware and switch hardware around. Now, we rarely go into our data centers. We used to live in our data centers. We go there from time to time but not as often as we used to do, and that’s very liberating. It’s also very liberating for people like me because it allows me to do my work anywhere.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy. Sponsor: VMware.
