Wednesday, August 3, 2011

Case study: MSP InTechnology improves network services via automation and consolidation of management systems

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Read a full transcript or download a copy. Sponsor: HP.

The latest BriefingsDirect podcast discussion focuses on a UK-based managed service provider’s journey to provide better information and services for its network, voice, VoIP, data, and storage customers. The network management and productivity benefits have come from an alignment of many service management products into an automated lifecycle approach to overall network operations.

We explore here how InTechnology has implemented a coordinated, end-to-end solution using HP solutions that actually determine the health of its networks by aligning their tools to ITIL methods. And, by using their system-of-record approach with a configuration management database, InTechnology is better serving its customers with lean resources by leveraging systems over manual processes.

Hear from Ed Jackson, Operational System Support Manager at InTechnology, as he explains the company's choices and outcomes in delivering better operations and better service for its hundreds of enterprise customers. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:
Jackson: We've basically been growing exponentially year over year. In the past four years, we've grown our network about 75 percent. In terms of our product set, we've basically tripled that in size, which obviously leads to major complexity on both our network and how we manage the product lifecycle.

Previously, we didn’t have anything that could scale as well as the systems that we have in place now. We couldn’t hope to manage 8,000 or 9,000 network devices, plus being able to deliver a product lifecycle, from provisioning to decommission, which is what we have now.

It's pretty massive in terms of the technologies involved. A lot of them are cutting-edge. We have many partners. Our suite of cloud services is very diverse and comprises what we believe is the UK’s most complete and "joined-up" set of pay-monthly voice and data services.

Their own pace

In practice what we aim to do is help our customers engage with the cloud at a pace that works for them. First, we provide connectivity to our nationwide network ring – our cloud. Once their estate is connected they can then cherry pick services from our broad pay-as-you-go (PAYG) menu.

For example, they might be considering replacing their traditional "tin" PBXs with hosted IP telephony. We can do that and demonstrate massive savings. Next we might overlay our hosted unified communications (UC) suite providing benefits such as "screen sharing," "video calling," and "click-to-dial." Again, we can demonstrate huge savings on planes, trains and automobiles.

Next we might overlay our exciting new hosted call recording package -- Unity Call Recording (UC) -- which is perfect if they are in a regulated industry and have a legal requirement to record calls. It’s got some really neat features, including the ability to tag and bookmark calls to make searching and playback easier.

While we're doing this, we might also explore the data path. For example our new FlexiStor service provides what we think is the UK’s most straightforward PAYG service designed to manage data by its business "value" and not just as one big homogenous lump of data. It treats data as critical, important or legacy and applies an appropriate storage process to each ... saving up to 40 percent against traditional data management methods.

Imagine trying to manage this disparate set of systems. It would be pretty impossible. But due to the HP product set that we have, we've been able to utilize all the integrations and have a fully managed, end-to-end lifecycle of the service, the devices, and the product sets that we have as a company.

[Our adoption of the HP suites] was spurred by really bad data that we had in the systems. We couldn't effectively go forward. We couldn't scale anymore. So, we got the guys at HP to come in and design us a solution based on products that we already had, but with full integration, and add in additional products such as HP Asset Manager and device Discovery and Dependency Mapping Inventory (DDMI).

With the systems that we already had in place, we utilized mainly HP Service Desk. So we decided to take the bold leap to go to Service Manager, which then gave us the ability to integrate it fully into the Operations Manager product and our Network Node Manager product.

Since we had the initial integrations, we've added extra integrations like Universal Configuration Management Database (UCMDB), which gives us a massive overview on how the network is progressing and how it's developing. Coupled with this, we've got Release Control, and we've just upgraded to the latest version of Service Manager 9.2.

... We recently upgraded Connect-It from 4.1 to 9.3, and with that, we upgraded Asset Manager System to 9.3. Connect-It is the glue that holds everything together. It's a fantastic application that you can throw pretty much any data at, from a CSV file, to another database, to web services, to emails, and it will formulate it for you. You can do some complex integrations in that. It will give you the data that you want on the other side and it cleanses and parses, so that you can pass it on to other systems.

From our DDMI system, right through to our Service Manager, then into our Network Node Manager, we now have a full set of solutions that are held together by Connect-It.

We can discover the device on the network. We can then propagate it into Service Manager. We can add lots of financial details to it from other financial systems outside of the HP product set, but which are easy to integrate. We can therefore provision the circuit, provision the device, and add it to monitoring automatically, without any human intervention, just by the fact that the device gets shipped to the site.

It gets loaded up with the configuration, and then it's good to go. It's automatically managed right through to the decommissioning stage, or the upgrade stage, where it's replaced by another device. HP systems give us that capability.
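
The lifecycle Jackson describes, from discovery through provisioning and monitoring to decommissioning, can be pictured as a simple state machine. The sketch below is illustrative only: the stage names and the `advance` helper are assumptions made for this example and do not represent the actual HP product integrations.

```python
from enum import Enum


class Stage(Enum):
    DISCOVERED = "discovered"
    RECORDED = "recorded"            # propagated into the service management system
    PROVISIONED = "provisioned"
    MONITORED = "monitored"
    DECOMMISSIONED = "decommissioned"


# Allowed transitions (hypothetical; chosen to mirror the narrative above).
TRANSITIONS = {
    Stage.DISCOVERED: {Stage.RECORDED},
    Stage.RECORDED: {Stage.PROVISIONED},
    Stage.PROVISIONED: {Stage.MONITORED},
    Stage.MONITORED: {Stage.DECOMMISSIONED},
    Stage.DECOMMISSIONED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Modeling the stages explicitly means a device cannot, for example, be added to monitoring before it has been provisioned, which is the kind of process control an integrated toolset is meant to enforce.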

So this has all given us a huge benefit in terms of process control and how it relates to ITIL. More importantly, one of the main things that we are going for at the moment is payment card industry (PCI) and ISO 27001 compliance.

For any auditor that comes in, we have a documented set of reports that we can give them. That will hopefully help us get this compliance and maintain it. One of the things as an MSP is that we can be compliant for the customer. The customer can have the infrastructure outsourced to us with the compliance policy in that. We can take the headache of compliance away from our customers.

More and more these days, we have a lot of solicitors and law firms on our books, and we're getting "are you compliant" as a request before they place business with us. We're finding all across the industry that compliance is a must before any contract is won. So to keep one step ahead of the game, this is something that we're going to have to achieve and maintain, and the HP product set that we have is key in that.

In terms of our service and support, we've basically grown the network massively, but we haven’t increased any headcount for managing the network. Our 24/7 guys are the same as they were four or five years ago in terms of headcount.

We get on average around 5,000 incidents a month automatically generated from our systems and network devices. Of these incidents, only about 560 are linked to customer-facing Interactions using our Service Desk module in the Service Manager application.

Approximately 80 percent of our total incidents are generated automatically. They are either proactively raised, based on things like CPU and memory usage of network devices, virtual devices, or even physical servers in our data centers, or reactively raised, based on, for example, device or interface down events.
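
The distinction between proactively and reactively raised incidents can be sketched in a few lines. This is a minimal illustration, not InTechnology's implementation: the threshold values, the `DeviceSample` shape, and the `classify` function are all assumptions, since the source does not give the real rules.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the source does not state the real values.
CPU_THRESHOLD_PCT = 90.0
MEMORY_THRESHOLD_PCT = 90.0


@dataclass
class DeviceSample:
    device: str
    cpu_pct: float
    memory_pct: float
    is_up: bool


def classify(sample: DeviceSample) -> Optional[str]:
    """Return 'reactive', 'proactive', or None if no incident is needed."""
    if not sample.is_up:
        return "reactive"    # the device or interface is already down
    if (sample.cpu_pct >= CPU_THRESHOLD_PCT
            or sample.memory_pct >= MEMORY_THRESHOLD_PCT):
        return "proactive"   # resource pressure, raised before an outage
    return None
```

A down device is checked first, so an outage is always treated as reactive even if its last resource readings were also high.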

Massive burden

When you've got like 80 percent of all incidents raised automatically, it takes a massive burden off the 24/7 teams and the customer support guys, who are not spending the majority of their time creating incidents but actually working to resolve them.

When we originally decided to take the step to upgrade from Service Desk to Service Manager and to get the network discovery product set in, we used HP’s Professional Services to effectively design the solution and help us implement it.

Within six months, we had Service Desk upgraded to Service Manager. We had an asset manager system that was fully integrated with our financials, our stock control. And we also had a Network Discovery toolset that was inventorying our estate. So we had a fully end-to-end solution.

Automatic incidents

Into that, we have helped to develop the Network Operations Management Solution into being able to generate automatic incidents. HP Professional Services played a pivotal role in providing us with the kind of solutions that we have now.

Since then, we took that further, because we have very knowledgeable in-house guys who really understand the HP systems and services. So we've taken it a bit of a step further, and most of the stuff that we do now in terms of upgrades and things is done in-house.

One of the key benefits is it gives us a unique calling card for our potential customers. I don’t know of many other MSPs that have such an automated set of technology tools to help them manage the service that they provide to their customers.

Five years ago, this wasn't possible. We had disparate systems and duplicate data held in multiple areas. So it wasn’t possible to have the integration and the level of support that we give our customers now for the new systems and services that we provide.

Mean time to restore has come down significantly, by way over 15 percent. As I said, there has been zero increase in headcount over our systems and services. We started off with a few thousand network devices and only three or four different products, in data, storage, networks and voice. Now we've got 16 different kinds of product sets, with about 8,000, 9,000 network devices.

In terms of cost savings and increased productivity, this has been huge. Our 24/7 teams and customer support teams are more proactive in using knowledge bases and Level 1 triage. Resolution of incidents by customer support teams and Level 1 engineers has gone up by 25 percent; this enables the Level 3 engineers to concentrate on more complex issues.

If you take a Priority 3, Priority 4 incident, 70 percent of those are now fixed by Level 1 engineers, which was unheard of five or six years ago. Also, we now have a very good knowledge base in the Service Manager tool that we can use for our Level 1 engineers.

In terms of SLAs, we manage the availability of network devices. It gives us a lot more flexibility in how we give these availability metrics to the customers. Because we're business driven by other third party suppliers, we can maintain and get service credits from them. We've also got a fully documented incident lifecycle. We can tell when the downtime has been on these services, and give our suppliers a bit of an ear bashing about it, because we have this information to hand them. We didn’t have that five or six years ago.

With event correlation, we reduced our operations browsers down to just meaningful incidents. We filtered our events from over 100,000 a month to fewer than 20,000; many of these are duplicates and are correlated together. Most events are associated with knowledge base articles in Service Manager and contain instructions on how to escalate or resolve the event, increasingly by a Level 1 engineer.
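
One small part of event correlation, collapsing duplicate events so that operators see a single entry, can be sketched as follows. The `(device, event_type)` event shape and the `correlate` function are hypothetical, and a real correlation engine would also consider factors such as timing and network topology, which this sketch ignores.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str]  # (device, event_type); hypothetical event shape


def correlate(events: Iterable[Event]) -> Dict[Event, int]:
    """Collapse duplicate events into one entry each, with an occurrence count."""
    return dict(Counter(events))
```

For example, three raw events where two are repeats of the same link failure on one router reduce to two entries, one of which carries a count of 2.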

Contacting customers within agreed SLAs and how we can drive our suppliers to provide better service is fantastic because of the information that is available in the systems now. It gives us a lot more heads up on what’s happening around the network.

We're building a lot of information, taken from our financial systems, and placing it into our UCMDB and CMDB databases to give us the breakdown of cost per device and cost per month, because now this information is available.

We have a couple of data centers. One of our biggest costs is power usage. Now, by collecting the power information using NNMi, we can break down how much our power is costing per rack in terms of how many amps have been used over a set period of time, say a week or a month. Previously, we had no way of determining how our power was being used or how much it was actually costing us per rack or per unit.
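
The arithmetic behind a per-rack power cost is straightforward once the current draw is known. The sketch below is an assumption-laden illustration: the 230 V single-phase supply, the power factor of 1, the tariff, and the `rack_power_cost` function are all hypothetical and are not taken from the interview.

```python
def rack_power_cost(avg_amps: float, hours: float,
                    volts: float = 230.0,
                    price_per_kwh: float = 0.10) -> float:
    """Estimate the energy cost of one rack over a period.

    Assumes a single-phase supply and a power factor of 1, so that
    power in watts is simply volts * amps. Both defaults are
    placeholder values, not figures from the source.
    """
    kilowatts = avg_amps * volts / 1000.0
    return kilowatts * hours * price_per_kwh
```

Under these assumptions, a rack drawing an average of 10 amps for a week (168 hours) uses 2.3 kW, or 386.4 kWh, which comes to roughly 38.64 in whatever currency the tariff is expressed in.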

It's given us a massive information boost, and we can really utilize the information, especially in UCMDB, and because it’s so flexible, we can tailor it to do pretty much whatever we want. From this performance information, we can also give our customers extra value reports and statistics that we can charge as a value added managed solution for them.

[In terms of getting started], one of the main things is to have a clear goal in mind before you start. Plan everything, get it all written down, and have the processes looked at before you start implementing this, because it’s fairly hard to re-engineer if you decide that one of the actual solutions or one of the processes that you have implemented isn’t going to work. Because of the integration of all the systems, you might find that reverse engineering them is a difficult task.

As a company, we decided to go for a clean start and basically said we'd filter all the data, take the data that we actually required, and start off from scratch. We found that doing it that way, we didn’t get any bad data in there. All the data that we have now has pretty much been cleansed and enriched by the information that we can get from our automated systems, but also by utilizing the extra data that people have put in.