Dana Gardner's BriefingsDirect: HP Data Protector, a case study on scale and completeness for total enterprise data backup and recovery

Tuesday, June 15, 2010

HP Data Protector, a case study on scale and completeness for total enterprise data backup and recovery

Sponsor: HP.

Welcome to a special BriefingsDirect podcast series coming to you from the HP Software Universe 2010 Conference in Washington, DC. We're here the week of June 14, 2010 to explore some major enterprise software and solutions trends and innovations making news across HP's ecosystem of customers, partners, and developers.

Our topic for this live conversation focuses on the challenges and progress in conducting massive and comprehensive backups of enterprise live data, applications, and systems. We'll take a look at how HP Data Protector is managing and safeguarding petabytes of storage per week across HP's next-generation data centers.

The case-study on HP's ongoing experiences sheds light on how enterprises can consolidate their storage and backup efforts to improve response and recovery times, while also reducing total costs.

To learn more about high-performance enterprise scale storage and reliable backup, please welcome Lowell Dale, a technical architect in HP's IT organization. The interview is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Dale: One of the things that everyone is dealing with these days is the growth of data. Although we have a lot of technologies out there that are evolving -- virtualization and the globalization-effect -- what we're dealing with on the backup and recovery side is an aggregate amount of data that's just growing year after year.

Some of the things that we're running into are the effects of consolidation. For example, we end up trying to back up databases that are getting larger and larger. Some of the applications and servers that consolidate will end up being more of a challenge for some of the services such as backup and recovery. It's pretty common across the industry.

In our environment at HP, we're running about 93,000-95,000 backups per week with an aggregate data volume of about 4 petabytes of backup data and 53,000 run-time hours. That's about 17,000 servers worth of backup across 14 petabytes of storage.
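To put those weekly figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 petabyte = 1,000,000 gigabytes) and takes the midpoint of the 93,000-95,000 range, and the function and variable names are made up for illustration.

```python
# Back-of-the-envelope averages from the weekly figures quoted above.
# Assumptions (not from the interview): decimal units, 1 PB = 1,000,000 GB,
# and 94,000 backups as the midpoint of the 93,000-95,000 range.
GB_PER_PB = 1_000_000


def weekly_averages(backups, data_pb, runtime_hours, servers):
    """Return simple per-backup and per-server averages for one week."""
    data_gb = data_pb * GB_PER_PB
    return {
        "gb_per_backup": data_gb / backups,
        "gb_per_runtime_hour": data_gb / runtime_hours,
        "backups_per_server": backups / servers,
        "minutes_per_backup": runtime_hours * 60 / backups,
    }


stats = weekly_averages(backups=94_000, data_pb=4, runtime_hours=53_000, servers=17_000)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Under those assumptions, the averages work out to roughly 43 GB per backup, 75 GB moved per run-time hour, between five and six backups per server per week, and about 34 minutes of run time per backup. These are averages only; the 23-terabyte databases mentioned later sit far above them.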

It's pretty much every application that HP's business is run upon. It doesn’t matter if it's enterprise warehousing or data warehousing or if it's internal things like payroll or web-facing front-ends like hp.com. It's the whole slew of applications that we have to manage.

The storage technologies are managed across two different teams. We have a storage-focused team that manages the storage technologies. They're currently using HP Surestore XP Disk Array and EVA as well. We have our Fibre Channel networks in front of those. In the team that I work on, we're responsible for the backup and recovery of the data on that storage infrastructure.

We're using the Virtual Library Systems that HP manufactures as well as the Enterprise System Libraries (ESL). Those are two predominant storage technologies for getting data to the data protection pool.

One of the first things we had to do was simplify, so that we could scale to the size and scope that we have to manage. You have to simplify configuration and architecture as much as possible, so that you can continue to scale out.

We had to take a step-wise approach on how we adopted virtual tape library and what we used it for. Virtual tape libraries were one of the things that we had to figure out. What was the use-case scenario for virtual tape? It's not easy to switch from old technology to something new and go 100 percent at it.

We first started with a minimal number of use cases and, little by little, we started learning what it was really good for. We’ve evolved the use case even more, so that it will carry forward into our next-generation design. That’s just one example.

We're still using physical tape for certain scenarios where we need the data mobility to move applications or enable the migration of applications and/or data between disparate geographies.

HP Data Protector 6.11 is the current release that we are running and deploying in our next generation. Some of the features with that release that are very helpful to us have to do with checkpoint recoveries.

For example, if the backup or resource should fail, we have the ability with automation to go out and have it pick up where it left off. This has helped us in multi-fold ways. If you have a bunch of data that you need to get backed up, you don’t want to start over, because it’s going to impact the next minute or the next hour of demand.

Not only that, but it’s also helped us keep our backup success rates up and our tickets down. Instead of raising a ticket for somebody to go look at, it will attempt a checkpoint recovery a few times. Only after so many failed attempts do we bring the issue to light so that someone has to look at it.
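The retry-then-escalate pattern described here can be sketched in a few lines. This is a generic illustration, not Data Protector's actual mechanism or API: the `BackupJob` class, the `run_with_checkpoints` function, the `copy_object` callback, and the three-attempt limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupJob:
    """Hypothetical job record; 'checkpoint' is the next object to back up."""
    name: str
    total_objects: int
    checkpoint: int = 0


def run_with_checkpoints(job, copy_object, max_attempts=3):
    """Run a job, resuming from the last checkpoint after each failure.

    Returns "completed" on success, or "ticket" when every attempt has
    failed and a person needs to look at the job.
    """
    for _ in range(max_attempts):
        try:
            while job.checkpoint < job.total_objects:
                copy_object(job.checkpoint)
                job.checkpoint += 1  # progress survives a later failure
            return "completed"
        except OSError:
            continue  # retry from job.checkpoint, not from zero
    return "ticket"


# Usage: object 3 fails twice before succeeding (simulated).
failures = {3: 2}
copied = []


def flaky_copy(index):
    if failures.get(index, 0) > 0:
        failures[index] -= 1
        raise OSError(f"simulated failure on object {index}")
    copied.append(index)


job = BackupJob("payroll-db", total_objects=5)
result = run_with_checkpoints(job, flaky_copy)
print(result, copied)
```

In this sketch the first three objects are copied once and never again, even though the job fails twice along the way. That is the point made above: a restart does not push already-finished work into the next hour's demand, and a ticket is raised only when the attempt limit is exhausted.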

Data Protector also has a very powerful feature called object copy. That allowed us to maintain our retention of data across two different products or technologies. So, object copy was another one that was very powerful.

There are also a couple of things around the ability to do the integration backups. In the past, we were using some technology that was very expensive in terms of disk space on our XPs, and using split-mirror backups. Now, we're using the online integrations for Oracle or SQL and we're also getting ready to add SharePoint and Microsoft Exchange.

Now, we're able to do online backups of these databases. Some of them are upwards of 23 terabytes. We're able to do that without any additional disk space and we're able to back that up without taking down the environment or having any downtime. That’s another thing that’s been very helpful with Data Protector.

Scheduling overhead

With VMs increasing and the use case for virtualization increasing, one of the challenges is trying to work with scheduling overhead tasks. It could be anywhere from a backup to indexing to virus scanning and whatnot, and trying to find out what the limitations and the bottlenecks are across the entire ecosystem to find out when to run certain overhead and not impact production.

That’s one of the things that’s evolving. We are not there yet, but obviously we have to figure out how to get the data to the data protection pool. With virtualization, it just makes it a little bit more interesting.

Nowadays, we can bring up a fairly large-scale environment, like an entire data center, within a matter of months -- if not weeks. This is how long it would take us. The process from there moves toward how we facilitate setting up backup policies and schedules, and even that’s evolving.

Right now, we're looking at ideas and ways to automate that, so that when a server plugs in, basically it’ll configure itself. We're not there yet, but we are looking at that. Some of the things that we’ve improved upon are how we build out quickly and then turn around and set up the configurations, as that business demand is then turned around and converted into backup demand, storage demand, and network demand. We’ve improved quite a bit on that front.

Being able to bring that backup success rate up is key. Some of the things that we’ve done with architecture and the product -- just the different ways of doing process -- have helped with that backup success rate.

The other thing that it's helped us do is that we’ve got a team now, which we didn’t have before, that’s just focused on analytics, looking at events before they become incidents.

With some of the evolving technologies and some of the things around cloud computing, at the end of the day, we'll still need to mitigate downtime, data loss, logical corruption, or anything that would jeopardize that business asset.

With cloud computing, if we're using the current technology today with peak base backup, we have to get the data copied over to a data protection pool. There still would be the same approach of trying to get that data. If there is anything to keep up with these emerging technologies, for example, maybe we approach data protection a little bit differently and spread the load out, so that it’s somewhat transparent.

Some of the things we need to see and we may start seeing in the industry are load management and how loads from different types of technologies talk to each other. I mentioned virtualization earlier. Some of the tools with content-awareness and indexing have overhead associated with them.