Thursday, April 28, 2011

Master IT support providers Chris and Greg Tinker's take on how integrated technical support is essential in a complex, cloudy world

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Read a full transcript or download a copy. View the blog.

Recent outages at Amazon Web Services and Sony PlayStation Network may have jarred the common perception of IT business as usual, but IT failures and performance snafus are nothing new, just perhaps much more prominent.

Someone, somewhere got the first call on those outages -- the front line IT technical support staff. And the expanding role of cloud and the online services ecosystems that more of us depend on only point up why such IT technical support is more important than ever.

It just so happens that the importance of good and fast support is forcing technical support industry changes, with an emphasis on integration and empowerment for improving how help desks respond and perform in a spiraling crisis.

To learn more about how support is adapting to the high-impact, high-exposure cloud era, BriefingsDirect recently interviewed two lauded IT Master Technologists from HP. Part of the new support philosophy comes from providing a more centralized, efficient, and powerful means of getting all the systems involved working, and all the knowledge necessary together quickly to get applications back in action and keep them there. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

These two support stars, Chris Tinker and Greg Tinker, both HP Master Technologists, who happen to be identical twins, were chosen via a recent sweepstakes hosted by HP to identify favorite customer support personnel. Learn here why they gained such recognition, and uncover their recommendations for how IT support should be done better in a rapidly changing era of increasingly hybrid and cloud-modeled computing. The two were interviewed by BriefingsDirect's Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:
Gardner: You deal with people when they are, in some cases, their darkest hour. They're under pressure. There's something that's gone wrong. They're calling you. So, you're not just there in a technical sense, which of course is important, but there must be a human dynamic to this as well. How does that work?

Chris Tinker: We become their confidant. We foster a relationship there between the two parties. For us, it's very exhilarating. It's the ultimate test. You want to build both the technical and business, but also the interpersonal relationship, because you have to weigh in on so many levels, not just technical. That’s a critical component, but not the only component.

Greg Tinker: And today the customer expects the technical master technologist, like my brother and I, not just to know the one thing they're asking about, because that question is going to quickly turn. For example, I am having an Oracle performance issue, the customer thinks it may be disk related, but when you dig into it, you find out that it's actually an ODBC call, a networking issue. So, you have to be quite proficient at a multitude of technologies and have a lot of depth and breadth.

Gardner: So what does it take to be a good IT support person nowadays?

Chris Tinker: It’s simply not enough to be a technical guru -- not in today's industry. You have to have a good understanding of technology, yes, but you also have to understand the tools and realize that technology is simply a tool for business outcomes. If you're listening to the business, understanding what their concerns and their challenges are, then you can apply that understanding to their technical situation to essentially work for a solution.

Greg Tinker: Chris and I study, almost on a daily basis, to stay ahead of the technology curve. Chris and I both do a lot in SCSI I/O control logic, with respect to the kernel structure of HP-UX as well as Linux, which is our playground, if you will.

And, it takes what I would call firm foundation to be able to provide that strong wealth of knowledge to be the customer's confidant. You can't be an expert at one point anymore. You can't be a network expert only. You have to understand the entire gamut of the business, so that you can understand the customer's technical problem.

Gardner: Let me congratulate you on your award. This was, I think, a worldwide pool, or at least a very large group of people that you were chosen from. Did this come as a surprise?

Greg Tinker: It was an honor, I can say that, and we are very grateful for that. Our customer installed base, as well as our peers and the management team, put our names into this situation. It was a great honor. ... For each vote that was cast, HP donated $10 to the humanitarian organization CARE, up to a maximum of $100,000. They met that goal in just a few days. It was quite astonishing.

Chris Tinker: And it was a surprise. ... Very rewarding.

Gardner: Okay, you've been at this for 12 and 13 years. What's changed over that period of time?

Chris Tinker: Catchphrases change. Today it's cloud computing, but cloud computing has been around for a long time. We just didn’t refer to it as cloud computing. Shared infrastructure of course is what we called it.

Virtualization today is becoming a big ticket item, where in years past, big iron was the thing that was a catchphrase. Big iron was very large computers. We still have big iron in storage, that’s true. We still have that big footprint, big powerhouse, that consumes a lot of power, but that’s a necessity of the storage platform.

The big thing for today is converged infrastructure. These are terms you wouldn’t have heard years ago, where we are trying to converge multiple types of protocols and physical media under one medium: networking, Fibre Channel, which of course is your storage network, and the TCP/IP network, all going across the same physical piece of media. These are things that are changing, and of course with that comes an extreme amount of complexity, especially when it comes to the actual engine that drives this.

Greg Tinker: As Chris stated, the key phrase of yesteryear was big iron. I want a big behemoth machine that can outdo mainframe. If you look back to 1999 and 2000, what you were looking for in the open system world was something to compete with Big Blue.

Today it's virtualization and blades. Everybody used to say -- probably about mid-2005 -- "I want a pizza box. I want a new blade." We no longer call those blades. Those are called pizza boxes now. Today, the concept is all about blades. If you can't make the thing 3 inches tall and 1 inch wide, there is something wrong.

Gardner: You've been describing how things have changed technically. How have things changed in terms of the customer requirements and/or the customer culture?

Chris Tinker: The expectation is more for less. They want more computing power. They want more IT for less cost, which I think that’s been true since day one, but today, of course, that "more for less" just means more computing power. The footprint of the servers has changed.

And two, the support model has changed. Keep in mind, we're in support, and we're seeing a trend with these concepts where customers are having all these physical servers and the support contracts on all these servers are being consolidated down to one physical server with virtual instances.

The support model of yesteryear doesn’t always fit the support model that they should have today.

Greg Tinker: What Chris is talking about there is consolidation efforts. Customers used to have 500 servers. Today -- I want to exaggerate my point here -- we have it virtualized on one or two behemoth physical machines running 500 guests.

Though that model works right for consolidating the cost effort of the infrastructure, so your capital cost is less, the problem now becomes the support model. Customers tend to reduce the support as well, because it's less infrastructure. But, keep in mind, most customers kind of forget a lot of times that they've put all their eggs into the one basket, and that basket needs a lot of protection.

So now you have your entire enterprise running on one or two pieces of physical hardware that is grossly complex, with not only the virtual servers but also the virtual Ethernet modules. The Fibre Channel model concepts are all now basically one concept to run every protocol type, whether you are running InfiniBand, Gigabit Ethernet, Fibre Channel, etc. The complexity requires a great deal of support.

When a customer calls up and says, "We've made a change in our environment and my server has crashed, the physical server went down, or has lost access to its storage or network," you're not just affecting that one physical server, but you're affecting hundreds. So, the support model today is quick.

Gardner: It sounds to me that there is a higher risk profile. Is that a fair characterization?

Hardware redundancy

Greg Tinker: That would be a fair characterization. There is a higher risk on the hardware end in the sense that you still have hardware redundancy, of course, but you're fully dependent upon cluster technology and complexity.

Chris Tinker: A business risk assessment is still a critical component of good solution design.

Gardner: I'm going to guess that over the past several years in the tradeoff for cost and risk, people probably favor the cost side a bit. So, that means the people in your position are the backstop?

Greg Tinker: That’s what the trend is becoming. The trend is, "We're going to reduce our cost in the CAPEX and reduce our cost in the infrastructure. We're going to consolidate and virtualize that concept, and we are going to look at our support strategy in a different light." That’s what most customers think.

Gardner: What is that new light?

Greg Tinker: The new light today is that customers are focused more on the higher end support models, meaning four-hour call to repair, where it used to be 24-hour or 48-hour support models, where we were not in a huge rush. If we had a disk drive failure, we had plenty of time, because we had full redundancy, whatever. So we had plenty of time to fix those components.

Today, with all this consolidation effort, it becomes a real critical need when you have a failing component, whether it be hardware or software, to get that component addressed urgently. You don’t really have the time.

Chris Tinker: That’s a great point. Looking at that standard support model, you had so many physical servers and your business was essentially interlaced with these systems. You could handle an outage, whether software or hardware condition. It wasn't as strategic or as strong as today’s virtualized environments, where you would have much heavier business impact.

To Greg’s point, this inter-support model used to work with some of these virtualized environments. I am not saying all virtualized environments, but some of these virtualized environments. With four-hour call-to-repair, you can imagine in four hours what’s required. The technologists who answer the phone first have to address the business concerns to figure out what the business impact is and understand what the problem is.

Once we ascertain what’s causing that problem and the problem has been defined, we have to figure out what’s going wrong with the technology in order to bring it back online. All that has to be done within four hours on some of our most critical contracts.

Gardner: You're sorting through implementations with loads of vendors involved. When it comes to this sort of a mission-critical situation, they're probably thankful that there's someone there trying to corral this. So, I imagine the cooperation is pretty high in these circumstances?

Stakes are high

Chris Tinker: Yeah, the stakes are high at this level. You are talking about, not only the corporation, the customer, but you are also talking about the vendors, whether it be HP or third party, and we are partnering with all these vendors. Everybody has got a stake in the game. Essentially, their reputation is on the line.

So we partner, regardless. As we don’t want to be thrown under the bus, we don’t throw anybody else under the bus. We partner. We come together as one throat to choke or one hand to shake, however you want to look at it. But, essentially, we all have the same thing in common, the customer’s well being.

Greg Tinker: I'll second Chris’ sentiment on that, in the sense that when we're engaged at our level, it's no longer a finger-pointing game. It's a partnership, regardless of who the customer is. If it's HP gear, so be it. If it's somebody else’s gear, and we see where the problem is at, we don't point the finger. We ask the customer to get their vendor on the bridge with us and we work as a team to get the business restored, because that’s priority one.

Chris Tinker: That’s HP technical support. That’s what we thrive at. That’s one of our charters. Our management has dictated that they want team effort, global effort.

Gardner: How did you both get involved with this? Did one get into it first and the other follow? What's the story behind how you ended up here?

Lengthy road

Greg Tinker: It was quite a lengthy road. Chris and I actually started off going in one direction, and we agreed many years ago in school that one of us would go one direction and the other in another, and see who was enjoying the industry better. Chris joined HP and fell in love with it. He and I have a very strong Linux background. Then, I jumped ship and went with my brother Chris, and we have been with HP ever since, and have loved it dearly.

Chris Tinker: We look at IT support as a ladder and we just climbed that ladder. We started in mission-critical support and found it to be exhilarating. With mission-critical support you're talking about enterprise-class corporations. We're not talking about consumer products. We're talking about an entire corporation's business running on an IT solution and how we're engaged in that process.

Unfortunately, in our line of work, we do see customers, where the technology did not go as planned, predicted, or expected and it's up to us to essentially figure out what the expectations are with technology and ascertain whether or not the technology can deliver that. That's how we moved through support.

We started off as mission-critical support specialists. We became architects, designing solutions for corporations and found out that we were very good at escalations and that's where we are today.

Gardner: There have also been some shifts over the past dozen years or so in the degree to which remote support is possible and your ability to get inside and get that information. Maybe we could take a moment to learn more about what tools have been brought to bear to help you with this?

HP virtual room

Chris Tinker: The HP Virtual Room (HPVR). If you go to rooms.hp.com, it’s a good example. As you just mentioned, yesteryear it was, "Hey, send me the logs. Send me the examples. Send me some data, and I'll parse through it and figure it out." You had to wait for data to come in and then start parsing those logs, parsing that data, and building your hypothesis of what might be the problem.

Now, imagine if I were able to take that in real time. So, Greg, talk about real time.

Greg Tinker: Real time is key in today’s technology world. Nobody wants to wait. Take your phone for example. Can you stand it when you have pressed the email button and your phone takes more than three seconds to load it up? Everybody gets annoyed when it's slow. Well, the same is true in technology services support.

When customers call in, they expect immediate response. By the time it gets to our level, where Chris and I sit and our team resides inside the support model, the customer is in dire straits. We use the Virtual Room technology. It's similar to WebEx.

There are a lot of similarities out there. Different vendors have different tools. We use the HP Virtual Room toolset and we can jump onto any machine in the world, anywhere in the world, at a moment’s notice. We can do crash analysis on a Linux kernel crash in real time on a customer’s machine. The same with HP-UX, Solaris, AIX, name your favorite.

We can look at these stack traces and actually find the most likely component that compromises the infrastructure. We can find it, isolate it, and remedy it.

Chris Tinker: Not only is it just us troubleshooting, but it's bringing to bear our peers. It's team work, a two-heads-are-better-than-one mentality. Greg even lived that first. At the end of the day, you've got 2, 4, or 20 people on the phone. You can imagine all of those people sharing the same desktop at the same time to try to look at a problem. You get all these different levels of expertise.

You're able to take all these talents and focus them on one scenario. So, now with four-hour call to repair, how is that even possible? It's possible when we have to bring these people and partner with these people. They could be not only HP employees and HP technical support. That goes back to vendors and those relationships. We bring those vendors into the same Virtual Room, showing them where we're seeing the problem and asking what we need to do to solve this.

Gardner: While we are on the subject of tools, what's coming next? If I were to design these types of tools, you would be the guys I would go to, to get my list of requirements. What are you asking for?

Greg Tinker: The biggest thing we see today is storage. The growth rate of storage is enormous. And the biggest problems customers run into are performance and capacity.

Capacity is the easy one, right? I am 100 percent full in my file system. I just need more. That's the easy one to fix.

The hard one to fix is "My application is not running the way I want it to. Fix it." Those are the difficult ones. We have to have a lot of tools to help us understand what the load conditions are, because it's no longer the yesteryear scenario of a Superdome, HP Rack, one big behemoth machine, four terabytes of memory, 400 CPUs, loading up one storage array. That's no longer the case.

We have grid computing structures of 600+ nodes running a multitude of different things -- SAP, Oracle, Informix, Exchange, etc. All of these different load-bearing concepts are coming into one monolithic storage array. It can become quite daunting to understand what's causing that load condition, and we have a lot of tools today that are helping us ascertain the root of those problems faster.

Chris Tinker: We have become the bleeding edge of technology. Essentially, it's software that hasn't been released. It's tools which are not actually production ready, and we use these tools as well, and some tools we can’t even speak about.

Business realities

But, these are tools that will be in the enterprise eventually. They will be out in the world eventually. You asked earlier what we see coming down the road? Imagination is essentially one of the only things in technology. In today's world, there are other factors of course. Business realities temper the development of technology, but it's going to be very exciting to see what technology is being developed and what's coming next.

Gardner: I wonder if you might have just some last advice for those listening to the podcast as to how they on the consumption side might help folks like you on the services and support delivery side do your job better? What advice do you have for them in order to have a better outcome?

Chris Tinker: Yeah, it's being able to articulate the actual problem at hand, and the challenge that you have with your technology, because keep in mind that technology, IT, is nothing more than a tool that allows us to have business outcomes. So it's nothing more than a tool that the business utilizes for their requirements.

Then, to have metrics around their environment. They have to have a baseline. They have to have an understanding of what the technology has been doing.

Trending is key

Greg Tinker: Trending is key in a lot of these new virtualized consolidated environments. You need to have a baseline, as Chris stated. We need to have the performance characteristics. Your logging and ESX is about as common as sliced bread in a grocery store. ESX environments are very common and thought of very highly. I enjoy them. They are very nice.

Customers tend to start moving towards ESXi, which is fine, but ESXi doesn't log. It does log, but you only get like a two-hour history. The point is that customers take that logging for granted. You have to have your logging enabled and you must keep at least a six-month trend.

So you don't keep all your logs and your service forever, but a six-month trend is very helpful when you have a mysterious problem show up. Then, we can compare yesterday to today and see what differences have shown up in the environment.

Gardner: It comes down to data, having the data at your disposal.

Chris Tinker: Not just data, but having a baseline. We get a lot of calls where customers have no idea of what the environment was doing before. They say, "We're having a problem now. Our users are complaining." We ask, "How did it used to run? How long did this job used to take? Did it use to take 2 hours, and now it takes 20 hours?" A lot of times, they simply do not know.

I wish customers would yield to knowing that logging is critical. You don't have to keep it forever, but keep it for a strategic period of time. Six months is a good number.
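
As a rough illustration of the baseline advice above, here is a minimal Python sketch. It keeps about six months of job-duration samples, takes their median as the baseline, and flags a run that takes far longer than usual. The 183-day window, the 2x threshold, the sample data, and the function names are all assumptions made for this example; none of them come from the interview or from any HP tool.

```python
from datetime import datetime, timedelta
from statistics import median

# Assumption: "six months" approximated as 183 days.
RETENTION = timedelta(days=183)


def prune(samples, now):
    """Keep only (timestamp, hours) samples inside the retention window."""
    cutoff = now - RETENTION
    return [(ts, hours) for ts, hours in samples if ts >= cutoff]


def baseline(samples):
    """Median duration of the retained samples; resists one-off spikes."""
    return median(hours for _, hours in samples)


def deviates(current, base, factor=2.0):
    """True when a run takes at least `factor` times the baseline."""
    return current >= factor * base


# Hypothetical history of how long a nightly job took, in hours.
now = datetime(2011, 4, 28)
history = [
    (now - timedelta(days=400), 1.5),  # older than the window, pruned
    (now - timedelta(days=90), 2.0),
    (now - timedelta(days=30), 2.1),
    (now - timedelta(days=1), 1.9),
]

kept = prune(history, now)
base = baseline(kept)
print(len(kept), base, deviates(20.0, base))
```

With a baseline on hand, the question "Did it use to take 2 hours, and now it takes 20 hours?" has an answer the moment the support call starts.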