Tuesday, October 26, 2021

Now’s the time for more industries to adopt a culture of operational resilience


In the last BriefingsDirect sustainable business innovation discussion, we explored how operational resiliency has become a top priority in the increasingly interconnected financial services sector.

We now expand our focus to explore the best ways to anticipate, plan for, and swiftly implement the means for nearly any business to avoid disruption.

New techniques allow for rapid responses to many of the most pressing threats. By predefining root causes and implementing advance responses, many businesses can create a culture of safer and sustained operations.

Listen to the podcast. Find it on iTunes. Read a full transcript or download a copy.

To learn more about the many ways that businesses can reach a high level of assured business availability despite persistent threats, please welcome Steve Yon, Executive Director of the EY ServiceNow Practice, and Andrew Zarenski, Senior Manager and ServiceNow Innovation Leader at EY. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: Steve, our last chat explored how financial firms are adjusting to heightened threats and increased regulation by implementing operational resiliency plans and platforms. But with so many industries disrupted these days in so many ways, is there a need for a broader adoption of operational resiliency best practices?

Yon: Yes, Dana. Just as we discussed, the pandemic has widened people’s eyes -- not only in financial services but across other industries. And now, with hurricane season and those impacts, we’re continuing to see strong interest to improve operational resiliency capabilities within many firms. Being able to continuously serve clients is how the world works – and it’s not just about technology.

Gardner: What has EY done specifically to make operational resiliency a horizontal capability, if you will, that isn’t specific to any vertical industry?

Resilience solutions for all sectors

Yon: The platform we built the solution on is an integration and automation platform. We set it up in anticipation of, and with the full knowledge that it’s going to become a horizontal capability.

When you think about resiliency and doing work in operational models, it’s a verb-based system, right? How are you going to do it? How are you going to serve? How are you going to manage? How are you going to change, modify, and adjust to immediate recovery? All of those verbs are what make resiliency happen.

What differentiates one business sector from another aren’t those verbs. Those are immutable. It’s the nouns that change from sector to sector. So, focusing on all the same verbs, that same perspective we looked at within financial services, is equally as integratable when you think about telecommunications or power.

With financial services, the nouns might be things around trading and how you keep that capability always moving. Or payments. How do I keep those things going? In an energy context, the nouns would be more about power distribution, capacity, and things like that.

With our solutions we want to ensure that you don’t close any doors by creating stove pipes -- because the nature of the interconnectedness of the world is not one of stove pipes. It’s one of huge cross-integration and horizontal integration. And when information and knowledge are set up in a system designed appropriately, it benefits whichever firm or whatever sector you’re in.

Gardner: You’ve created your platform and solution for complex, global companies. But does this operational resiliency capability also scale down? Should small- to medium-size businesses (SMBs) be thinking about this as well?

Yon: Yes. Any firm that cares about being able to operate in the event of potential disruptions, if that’s something meaningful to them, especially in the more highly regulated industries, then the expectation of resiliency needs to be there.

How to Build Resiliency into Operations

We’re seeing resiliency in the top five concerns for board-level folks. They need a solution that can scale up and down. You cannot take a science fair project and impact an industry nor provide value in the quick way these firms are looking for.

The idea is to be able to try it out and experiment. And when they figure out exactly how to calibrate the solution for their culture and level of complexity, then they can rinse, repeat, and replicate to scale it out. Your comment on being able to start small and grow large is absolutely true. It’s a guiding principle in any operational resiliency solution.

Gardner: It sounds like there are multiple adoption vectors, too. You might have a risk officer maturity level, or you might just have a new regulatory hurdle and that’s your on-ramp.

Are there a variety of different personas within organizations that should be thinking about how to begin that crawl, walk, run adoption for business continuity?

Yon: Yes. We think a proper solution should be persona-based. Am I talking to someone with responsibilities with risk, resilience, and compliance? Or am I talking to someone at the board level? Am I talking to a business service owner?

And the solution should also be inclusive of all the people who are remediating the problems on the operational side, and so unifying that entire perspective. That’s irrespective of how your firm may work. It focuses broadly on aligning the people who need to build things at the top level, to understanding the customer experience perspective, and to know what’s going on and how things are being remediated. Unifying with those operational folks is exceptionally important.

The capability to customize a view, if you will, for each of those personas -- irrespective of their titles -- in a standard way so they are all able to view, monitor, and manage a disruption, or an avoidance of a disruption, is critical.

Gardner: Because the solution is built on a process and workflow platform, ServiceNow, which is highly integratable, it sounds like you can bring in third parties specific to many industries. How well does this solution augment an existing ecosystem of partners?

Yon: ServiceNow is a market-ubiquitous capability. When you look under the hood of most firms, you’ll find a workflow process capability there. With that comes the connectivity and framework by which you can have transparency into all the assets and actors.

What better platform to then develop a synthesis view of, “Hey, here’s where I’m now detecting the signal that could be something that’s a disruption”? That then allows you to be able to automatically light up a business continuity plan (BCP) and put it into action before a problem actually occurs.

We integrate not only with ServiceNow, but with any other system that can throw a signal -- whether it’s a facilities-based system, order management system, or a human resources system. That includes anything a firm defines as a critical business service, and all the actors and assets that participate in it, along with what state they need for it to be considered valid.

All of that needs to be ingested and synthesized to determine if there’s an issue that needs to be monitored and then a failover plan enacted.

Gardner: Andrew, please tell us about the core EY ServiceNow alliance operational resilience offering.

Detect disruptions with data

Zarenski: Corporations already have so many mitigation policies in place that understanding and responding to disruptions in real time is obviously essential. Everyone likes to think about the use case of plugging cybersecurity holes as soon as possible to prevent hackers from taking advantage of an exploit. That’s a relatively easy, relatable scenario. But think about a physical office service. For example, an elevator goes down that then prevents your employees from getting to their desks or people in a financial firm getting to their trading floor.

Understanding that disruption is just as important as understanding a cybersecurity threat or if someone has compromised one of your systems or processes. Detection today is generally harder than it’s been in the past because corporations’ physical and logical assets are so fragmented. They’re hard to track in that or any building.

Steve alluded to how service mapping, to understand which assets support which services, is incredibly difficult. Detection has become very complicated, and the older way of picking up the phone just isn’t enough because most corporations don’t know what the office is supporting. Having that concrete business service map, and understanding that logical mapping of assets to services, lets a solution such as this help our operators or chief risk officers (CROs) respond in near real time, which is the new industry standard.

Gardner: So, on one hand, it’s more difficult than ever. But the good news is that nowadays there’s so much more data available. There’s telemetry, edge computing, and sensors. So, while we have a tougher challenge to detect disruptions, we’re also getting some help from the technology side.

Zarenski: Yes, absolutely. And everyone thinks of this generally as just a technology exercise, but there’s so much more to it than the tech. There is the process. The key to enterprise resiliency is understanding what the services are both internally to employees as well as externally to the customers.

We find that most of our clients are just beginning to head down the journey of what we call business service mapping to identify and understand the critical services ahead of time. What are my five critical services? How can I build up those maps to show the quick wins and understand how can I be resilient today? How can I understand those sensors? What are the networks? What objects let me understand what a disruption is and have a dashboard show services that flip from green to red or yellow when something goes wrong?

Yon: And, Dana, there’s so much signal out there to let you know what’s going on. But to be able to cut through and synthesize those material aspects of what’s truly important is what makes this solution fit for duty and usable. It’s not a big processing sink and does not take a lot of time to get done.

A business needs to know what to focus on, from what you imprint the system with to how you define your service map and how you calibrate what the signals represent. Those have to be the minimal number of things you want to ingest and synthesize to provide good, fast telemetry.  That’s where the value comes from, knowing how to define it best so the system works in a very fast and efficient way.

Gardner: Clearly, operational resiliency is not something you just buy in a box and deploy. There’s technology, business service mapping, and there’s also culture. Do you put in the technology and processes and then hope you develop a culture of resiliency? Or do you try to instill a culture of resiliency and then put in the ingredients? What’s the synergy between them?

Cultural shift from reactivity

Zarenski: There is synergy, for sure. Obviously, every corporation wants to have a culture of resilience. But at the same time, it’s hard to get there without the enabling technology. If you think about the solution that we at EY have developed, it takes resiliency beyond being just a reactive solution.

It’s easy for a corporation to understand the need for having a BCP or disaster recovery plan in place. That’s generally the first line of enabling a resilient culture. But bringing in another layer of technology that enables investment in the things that are listening for disruption? That is the next layer.

If you look at financial institutions, they all have different tools and processes that look at things like trade execution volume, and so forth. One person may have a system looking to see if trade execution volume has a significant blip and can then compare that to prior history. But to understand if that blip means something is wrong is not an easy process. Using EY’s operational resilience tool helps you understand the patterns, catalog them, and bring in technology that ultimately further enables that culture of resilience.
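As a rough illustration of comparing a trade-volume reading with prior history, here is a minimal sketch that flags a reading when it sits far from the historical mean. The z-score test is a stand-in chosen for this example, not the method the EY tool uses, and the function name, threshold, and volume figures are all hypothetical.

```python
from statistics import mean, stdev


def is_unusual(history: list[float], current: float, z_threshold: float = 3.0) -> bool:
    """Return True when `current` is more than `z_threshold` standard
    deviations away from the mean of `history` (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough prior history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is unusual
    return abs(current - mu) / sigma > z_threshold


# Hypothetical hourly trade execution volumes.
prior_volumes = [1000, 1020, 980, 1010, 990]
normal_hour = is_unusual(prior_volumes, 1005)
suspect_hour = is_unusual(prior_volumes, 400)
```

A check like this only says that a reading is statistically unusual. Deciding whether the blip is natural or a sign of disruption still depends on the cataloged patterns described above.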

Yon: Yes, you want to know if something like that blip happens naturally or not. I liken this back to the days when we went through the evolution from quality control (QC)-oriented thinking to quality assurance (QA)-oriented thinking. QC lets you test stuff out, and lets you know what to do in the event of a failure. That’s what a BCP plan is all about -- when something happens, you pick up and follow the playbook. And there you go.

QA, which went through some significant headwinds, is about embedding that thought process into the very fabric of your planning and the design to enable the outcomes you really want. If there is QA, you can avoid disruptions.

And that’s exactly the same perspective we’re applying here. Let’s think about how continuity management and the BCP are put together. Yes, they exist, but you know what? When you’re using them, you’re down. Value destruction is actually occurring.

So, think about this culture of resilience as analogous to the evolution to QA, which is, “Be more predictive and know what I’m going to be dealing with.” That is better than, “Test it out and know how to respond later.” I can actually get a heck of a lot better value and keep myself off the front page of the newspaper if I am more thoughtful in the first place.

That also goes back to the earlier point of how to accelerate time to value. That’s why Andrew was asking, “Hey, what are your five critical business services?” This is where we start off. Let’s pick one and find a way to make it work and get lasting value from that.

The best way to get people to change is quickly use data and show an outcome. That’s difficult to disagree with.

Gardner: Andrew, what are the key attributes of the EY ServiceNow resilience solution that helps get organizations past firefighting mode and more into a forward-looking, intelligent, and resilient culture?

React, respond, and reduce risk

Zarenski: The key is preventative and proactive decision support. Now, if you think about what preventative decision support means, the capability lets you build in thresholds for when a service may be approaching a lag in its operational resilience. For example, server capacity may be decreasing for a web site that delivers an essential business service to external customers. As that capacity decreases, the service would begin to flash yellow as it approaches a service threshold. Therefore, someone can be intelligent and quickly do something about it.

But you can do that for virtually any service by setting policies in the database layer to understand what the specific thresholds are. Secondly, broad transparency and visibility is very important.

We’re expanding the usefulness of data for the chief risk officer (CRO). They can log into the dashboard two or three times a day, look at their 10 or 15 critical business services, and all the subservices that support them, and understand the health of each one individually. In an ideal situation, they log in in the morning and see everything as green, then they log in at lunchtime, and see half the stuff as yellow. Then they are going to go do something about it. But they don’t need to drill into the data to understand that something is wrong, they can simply see the service, see the approaching threshold, and – boom – they call the service owner and make sure they take care of it.
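The threshold idea described here can be sketched in a few lines of Python. This is an illustration only, not the EY or ServiceNow implementation: the names `ThresholdPolicy`, `service_status`, and `dashboard`, the service name, and every capacity figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    """Hypothetical per-service policy, as remaining capacity (0.0 to 1.0)."""
    warn_below: float  # service turns yellow under this level
    fail_below: float  # service turns red under this level


def service_status(remaining_capacity: float, policy: ThresholdPolicy) -> str:
    """Map one capacity reading to a dashboard color."""
    if remaining_capacity < policy.fail_below:
        return "red"
    if remaining_capacity < policy.warn_below:
        return "yellow"
    return "green"


def dashboard(readings: dict[str, float],
              policies: dict[str, ThresholdPolicy]) -> dict[str, str]:
    """Return a color for every service that has both a reading and a policy."""
    return {
        name: service_status(value, policies[name])
        for name, value in readings.items()
        if name in policies
    }


# Illustrative values only.
policies = {"customer-web-site": ThresholdPolicy(warn_below=0.30, fail_below=0.10)}
morning = dashboard({"customer-web-site": 0.55}, policies)
lunchtime = dashboard({"customer-web-site": 0.22}, policies)
```

With these made-up numbers the service shows green in the morning and yellow at lunchtime, which is the cue for the CRO to call the service owner before the red threshold is reached.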

Yon: By the way, Andrew, they can also just pick up their phone if they get a pushed notification that something’s askew, too.

Zarenski: Yes, exactly. The major incident response is built into the backend. Of course, we’re proactively allowing the CROs and service owners to understand that something’s gone wrong. Then, by very simply drilling into that alert, they will understand immediately which assets are broken, know the 10 people responsible for those assets, and immediately get them on the phone. Or they can set up a group chat, get them paged, or use any number of other ways to get the problem taken care of.

The key is offering not just the visibility into what’s gone wrong, but also the ability to react, respond, and have full traceability behind that response -- all in one platform. That really differentiates the solution from what else is in the market.

Gardner: It sounds like one of the key attributes is the user experience and interfaces that rapidly broaden the number of appropriate people and get them involved.

Zarenski: You’re spot on. Another extremely important part is the direct log and record of what people did to help fix the problem. Regulations require recording what the disruption was, but also recording every single step and every person who interacted with the disruption. That can then be reported on in the future should they want to learn from it or should regulators and auditors come in. This solution provides that capability all in one place.

Yon: Such post-disruption forensics are very important for a lot of reasons.

Zarenski: Yes, exactly. A regulator will be able to look back and ask the question, “Did this firm act reasonably with respect to its responsibility?”

Easy question, but tough to answer. You would need to go back and recreate your version of what the truth was. This traps the truth. It traps the sequence, and it makes the forensics on answering that question very simple.

Gardner: While we’re talking about the payoffs when you do operational resiliency correctly, what else do you get?

Yon: I’ll give you a couple. One is we don’t have to get a 3 a.m. phone call when something has broken, because someone is already working on the issue.

Another benefit impacts the “pull-the-plug test,” where once a year or two we hold our breath to determine if our BCP plans are working and that we can recover. In that test, a long weekend is consumed with a Friday night fault or disconnection of something. And then we monitor the recovery and hope everything goes back to normal so we can resume business on the following Tuesday.

When we already understand what the critical business services are, we can quickly hone down essential causes and responses. When service orientation took hold, people bragged about how many services they had, perhaps as many as 900 services. Wow, that seems like a lot.

But are they all critical? Well, no, right? This solution allows you to materially keep what’s important in front of you so you can save money by not needing to drive the same level of focus across too wide of a beachfront.

Secondly, rather than force a test fault and pray, you can do simulations and tests in real time. “Do I think my resiliency strategy is working? Do I believe my resiliency machinery is fit for duty?” Well, now you can prove it, saying, “I know it is because I test this thing every quarter.”

You can frequently simulate all the different pieces, driving up the confidence with regulators, your leadership, and the auditors. That takes the nightmare out of your systems. These are but some of the other ancillary benefits that you get. They may seem intangible, but they’re very real. You can clean out unnecessary spend as well as unnecessary brand-impacting issues with the very people you need to prove your abilities to.

Gardner: Andrew, any other inputs on the different types of value you get when you do operational resiliency right?

Zarenski: If you do this right and set up your service mapping infrastructure correctly, you can do what some of our clients have done and use it to compare how you might want to change your infrastructure. Having fully mapped out a digital twin of your business provides many more productivity and efficiency capabilities. That’s a prime example.

Gardner: Well, this year we’ve had many instances of how things can go very wrong -- from wildfires to floods, hurricanes, and problems with electric grids. As a timely use case, how would an organization in the throes of a natural disaster make use of this solution?

Prevent a data deep freeze

Zarenski: This specific use case stemmed from the deep freeze last winter in Dallas. It provides a real-life example. The same conditions can be translated over to hurricanes. Before the deep freeze hit back in the winter, we were ingesting signals from NOAA into the EY operational resiliency platform to understand and anticipate anomalies in temperatures in places that normally don’t see them.

We were able to run simulations in our platform for how some Dallas data centers were going to be hit by the deep freeze and how the power grid would be impacted. We could see every single physical asset being supported by that power grid and therefore understand how it might impact the business operations around the world.

There may be a server there that, in turn, supports servers in Hong Kong. Knowing that, we were able to prepare teams for a failover preemptively over to a data center in Chicago. That’s one example of how we can ingest data from multiple sources, tie that data to what the disruption may be, and be proactive about the response -- before that impact actually occurs.
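The Dallas example is, at its core, a walk over a dependency map: start at the asset under threat and follow everything it supports. A minimal sketch of that walk follows, assuming a hand-written map. The asset names, the `SERVICE_MAP` structure, and the `impacted_by` function are hypothetical and are not taken from the EY platform.

```python
from collections import deque

# Hypothetical service map: each key supports the assets or services listed.
SERVICE_MAP = {
    "dallas-power-grid": ["dallas-data-center"],
    "dallas-data-center": ["dallas-server"],
    "dallas-server": ["hong-kong-server"],
    "hong-kong-server": ["trading-service"],
    "chicago-data-center": ["chicago-server"],
}


def impacted_by(service_map: dict[str, list[str]], failed_asset: str) -> set[str]:
    """Breadth-first walk returning everything downstream of `failed_asset`."""
    seen: set[str] = set()
    queue = deque([failed_asset])
    while queue:
        node = queue.popleft()
        for dependent in service_map.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


at_risk = impacted_by(SERVICE_MAP, "dallas-power-grid")
```

In this toy map, a threat to the Dallas power grid reaches the Hong Kong server and the service it supports, while the Chicago assets stay untouched, which is what makes Chicago a candidate for the preemptive failover.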

Gardner: How broadly can these types of benefits go? What industries after power and energy should be considering these capabilities?

Yon: The most relevant ones are the regulated industries. So, finance, power, utilities, gas, and telecom. Those are the obvious ones. But other businesses need to ensure their firm is operational irrespective of whether it’s a regulatory expectation. The horizontal integration to offset disruption is still going to be important.

We’re also seeing interdependency across business sectors. So, talking to telecom, they’re like, “Yup, we need to be able to provide service. I want to be able to let people know when the service is going to go up when our power is down. But I have no visibility into what’s going on there.” So, sometimes the interdependencies cross sectors and cross industries, and those are the things that are now starting to be highlighted.

Understanding where those dependencies on other industries are can allow you to make better decisions on how you want to position yourself for what might be happening upstream so you can protect your downstream operations and clients.

It’s fascinating when we talk now about how each industry can gain transparency into the others, because there are clear interdependencies. Once that visibility happens, you’ll start to see firms and their ecosystem of suppliers leverage that transparency to their mutual benefit to reduce the impacts and the value disruption that may happen anywhere upstream.

Gardner: Andrew, how are organizations adopting this? Is it on a crawl-walk-run basis?

Map your service terrain

Zarenski: It all starts with identifying your critical services. And while that may seem simple at face value, it’s, in fact, not. By having such broad exposure in so many industries, we’ve developed initial service maps for what a financial institution is, or what an insurance institution looks like.

That head-start helps our clients gain a baseline to define their organizations from a service infrastructure standpoint. Once they have a baseline template, then they can map physical assets, along with the logical assets to those services.

Most organizations start with one or two critical services to prove out the use case. If you can prove out one or two, you can take that as a road show out to the rest of the organization. You’re basically setting yourself up for success because you’ve proven that it works.

Yon: This goes back to the earlier point about scale. You can put something together in a simple way, calibrating to what service you want to clear as resilient. And by calibrating what that service map looks like, you can optimize the spread of the service map, the coverage it provides, and the signals that it ingests. By doing so, you can synthesize its state right away and make very important decisions.

The cool thing about where the technology is now is that we’re able to rapidly take advantage of that. You can create a service map and tomorrow you can add to it. It can evolve quickly over time.

You can have a simplistic view of what a service looks like internally and track that to see the nature of where faults enter the system and predict what might materialize in that service map, to see how that evolves with a different signal or an integration to another source system.

These organizations can gain continuous improvement, ensuring that they consistently raise the probability of avoiding disruptions. They can say, “I’m now resilient to the following types of faults,” and tick down that list. The business can make economic choices in terms of how complex it wants to build itself out to be able to answer the question, “Am I acting in a reasonable way for my shareholders, my employees, and for the industry? I’m not going to cause any systemic problems.”

Gardner: You know, there’s an additional pay back to focusing on resiliency that we haven’t delved into, and it gets back to the notion of culture. If you align multiple parts of your organization around the goal of resiliency, it forces people to work across siloes that they might not have easily forded in the past.

So, as we focus on a high-level objective like resilience, does that foster a broader culture of cooperation for different parts of the organization?

Responsible resiliency collaboration

Yon: It definitely does. Resiliency is becoming a sound engineering principle generally. It can be implemented in many different ways. It can be implemented not only with technology, but with product, people, machinery, and governance.

So many different people participate in the construction of an architectural capability like resiliency that it almost demands that collaboration occur. You can’t just do it from a silo. IT just can’t do this on their own. The compliance people can’t do this on their own. It’s not only a horizontal integration across the systems and the signals for which you detect where things are -- but it’s an integration of collaboration itself across those responsibility areas and the people who make it so.

Gardner: Andrew, what in the way the product is designed and used helps facilitate more cultural cooperation and collaboration?

Zarenski: Providing a capability for everyone to understand what’s going on is so important. For me to see that something going wrong in my business may impact someone else’s business gives a sense of shared responsibility. It gives you ownership in understanding the impacts across all the organizations.

Secondly, a lot of this all rolls up to being compliant with different regulations. We’re providing a capability for virtually anyone to support risk and compliance activities -- without even knowing that you’re supporting risk and compliance activities. It makes the job of compliance visual and easy to understand. That ultimately supports the downstream processes that your risk and compliance officers must perform -- but it also impacts and benefits the frontline workers. I think it gives everyone an important role in resiliency without them even knowing it.

Gardner: How do I start the process of getting this capability on-boarded in my company regardless of my persona?

Yon: The quick answer is to turn on the news. Resiliency and continual operation awareness are now at the board level. It’s one of the top-five priorities firms say are important for them to survive through the next 10 years.

Witness all the different things that are being thrown at us all -- whether it’s weather, geopolitical, or pandemic-related. The awareness is there. The interest is definitely there. Then the demand comes from that interest.

Based on the feedback and conversations we’re having with so many clients across so many industries, it is resonating with them. It’s now obvious that this needs to be looked at because turning your digital storefront off is no longer an option. We’ve had too many people see the impact of that over the past year.

And the nature of disruptions just keeps getting more complex. We’ve had near-death business experiences. They’ve had the wake-up call, and that was enough of a motivation to have awareness and interest in it that’s now moving us toward how to best fulfill it.

Gardner: A nice thing about our three-part series is we first focused on the critical timing around the financial industry. We’re talking more specifically today about the solution itself and its wider applicability.

The third part of our series will share the experiences of actual customers and explore how they went about the journey of getting that germ of operational resilience planted and then growing it within their company. Meanwhile, where can our audience go for more information and to learn more about how to make operational resiliency a culture, a technology, and a company-wide capability?

Yon: For those folks who already have responsibilities in this area, their industry trade shows, conversations, and dialogues are actively covering these issues. Second, for those who are EY or ServiceNow customers, talk to your team because they can lead you back to folks like Andrew and myself to confer about more specifics based on where you are on your journey.

Sponsor: ServiceNow and EY.
