
Operational Resilience Series #1: What is operational resilience?

One of the key concepts getting serious airplay on the current risk management stage is operational resilience. It is a key focus of financial services regulators, and an increasing focus of governments worldwide with respect to critical infrastructure. What is operational resilience and how should you go about ensuring your operational resilience capability is up to scratch?

What is operational resilience?

Resilience derives from the Latin word “resilire” which means to recoil or rebound. The Oxford English Dictionary defines resilience as “the capacity to recover quickly from difficulties; toughness”. While the OED definition focuses on the ability to recover quickly, resilience is also the ability to withstand adversity in the first place. 

Operational resilience for organizations therefore means:

  1. To withstand adversity. This covers two elements:
    1. To position the organization so that it is not affected by disruptive events, e.g. locating outside an earthquake zone
    2. To be able to withstand a disruptive event so that damage is minimized, e.g. using an earthquake-proof building if you have to be in an earthquake zone
  2. To be able to recover quickly and successfully from adversity
  3. To be able to pivot the organization quickly and permanently if the adversity or disruptive events create a new normal
  4. To be able to learn from the disruptive events and make the organization stronger

Operational resilience is not new. There are many aspects of risk management that focus on resilience, particularly Business Continuity and Disaster Recovery. These disciplines play a major role in operational resilience, but on their own they do not make up all of it. Operational resilience comprises a range of activities, processes and capabilities.

What should an operational resilience process look like?

So, moving from the why and what to the how, what does a strong operational resilience capability look like and where do you start?

1.      Stakeholders and objectives

The starting point is to define which of your organization’s ultimate outputs need to be resilient. You do this by identifying your key stakeholders and the value and service you bring to them. These become the ultimate objectives of operational resilience.

Examples may be:

  • To provide payment services to customers to allow customers to buy, sell, borrow and invest
  • To provide stability of the banking system
  • To provide utility services such as electricity, gas or water
  • To provide telecoms services

2.      Important Business Services

The second step is to identify the Important Business Services (IBS) that are required to deliver your key services to your stakeholders. This will be an end-to-end process. We need to map these processes so that we understand the sub processes and critical resources needed for their successful operation. For example, for a payments provider, this IBS will be the complete end-to-end payments process.

3.      Impact tolerances

You need to set impact tolerances for the negative impacts that disruption may have on your key stakeholders. To do so, we first need to determine how the objectives noted in the first step can be measured. For payment services, the ultimate impact on the customer may be such things as financial hardship or reduced quality of life. A measurable proxy for this may be the level of availability of the service.

Once the measure for each objective has been determined, maximum impact tolerances need to be set at the point beyond which an unacceptable level of damage occurs to the stakeholder. For the payments service, this may be the maximum acceptable outage.
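As a rough illustration, an impact tolerance expressed as a maximum acceptable outage could be recorded and checked along the following lines. This is only a sketch: the `ImpactTolerance` class, its field names and the four-hour figure are assumptions made for the example, not values taken from any regulator or framework.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ImpactTolerance:
    """Hypothetical record of an impact tolerance for one business service."""

    service: str
    max_outage: timedelta  # assumed proxy measure: maximum acceptable outage

    def is_breached(self, outage: timedelta) -> bool:
        """Return True when an outage is longer than the tolerance allows."""
        return outage > self.max_outage


# Illustrative figure only: a four-hour tolerance for a payments service.
payments_tolerance = ImpactTolerance("Payments", timedelta(hours=4))

print(payments_tolerance.is_breached(timedelta(hours=2)))  # within tolerance
print(payments_tolerance.is_breached(timedelta(hours=6)))  # outside tolerance
```

In practice an organization may track several measures per service; a single outage duration is used here only to keep the idea visible.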

4.      Sub-processes

The next step is to identify the various sub-processes that make up the IBS. For the payments service, a sub-process could be Merchant Switching.

5.      Critical resources

For each sub-process we then need to identify and map the critical resources (e.g. People, Physical Assets, Technology Assets) required to operate that process, and therefore the associated Important Business Service.

6.      Resource health

Once the critical resources are known, we need to be able to assess the health of each resource in terms of its ability to withstand stress (prevention) and also the ability to recover from stress (cure).

At this stage we have a complete understanding of the end-to-end process which needs to be operating effectively in order to deliver the identified objectives.
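The mapping built up in steps 2 and 4 to 6 can be pictured as a simple hierarchy: an Important Business Service contains sub-processes, each of which depends on critical resources with a health assessment. The sketch below is one possible way to hold that map in Python. All class names, the two yes/no health flags and the example resources ("Switch platform", "Operations team") are assumptions for illustration; a real health assessment would be far richer than two booleans.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A critical resource with a deliberately simplified health assessment."""

    name: str
    kind: str  # e.g. "People", "Physical Asset", "Technology Asset"
    can_withstand: bool  # assumed stand-in for "prevention" health
    can_recover: bool  # assumed stand-in for "cure" health

    def is_healthy(self) -> bool:
        return self.can_withstand and self.can_recover


@dataclass
class SubProcess:
    name: str
    resources: list[Resource] = field(default_factory=list)


@dataclass
class ImportantBusinessService:
    name: str
    sub_processes: list[SubProcess] = field(default_factory=list)

    def unhealthy_resources(self) -> list[str]:
        """List every resource that fails either health check."""
        return [
            f"{sp.name}: {r.name}"
            for sp in self.sub_processes
            for r in sp.resources
            if not r.is_healthy()
        ]


# Hypothetical example data for a payments service.
payments = ImportantBusinessService(
    name="Payments",
    sub_processes=[
        SubProcess(
            name="Merchant Switching",
            resources=[
                Resource("Switch platform", "Technology Asset", True, False),
                Resource("Operations team", "People", True, True),
            ],
        )
    ],
)

print(payments.unhealthy_resources())  # ['Merchant Switching: Switch platform']
```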

7.      Scenarios and simulations

We then need to consider a range of severe but plausible disruptive scenarios. These may include natural disasters, pandemics, social unrest, conflict and terrorism.

We then need to understand what would happen to our IBS if the identified disruptive scenarios were to occur. This requires running simulations of the selected scenarios, usually as desktop exercises, and assessing the impact on the IBS, the objectives and ultimately on your stakeholders. The test results then need to be evaluated against the impact tolerances.
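Evaluating simulation results against a tolerance amounts to a comparison per scenario. A minimal sketch follows, assuming the tolerance is a maximum outage and that each desktop simulation produces an estimated outage. The four-hour tolerance, the estimated outages and the function name `scenarios_outside_tolerance` are all made up for the example.

```python
from datetime import timedelta

# Assumed tolerance for the payments service (illustrative figure only).
max_outage = timedelta(hours=4)

# Hypothetical estimated outages produced by desktop simulations.
simulated_outages = {
    "Natural disaster": timedelta(hours=10),
    "Pandemic": timedelta(hours=1),
    "Social unrest": timedelta(hours=3),
}


def scenarios_outside_tolerance(results, tolerance):
    """Return the names of scenarios whose estimated outage exceeds the tolerance."""
    return sorted(name for name, outage in results.items() if outage > tolerance)


outside = scenarios_outside_tolerance(simulated_outages, max_outage)
print(outside)  # ['Natural disaster']
```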

8.      Learnings and resilience improvements

Where the scenario results fall outside tolerance, we need to analyze where the weaknesses, vulnerabilities and single points of failure lie. Each issue identified should then lead to actions to resolve it.
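One simple way to surface single points of failure is to list, for each sub-process, the resources able to perform it and flag any sub-process with only one. The sketch below assumes such a list exists; the "Settlement" sub-process and all resource names are hypothetical.

```python
# Hypothetical map: for each sub-process, the resources able to perform it.
providers = {
    "Merchant Switching": ["Primary switch"],
    "Settlement": ["Settlement system A", "Settlement system B"],
}


def single_points_of_failure(provider_map):
    """Return sub-processes that depend on exactly one resource, with that resource."""
    return {
        sub_process: names[0]
        for sub_process, names in provider_map.items()
        if len(names) == 1
    }


print(single_points_of_failure(providers))  # {'Merchant Switching': 'Primary switch'}
```

A count of alternatives is only a starting point: two resources that share a site or a supplier may still fail together.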

9.      Reporting and accountability

Finally, we need to consider the information that needs to be produced and reported around the above steps. This will include external reporting to regulators and other third parties, and internal reporting to the Board, executive management and other interested parties.

About this series

This article is the first in a series to be published over the coming months, which will dive deeper into the elements above. We will add links to this article as new articles in the series are published:

 

Protecht's Complete Guide to Achieving Operational Resilience eBook gives you a detailed look at operational resilience. Learn exactly what makes it different from Disaster Recovery and Business Continuity, and get a list of steps to help you develop your own operational resilience capability. Find out more and download it now.

About the author

David Tattam is the Chief Research and Content Officer and co-founder of the Protecht Group. David’s vision is to redefine the way the world thinks about risk and to develop risk management to its rightful place as a key driver of value creation in each of Protecht’s clients. David is the driving force in taking Protecht’s risk thinking to the frontiers of what is possible in risk management and in supporting the uplift of people risk capability through training and content.