The UK’s TSB Bank was fined £48.65mn (AU$86mn) in December 2022 for operational resilience failings arising from a failed major digital transformation project. The headline message is straightforward: regulators are serious about operational resilience. Beyond the headline, the Final Notices are a goldmine of learnings if you are undergoing a digital transformation, regardless of your sector or country.

In this blog we:

  • Provide a brief summary of what happened
  • Comment on the regulators’ message on operational resilience
  • Summarise key points and observations from the Final Notices

What happened at TSB Bank?

If you want the full run-down on what happened, I recommend you grab some popcorn and read the full Final Notices from the UK’s main financial regulators (FCA, PRA). Here’s a very quick summary:

  • TSB Bank planned a major IT migration project, the size and complexity of which was unprecedented in the UK
  • After delaying a publicly announced launch of late 2017, the primary migration event took place in April 2018
  • The platform immediately suffered serious issues that impacted customers, including digital banking outages, branch technology failures, and an inability to make payments
  • There were instances of data integrity issues, with some customers able to view data that did not belong to them, or being presented with incorrect account information or missing transactions
  • These primary failures resulted in a cascade of events – a huge influx of customer calls caused telephone systems to overload, with no technology or human capacity to serve them
  • TSB did not return to a ‘business as usual’ state until December 2018, following over 225,000 complaints and £32mn paid in redress to customers
  • The failure prompted the UK Parliament's Treasury Select Committee to ask the FCA questions about the adequacy of the regulatory regime's oversight

Regulators are taking operational resilience enforcement seriously

Before diving into the details of the report, let’s take a detour to the FCA press release. The headline is “TSB fined £48.65m for operational resilience failings.” Given that title, it’s noteworthy that TSB did not breach any of the operational resilience requirements that came into effect in March 2022. It couldn’t have – the primary event to which the fines relate happened in 2018.

While it only appears once in the FCA report, the phrase ‘operational resilience’ appears 26 times in the PRA report, including the following in the section on Breaches and Failings:

“Although the PRA’s current, overarching operational resilience framework was introduced after the Relevant Period (specifically, in 2021), the PRA’s requirements and expectations as regards managing operational resilience consolidate many long standing and well understood areas of prudential regulation that have formed part of the PRA Rulebook for several years, including during the Relevant Period. These areas include governance, operational risk management, business continuity planning and the management of outsourced relationships.”

This appears to be an intentional message that the FCA and PRA won’t be sleeping on enforcement of the more recent operational resilience rules. While there is a deadline in March 2025 to meet impact tolerances, they’ve also been clear that failure to take interim action may be seen as a breach.

What do the notices tell us?

While the full reports are rich with learnings, here are some highlights across major themes.

Program planning and timeline issues

  • The original migration plan was described as being “designed back from the y/end 2017 deadline” – rather than developed from start to end with dependencies understood.
  • TSB planned to achieve the entirety of the Migration Program in two years when even simply building a new platform in the UK in under three years was unprecedented. Did anyone question why they thought it was achievable? Were the plans or expectations benchmarked against existing practice?
  • In a September 2017 board meeting it was noted that the original target date was “deliberately very ambitious”, had acted as a “forcing mechanism” to ensure that the business and suppliers worked “at pace”, but had been “based on very little information”. The FCA’s liberal use of quotation marks points to a range of risk culture and project planning issues. We have to assume that some of the people involved in the project knew it was ambitious – which could also be reframed as ‘probably not achievable’!
  • When a replan was proposed in a deep dive meeting in October 2017, the board did not sufficiently challenge the plan – particularly to ensure that the aggressive assumptions of the original plan were not repeated.

Failures in risk management

  • 22 Program Risks were identified at the outset, but were not considered comprehensive, and remained unchanged throughout the project. In particular, risks associated with SABIS (the IT services arm of TSB’s parent, Banco Sabadell) as a critical supplier were not assessed.
  • The Risk Oversight function was asked to conclude their activities before ‘go live’ so as not to distract from the migration, and to avoid raising new actions.
  • An open action related to role clarity between TSB’s IT function and SABIS was later considered ‘risk tolerated’ on the basis that they had “a minimum level of compliance that we can work with” – highlighting a compliance-based approach rather than an outcome-focused one.
  • In the final week before the planned ‘go live’, none of the responsible executives had provided the readiness attestations they were accountable for. The board did not challenge, at this late stage, whether it was reasonable that none had been completed.
  • The migration plan included 15 governing principles. A week before the planned go-live date, a paper was presented to the board outlining three principles related to testing that had not been adhered to. Instead of challenging, the board stated “it was important for the Executive to provide an overall assessment that the amount of testing was appropriate and reasonable”. Important questions were not raised: what are the potential outcomes given that those principles have not been followed? How does this change the risk profile?

Failures in testing

  • A number of assumptions were made about how long User Acceptance Testing would take, without any basis or validation of those assumptions – even when external consultants provided assessments of expected delays based on assessing historical testing performance.
  • A pre-production environment was originally planned, but it was later agreed that testing would take place in production. Internal Audit raised concerns, citing previous FCA enforcement action related to poorly planned and executed IT change management. The issue was risk accepted on the basis that a pre-production environment would be obtained prior to migration – which never occurred.
  • Migration required two data centres configured by fourth parties. TSB did not audit the configuration. To avoid disrupting services, only one data centre was tested (successfully). This was considered reassuring on the basis that performance would be even better with two data centres. However, cross-configuration errors that later caused customer detriment were not identified.
  • The above decision (among others) was not taken or escalated in accordance with the program’s governance structure or procedures. It was taken informally and not documented.

Failures in outsourcing

  • TSB did not identify explicit risks regarding SABIS, their most critical supplier. There was no assessment of the risk of non-performance or inadequate performance.
  • There was no formal due diligence to verify SABIS had the capability to deliver the required platform. An Internal Audit report noted that this may result in a breach of FCA Principle 3 (which TSB has now been fined for).
  • SABIS relied on 85 of their own third parties. By February 2018 (over 2 years into the project), TSB had not verified that SABIS’ supplier management model complied with the TSB Group Outsourcing policy.
  • TSB was meant to provide design documents to SABIS, so that it could then assess what SABIS had built. Instead, it was later agreed that SABIS would populate a Configuration Management Database. This would show what had been built, as opposed to design documents explaining what to build. It could not be used to verify that infrastructure had been built to an original design.
  • While SABIS and its fourth parties provided some attestations to TSB, they amounted to forward looking statements of good intention or expectations, rather than statements of fact about the completeness of readiness activities already undertaken.

Failures in business continuity

  • TSB did not plan for a crisis of this size. Its reasoning was that, had it considered such problems a plausible risk, it would not have gone ahead with the migration – highlighting a failure to consider scenarios that are severe but plausible.
  • TSB’s practice incident response events were designed for executives only. SABIS were essential to crisis response but had not participated in any exercises. It was decided that SABIS’ time would be better spent continuing to work on the project.
  • When TSB performed practice events, each of these simulations simply assumed that SABIS was actively working on the problems that arose in the scenario. TSB had limited assurance over SABIS’ BCP preparedness or ability to respond.
  • There was insufficient consideration of how to scale resourcing and bring in external resources at short notice; most plans consisted of re-allocating other internal resources, but these were not available as nearly every part of the organisation was affected.

What does the TSB Bank case mean for you?

We hope some of those observations have given you food for thought for your own business continuity, vendor management and risk management practices – or raised some uncomfortable questions you need to take back to your stakeholders.

Here is a summary of what can be gleaned from these points and the full reports:

  • Make sure that your project or transformation plans are based on realistic assumptions and are achievable
  • Enable a healthy risk culture, where challenge can be raised appropriately, and assumptions can be questioned and verified
  • Follow escalation paths to enable that challenge – you can’t challenge what you don’t see
  • Scrutinise your business continuity plans and make sure they will work in the real world
  • Consider cascading and second order effects in your business continuity scenarios
  • A program level risk assessment intended to cover all suppliers is not enough. You need to assess vendors based on the risk they represent individually, and conduct due diligence accordingly – before proceeding.
  • Understanding who your fourth parties (and the rest of your supply chain) are is becoming increasingly important. While you can’t always access them directly, you should assess how your third parties are managing their own third parties.
  • While attestations can be useful, they are not a substitute for assessing the controls or capabilities of your supply chain
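
For readers who keep their supplier register in a spreadsheet or script, the supplier points above can be sketched roughly as follows. This is a minimal illustration and not a method taken from the Final Notices: the `Supplier` fields, the 1 to 5 criticality scale, the escalation threshold of 4 and the supplier names are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Supplier:
    """One entry in a hypothetical supplier register (all fields are illustrative)."""
    name: str
    criticality: int               # assumed scale: 1 (low) to 5 (service cannot run without them)
    due_diligence_done: bool       # capability verified before proceeding
    controls_verified: bool        # evidence reviewed, not just an attestation received
    subcontractors_assessed: bool  # supplier's management of its own third parties reviewed


def open_gaps(supplier: Supplier) -> list[str]:
    """List the assurance gaps for this supplier, assessed individually."""
    gaps = []
    if not supplier.due_diligence_done:
        gaps.append("no formal due diligence")
    if not supplier.controls_verified:
        gaps.append("controls not verified beyond attestation")
    if not supplier.subcontractors_assessed:
        gaps.append("fourth-party management not assessed")
    return gaps


def needs_escalation(supplier: Supplier, threshold: int = 4) -> bool:
    """Flag critical suppliers that still have open gaps (threshold is an assumption)."""
    return supplier.criticality >= threshold and bool(open_gaps(supplier))


if __name__ == "__main__":
    register = [
        Supplier("Hypothetical core platform provider", 5, False, False, False),
        Supplier("Hypothetical stationery vendor", 1, False, False, False),
    ]
    for s in register:
        if needs_escalation(s):
            print(f"Escalate before proceeding: {s.name}: {', '.join(open_gaps(s))}")
```

The point of the sketch is that each supplier is scored on its own criticality and its own evidence, so a critical supplier with open gaps is surfaced for challenge rather than being averaged away in a program-level assessment.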

Next steps for your organisation

Protecht recently launched the Protecht.ERM Operational Resilience module, which
helps you identify and manage potential disruption so you can provide the critical
services your customers and community rely on.

Find out more about operational resilience and how Protecht.ERM can help:

Header image by Nikk. Some rights reserved (CC-BY 2.0)
