But it worked in testing… Sensitivity to Operations in the Real World

by Sandy Ludington

May 28, 2026

Banks often build their business on risk management. They borrow at low cost and lend at higher rates to make a profit – and the success of that strategy relies on the risk assessment the bank makes of each of its borrowers. They evaluate risk across the communities in which they operate. They monitor customers and transactions for money-laundering and fraud risk. And they plan for the likelihood that some of those risks will materialize.

Big banks run sophisticated models for all these risks, and they need to apply the same rigor to model development and deployment that any organization would apply to a complex engineering system. One such model, and a surprise in its deployment, sheds light on the importance of validating and verifying requirements related to the intended operating environment – a particular challenge when that environment cannot be fully replicated until the very end of development.

Imagine Ryan, a model development lead at a bank, designing a risk model that must run once a week to assign a risk score to every customer in the entire bank. The model handles billions of pieces of data every time it executes, and both the bank’s leadership and government regulators expect complete reliability. Ryan supervises the model’s design and testing to be sure it delivers the performance promised to leadership and to the regulator, but in practice he faces a limitation: memory.

The bank runs all its models on resources provided by a cloud service. Each model runs on a “cluster,” a computing environment sized to handle all the data and processing that model requires. Handling so much data means the model needs the large memory available only on the biggest cluster the cloud provider offers, but that cluster is not available in the test environment. So Ryan designs tests to prove the model executes successfully when broken into pieces on smaller clusters, or when operating on only a portion of the data. He extrapolates from those results to satisfy himself that, once the biggest cluster is available in the production environment, the model will run.

After months of development, testing, explanations to the regulator, and internal approvals, the model is deployed in the production environment. It runs every Monday, and it runs well. Ryan’s team is justifiably proud of deploying a complex product successfully.

But then comes the surprise. After a few weeks of successful Monday runs, the model starts to fail. The cloud provider returns cryptic error messages that Ryan cannot decipher. Ryan and his team perform heroics to run the model in parts and meet the regulatory expectation of a weekly run, but he’s at his wits’ end trying to resolve the unexplained failures.

Ultimately, after a few weeks of new tests, calls to the cloud service provider, and deep investigation, he finds the answer. The cloud service also makes those large clusters available to the world’s largest streaming services – and those users now claim all of the large clusters for themselves every Monday morning. As a result, the bank is in a race every Monday to grab one of the largest clusters, and it is effectively a coin flip whether the bank’s risk model or a streaming service gets it.

Ryan learned a tough lesson here – he made an assumption about the production environment without even knowing it. He had expected those large clusters to always be available and had designed the model’s deployment strategy around that expectation. But assumptions amount to requirements, and they need their own validation. In Ryan’s case, redesigning the model architecture after this failure consumed human effort and compute time that took away from other projects – the team could have moved on to other challenges had the original design accounted for this limitation. On top of that, it cost him some trust and confidence with leadership and regulators.
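For technically minded readers, here is a minimal sketch of what “designing for the limitation” might look like: the scheduler treats the availability of the large cluster as an assumption to check at run time, and falls back to the partitioned, smaller-cluster approach Ryan had already tested. Everything here is hypothetical and simplified for illustration, including the names `plan_weekly_run` and `partition_ranges`, the availability check passed in as a function, and the per-cluster capacity figure; a real cloud provider exposes its own APIs for requesting capacity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunPlan:
    cluster_size: str  # hypothetical label: "large" or "small"
    partitions: int    # number of pieces the customer population is split into


def plan_weekly_run(large_available: Callable[[], bool],
                    customers: int,
                    max_customers_per_small_cluster: int) -> RunPlan:
    """Decide how to run the weekly scoring job.

    'The large cluster is available' is checked, not assumed. If it is not
    available, split the population across smaller clusters instead.
    """
    if large_available():
        return RunPlan(cluster_size="large", partitions=1)
    # Ceiling division: enough partitions that each one fits on a small cluster.
    partitions = -(-customers // max_customers_per_small_cluster)
    return RunPlan(cluster_size="small", partitions=partitions)


def partition_ranges(customers: int, partitions: int) -> List[range]:
    """Split customer indices 0..customers-1 into contiguous, near-equal ranges."""
    base, extra = divmod(customers, partitions)
    ranges, start = [], 0
    for i in range(partitions):
        # The first `extra` partitions take one additional customer each.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Simulate a Monday when the large cluster has been claimed by someone else.
    plan = plan_weekly_run(lambda: False, customers=10_000_000,
                           max_customers_per_small_cluster=3_000_000)
    print(plan)  # RunPlan(cluster_size='small', partitions=4)
    for r in partition_ranges(10_000_000, plan.partitions):
        print(r.start, r.stop)
```

The point is less the code than the design choice: the fallback path exists and has been tested before the first production run, so losing the race for the large cluster becomes a slower Monday rather than a failed one.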

Ryan’s lesson applies to all kinds of products – bank risk models validated only in the production environment, aircraft whose final validation tests occur at altitude, power generation equipment whose first full integration happens at project completion. In every case, the product team needs to get ahead of validation as early in the process as it can – refusing to oversimplify, questioning assumptions, and anticipating failure modes – so that final validation testing is the time to confront only the known unknowns… and not the unknown unknowns.

At Novellum Partners, we focus on strengthening product organizations (whether they field hardware systems or bank risk models) so they can get ahead of these challenges – staying sensitive to the operational environment so that problems are solved before they surface in production.