Resilience and Graceful Degradation: MSc Software Solutions Architecture Student Project


Published: 13th March 2026

Traditional software resilience patterns usually have a binary outcome: the system either works or it fails fast. For Benjamin Murray, Software Engineering Manager at Smartbox Group, the future of dependable distributed systems lies in a third way – where systems automatically negotiate “good enough” responses to maintain stability under extreme pressure.

In the modern tech landscape, where even 30 seconds of downtime is considered a major outage, building 100% dependable systems remains an unsolved challenge. Benjamin Murray, who transitioned from an individual contributor to managing multiple teams at Smartbox, recognised that the most painful system failures often stem from complex interactions between individual services.

“Anyone working with large distributed systems knows they can be as precise and reliable as a Swiss watch one day and a confusing hall of mirrors the next,” says Benjamin. His capstone project for the MSc in Software Solutions Architecture aimed to address this by adding a new pattern to the “canon of dependability”.

TL;DR: The next frontier of system resilience involves moving beyond binary fail-fast models. By integrating lightweight machine learning models into circuit breakers, architects can allow systems to degrade gracefully, maintaining high availability and throughput by delivering approximate responses when downstream services falter.

The Industry Problem: The High Cost of Perfection

As distributed systems become more complex, the industry has developed a very human bias: the expectation that everything must be perfect at all times. However, striving for absolute fidelity during a system crisis often leads to saturation, high latency, and eventual total failure.

Benjamin observed that while the industry focuses on big data, there is a massive opportunity in small data – specifically, using extremely lightweight models (approximately 0.01% of the size of a modern LLM) to enhance dependability. These surrogate models can offer a solution that is “good enough” when the alternative is a total service outage.

Project in Practice: The Ground Truth Circuit Breaker

Benjamin’s research introduced the Ground Truth Circuit Breaker, a pattern where lightweight machine learning models are positioned as middleware in front of microservices.

The architecture works in two distinct phases:

  • Healthy State: The surrogate models act as sleeper agents, silently observing production traffic and learning to map inputs to outputs that approximate the real system’s “ground truth”.
  • System Under Duress: When a downstream service becomes unstable or exceeds latency thresholds, the circuit trips. Instead of simply failing, the surrogate models take over, handling traffic with approximate responses.

Above graph: The full takeover strategy immediately routes all requests to the surrogate once the threshold is crossed, yielding a clean division between ground-truth and surrogate handling. Partial takeover delegates only a subset of requests to the surrogate, blending both implementations within the critical latency region. The adaptive takeover strategy progressively shifts traffic away from the ground-truth implementation as latency rises, ultimately transitioning entirely to the surrogate as conditions worsen.
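
To make the two phases and the three takeover strategies more concrete, here is a minimal Python sketch. It is not Benjamin's implementation: the names `surrogate_share` and `GroundTruthBreaker`, the 200 ms threshold, the 800 ms ceiling, the fixed 50% partial share and the linear adaptive ramp are all assumptions made for illustration. A plain lookup table stands in for the lightweight machine learning model.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Tuple


def surrogate_share(strategy: str, latency_ms: float,
                    threshold_ms: float = 200.0, ceiling_ms: float = 800.0,
                    partial_share: float = 0.5) -> float:
    """Fraction of requests to hand to the surrogate at the observed latency."""
    if latency_ms < threshold_ms:
        return 0.0  # healthy state: the real service answers everything
    if strategy == "full":
        return 1.0  # clean cut-over once the threshold is crossed
    if strategy == "partial":
        return partial_share  # assumed fixed share, blending both implementations
    if strategy == "adaptive":
        # Assumed linear ramp: 0 at the threshold, 1 at the ceiling and beyond.
        return min(1.0, (latency_ms - threshold_ms) / (ceiling_ms - threshold_ms))
    raise ValueError(f"unknown strategy: {strategy}")


@dataclass
class GroundTruthBreaker:
    # Hypothetical downstream call returning (response, observed latency in ms).
    service: Callable[[Hashable], Tuple[object, float]]
    strategy: str = "adaptive"
    # Stand-in for a trained model: remembers the last real answer per request.
    surrogate: Dict[Hashable, object] = field(default_factory=dict)
    last_latency_ms: float = 0.0
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def call(self, request: Hashable) -> Tuple[object, str]:
        share = surrogate_share(self.strategy, self.last_latency_ms)
        if request in self.surrogate and self.rng.random() < share:
            return self.surrogate[request], "surrogate"  # approximate answer
        # Requests the surrogate has never seen always go to the real service.
        response, self.last_latency_ms = self.service(request)
        self.surrogate[request] = response  # keep learning from ground truth
        return response, "ground-truth"
```

One simplification is worth flagging: under full takeover this sketch stops measuring the real service for known requests, so it would never notice a recovery. A production breaker would presumably need some form of periodic probe to close the circuit again.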

The Results: Stability Over Absolute Fidelity

To evaluate the success of this pattern, Benjamin used TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), a weighted multi-criteria decision method, to compare the different experimental configurations.

  • Accuracy: Service-local surrogate placement and adaptive takeover strategies achieved accuracy levels of 85-90%.
  • Throughput and Latency: By sacrificing 10-15% of response fidelity, the system significantly reduced end-to-end latency and increased throughput during crises.
  • Optimal Balance: The research found that several surrogate-based configurations outperformed traditional control baselines once the benefits of avoiding system saturation were considered.
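
For readers unfamiliar with TOPSIS, the sketch below shows the standard form of the method in plain Python: normalise each criterion, apply the weights, then score every configuration by how close it sits to the ideal best and how far from the ideal worst. The function name `topsis`, the equal weights, the configuration names and every number in the sample matrix are invented for illustration. They are not the project's measurements or weightings.

```python
import math
from typing import List, Sequence


def topsis(matrix: Sequence[Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Return a closeness score in [0, 1] per row; higher means closer to ideal.

    Assumes no criterion column is all zeros and the rows are not all identical.
    """
    n = len(weights)
    # Vector-normalise each criterion column, then apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    columns = list(zip(*weighted))
    # Ideal best is the max of a benefit criterion and the min of a cost criterion.
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)    # Euclidean distance to the ideal best
        d_worst = math.dist(row, worst)  # Euclidean distance to the ideal worst
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Invented sample data. Columns: latency (ms), throughput (req/s),
# fidelity (0-1), CPU utilisation (0-1).
configs = ["config_a", "config_b", "config_c"]
matrix = [
    [900.0, 120.0, 1.00, 0.95],
    [250.0, 400.0, 0.85, 0.60],
    [300.0, 380.0, 0.90, 0.65],
]
weights = [0.25, 0.25, 0.25, 0.25]     # hypothetical equal weighting
benefit = [False, True, True, False]   # latency and CPU are costs

ranking = sorted(zip(configs, topsis(matrix, weights, benefit)),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because latency and CPU utilisation are treated as costs, a configuration that gives up some fidelity can still rank above a perfectly accurate one if it is markedly faster and cheaper to run. That is the kind of trade-off the results above describe.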

Impact for Engineering, Product, and Business

The practical benefit of this research is a shift in how stakeholders view Service Level Agreements (SLAs). “Product teams often focus on how a system should behave under ideal conditions,” Benjamin notes. “But it is important to recognise that every system, no matter how resilient, will eventually fail.”

The Ground Truth Circuit Breaker allows for Graceful Degradation, similar to how Netflix might reduce video quality when a user’s connection slows down, rather than stopping the video entirely. This strategy helps maintain stability and customer trust even when the underlying infrastructure is struggling.

Left graph: A TOPSIS ranking of the experimental runs, with weights applied to latency, throughput, fidelity and CPU utilisation. Right graph: Relationship between surrogate model activations and system latency at the API gateway. The top chart shows the P50, P95, and P99 request latency over time, while the bottom chart displays the number of surrogate activations for individual services.

Learning and Career: The Value of the “T-Shaped” Architect

For Benjamin, the MSc in Software Solutions Architecture was about more than just technical labs; it was about formalising concepts he had previously learned only in an ad hoc manner on the job. He emphasises the importance of architects being “T-shaped” – combining deep expertise in one area with broad knowledge across related domains.

“The decisions made at the architectural level can have a huge impact,” Benjamin explains. “They can create long-term advantages for the company or introduce technical problems that teams have to live with for years.” The Master’s programme equipped him to make these informed, deliberate decisions while leading his engineering teams at Smartbox.

About Benjamin Murray

Benjamin Murray is a Software Engineering Manager at Smartbox Group in Dublin, where he leads multiple teams in a fast-paced, collaborative engineering culture. His professional journey began in startup environments, where he gained hands-on experience across the entire software development lifecycle, from design through to production. At Smartbox, he transitioned from .NET Developer to a management role, where he now focuses on the long-term impact of architectural decisions and system dependability. Benjamin completed his MSc in Software Solutions Architecture in 2025.
