Why Good Architecture Still Fails in Production

At Devoxx UK in London, I caught up with Eoin Woods, co-author of three software architecture books and former CTO of Endava, to find out why beautifully designed systems still break in production.

Eoin, who now works independently across software architecture, green software, and engineering, has spent decades around large systems.

The failures he keeps seeing are rarely exotic. They begin with something mundane: an environmental detail nobody verified, a technology nobody fully understood, or a second-order effect nobody expected until the system was already live.

That is why, for him, architecture is less about drawing clean systems on paper and more about surfacing the assumptions production will eventually expose.

Production is where the assumptions become visible

When I asked Eoin where failures come from, he resisted the temptation to separate elegant design from messy execution. He argued that the system that matters is the one that ships, and the one that ships is full of assumptions.

Ultimately the problem is always the design or actually, not even the design but what ends up in production…

This means the diagram may look reasonable, the architecture review may pass, and the slide deck may even sound convincing. Production is where the assumptions become visible. Experience changes what people notice, and Eoin put this bluntly:

The older and more cynical you get, the more questions you ask.

Seniority, at least in architecture, often means noticing the missing question before the missing answer turns into an incident. The same logic applies to coupled systems. A component can behave well on its own and still create trouble once the rest of the system starts reacting to it.

If one component slows down dramatically, the others (if tightly bound to it) probably slow down dramatically as well. It’s only when it actually happens that everyone realises everything else slows down in lockstep, because they hadn’t realised how coupled together they were.

That is the kind of failure teams usually understand only after the fact. The system behaves as designed, then fails as a whole because the interactions were never really understood as interactions.
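The lockstep effect is easy to see in a toy model. The sketch below is an illustration only, not something from the interview: the service names, the latency figures, and the timeout-with-fallback mitigation are all assumptions chosen to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    """A hypothetical service that does local work, then calls one dependency."""
    name: str
    own_latency_ms: float
    dependency: Optional["Service"] = None
    timeout_ms: Optional[float] = None  # None means "wait as long as it takes"

    def latency_ms(self) -> float:
        """End-to-end latency of one synchronous request through this service."""
        if self.dependency is None:
            return self.own_latency_ms
        downstream = self.dependency.latency_ms()
        if self.timeout_ms is not None:
            # Stop waiting at the timeout and fall back to a degraded answer.
            downstream = min(downstream, self.timeout_ms)
        return self.own_latency_ms + downstream


def build_chain(db_latency_ms: float, timeout_ms: Optional[float] = None) -> Service:
    """checkout -> orders -> database, each call synchronous (assumed topology)."""
    db = Service("database", db_latency_ms)
    orders = Service("orders", 20.0, dependency=db, timeout_ms=timeout_ms)
    return Service("checkout", 30.0, dependency=orders, timeout_ms=timeout_ms)


print(build_chain(10.0).latency_ms())                      # 60.0, everything healthy
print(build_chain(5000.0).latency_ms())                    # 5050.0, slowdown passes straight through
print(build_chain(5000.0, timeout_ms=200.0).latency_ms())  # 230.0, callers stop waiting
```

Without a timeout, every caller inherits the full delay of the slowest dependency. The timeout does not fix the database; it only limits how far the slowdown spreads, and it assumes a degraded answer is acceptable to the caller.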

Architecture is never just about structure

That distinction matters most where recovery is expensive. Financial infrastructure, healthcare, industrial control, and similar environments do not get the luxury of treating failure as a learning exercise.

We’re much less enthusiastic about moving fast, breaking things and fixing forwards actually because if we do break something in a mission critical system it has a really serious side effect.

In those environments, speed only helps when recovery is cheap. If rollback is hard, repair is slow, or the impact is irreversible, then fast delivery stops being a virtue and starts becoming a liability.

That is why architecture is never just about structure; it is about consequences. A design that looks efficient in a presentation can become expensive in the real world the moment its assumptions meet a system that cannot absorb mistakes.

Writing the scenario forces the hidden question into the open

Eoin made the same argument in his Devoxx talk by describing a scene almost every engineer will recognise.

A stakeholder asks for something scalable, cost-effective, secure, and easy to use. The architect comes back with containerised microservices, lower storage costs, forced password changes, and a task-oriented interface. The stakeholder says it sounds good, while still not really understanding what was asked or what was decided.

That gap between what was asked and what was understood is exactly what architectural scenarios are designed to close.

A scenario takes a vague wish and turns it into something concrete. Not “the system should be scalable” but what happens when 5,000 users connect at the same time and the primary database fails at 7pm during peak load. Not “the app should be secure” but what happens when a decryption key needs to be recovered after an incident. The value of that exercise is not subtle.

There’s nothing like writing a scenario to reveal what you don’t know about your own system.

That is why scenarios matter, Eoin believes. They do not make uncertainty disappear, but they do force it into view.
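One way to picture this is to treat a scenario as a small structured record whose empty fields are the unknowns. The sketch below is a minimal illustration, not a template Eoin presented: the field names and the `gaps` helper are assumptions, and the example reuses the database failure described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scenario:
    """Hypothetical scenario template; the field names are assumptions."""
    quality: str
    stimulus: str
    environment: str
    expected_response: str = ""  # left empty until someone can actually answer it
    evidence: str = ""           # for example a test run, not a vendor demo
    open_questions: List[str] = field(default_factory=list)

    def gaps(self) -> List[str]:
        """Return what is still unknown about this scenario."""
        found = []
        if not self.expected_response:
            found.append("no agreed expected response")
        if not self.evidence:
            found.append("no evidence the response has been tested")
        found.extend(self.open_questions)
        return found


failover = Scenario(
    quality="scalability",
    stimulus="primary database fails while 5,000 users are connected",
    environment="7pm, peak load",
    open_questions=["How long does failover take under this load?"],
)

for gap in failover.gaps():
    print(f"{failover.quality}: {gap}")
```

Writing the stimulus and environment is quick. The fields that stay empty are where the uncertainty sits.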

He described six practical uses for scenarios. Teams use them to:

decide what to build
compare design options
drive research
assess design choices more broadly
explain existing system behaviour to people who need to understand it without getting lost in technical detail
surface the questions that should have been asked much earlier

The point is not documentation for its own sake, but making the system legible before production does the explaining instead.

If offline mode has only been demonstrated and never truly tested against the live system, the scenario is where that becomes obvious. If a recovery flow exists mainly in a vendor demo, the scenario turns the demo into a question rather than a conclusion.

The simplest useful habit is still discipline

Eoin’s most practical advice was also the least glamorous:

Make architectural decisions intentionally, and write them down.

This means recording, for each decision, the assumptions built into it, the trade-offs it accepts, and the implications that follow.
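As a minimal sketch of what such a record might hold, the example below keeps those three elements as explicit fields. The layout and the example decision are invented for illustration; real templates vary from team to team.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecisionRecord:
    """Hypothetical record layout; the section names are assumptions."""
    title: str
    decision: str
    assumptions: List[str] = field(default_factory=list)
    trade_offs: List[str] = field(default_factory=list)
    implications: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the record as plain text suitable for a file in the repository."""
        lines = [self.title, "", f"Decision: {self.decision}"]
        for heading, items in (
            ("Assumptions", self.assumptions),
            ("Trade-offs", self.trade_offs),
            ("Implications", self.implications),
        ):
            lines.append("")
            lines.append(f"{heading}:")
            # An empty section is printed explicitly so the gap stays visible.
            lines.extend(f"- {item}" for item in items or ["(none recorded)"])
        return "\n".join(lines)


# Invented example decision, for illustration only.
record = DecisionRecord(
    title="Serve reporting queries from a read replica",
    decision="Reports read from the replica, never from the primary.",
    assumptions=["Replication lag stays small enough for reporting"],
    trade_offs=["Reports may show slightly stale data"],
)

print(record.render())
```

Here the implications section renders as "(none recorded)", which shows a reviewer what nobody has thought through yet.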

ADRs, or architecture decision records, are not a new idea. Eoin said he was already writing them as a graduate in the 1990s, and even then they were not common practice. The fact that the advice is old does not make it less relevant. If anything, the opposite is true.

AI tooling is generating code faster than teams can inspect the assumptions that end up inside it. Requirements are still vague. Stakeholders still do not always understand the technical response, and decisions are still too often left implicit.

The result is more software with more hidden assumptions inside it.

That is where architecture really begins, in the part of the work that makes those assumptions visible before production has to do it for you.
