Trisha Gee: AI Won’t Fix Your Broken Pipeline – It Will Break It Faster

At Devoxx UK, I spoke with Trisha Gee – author and one of the most recognized voices in the Java space – about what really happens when teams lean heavily on AI. Her take was far darker than the conference hype.

Trisha Gee has spent over two decades in software development, from startups to global enterprises. She is as much at home discussing DORA metrics and the SPACE framework as she is discussing business outcomes and organizational design.

At Devoxx UK, she gave a talk about how software engineering principles stay the same regardless of what tooling era you are in.

I wanted to understand what that means right now, when AI is writing a significant portion of the code.

AI exposes the weakest link, not just the fastest path

Trisha frames AI as an amplifier, not a solution. When I asked what that looks like beyond demos, she put it simply: it exposes the problems that were already there, the ones you didn’t know you had.

“The most common thing I saw (I was working at Gradle, so we dealt with a lot of build tooling) was more code, more tests, and tests taking longer. The continuous delivery pipeline took a lot of pressure.”

The broader pattern she describes is straightforward but easy to miss when you are excited about shipping faster. “Whichever part of your system is the weakest, it’s going to expose that part,” she said.

It is a useful reframing: most conversations about AI adoption focus on what gets faster, while Trisha highlights what deteriorates first.

When code gets cheap, everything else gets expensive

When I asked Trisha where teams should focus once code generation becomes cheap, her answer was: everywhere.

What she means is that optimizing the writing of code without understanding the surrounding system does not move the needle.

“It’s not about one thing which is going to fix one problem, it’s about really understanding the whole system, it’s about understanding even the whole organization, the whole enterprise. Where does IT and technology and software fit into that? What are you really trying to deliver? What is the business benefit?”

She described this as working across two ends of the process. On the input side, teams need to get better at questioning requirements before writing anything. On the output side, they need to look at build pipelines, test parallelism, flaky tests, and DORA metrics.

“If you can measure those things (your DORA metrics, build times, whether delivered requirements actually give users value) you can start to see which parts of the process are working and which need attention,” Trisha explained.
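To make that concrete, here is a minimal Python sketch of how three of the DORA metrics (deployment frequency, lead time for changes, and change failure rate) could be computed from a list of deployment records. The `Deployment` record and its field names are assumptions made up for this illustration. They are not something Trisha described, and they do not match the export format of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record shape, invented for this illustration.
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when the change reached production
    caused_failure: bool    # whether it led to an incident or rollback


def dora_summary(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarize three DORA metrics for deployments made over `period_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    if period_days <= 0:
        raise ValueError("period_days must be positive")

    # Lead time for changes: commit to production, expressed in hours.
    lead_times_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600
        for d in deployments
    ]
    failures = sum(1 for d in deployments if d.caused_failure)

    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_times_hours),
        "change_failure_rate": failures / len(deployments),
    }
```

Tracking these numbers before and after introducing an AI tool is one way to see whether the pipeline, rather than the code writing, has become the bottleneck.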

Measuring the wrong things optimizes the wrong things

She also makes a sharp point about measurement and optimization.

“If you measure lines of code for productivity, you’ll get more lines of code. But really productivity is not just about what we call these activity metrics. It’s not just lines of code. It’s not just pull requests, merges, features delivered.”

The thing teams consistently miss is the full arc of delivery.

“Developer experience and productivity is the whole piece. Did it get out to the user? Did it meet the user’s needs? Is the user paying for more of our stuff? Is the business getting what they need from what the developers are doing? What you’re measuring there impacts what you’re going to optimize.”

That last line is worth sitting with. If your productivity metrics stop at pull requests merged, you are optimizing for pull requests merged.

The SPACE framework and why three metrics beat one

When I asked Trisha what teams should measure, she pointed to the SPACE framework. SPACE stands for satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow.

DORA metrics, which most teams are more familiar with, are a subset of it. Her recommendation is to pick metrics from three different dimensions rather than relying on a single category. The reasoning is that single-category metrics tend to be easy to game without improving anything real.
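As a rough illustration of the "three dimensions" rule, the Python sketch below checks whether a chosen set of metrics spans at least three SPACE dimensions. The metric names and the dimension each one is assigned to are my own assumptions for the example, not a mapping Trisha gave.

```python
# The five SPACE dimensions, abbreviated to one word each for this sketch.
SPACE_DIMENSIONS = {
    "satisfaction",
    "performance",
    "activity",
    "communication",
    "efficiency",
}


def dimensions_covered(metrics: dict[str, str]) -> set[str]:
    """Return the SPACE dimensions used in a {metric name: dimension} mapping."""
    used = set(metrics.values())
    unknown = used - SPACE_DIMENSIONS
    if unknown:
        raise ValueError(f"not SPACE dimensions: {sorted(unknown)}")
    return used


def is_balanced(metrics: dict[str, str], minimum: int = 3) -> bool:
    """True when the metrics span at least `minimum` different dimensions."""
    return len(dimensions_covered(metrics)) >= minimum


# Hypothetical metric sets; the dimension assignments are assumptions.
activity_only = {
    "lines_of_code": "activity",
    "pull_requests_merged": "activity",
}
mixed = {
    "pull_requests_merged": "activity",
    "developer_survey_score": "satisfaction",
    "lead_time_for_changes": "efficiency",
}

print(is_balanced(activity_only))  # False: everything sits in one dimension
print(is_balanced(mixed))          # True: three dimensions are covered
```

A dashboard built only from the first set can improve on every chart while nothing the business cares about changes.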

“So yes, you can write more code, but no, you didn’t do what the business wanted.”

She also brought up Fred Brooks and communication overhead as something the industry consistently underweights. The harder metrics to capture, like satisfaction and flow, are often more revealing than the activity metrics that dashboards make easy to track.

The business outcomes she keeps returning to are specific: “You need to measure, did it do what you wanted it to do? Did it get out to the user in time? Did they start spending more money with us? Did it fix your retention problem?”

“Those are the things which matter much more to the business.”

What to fix before adopting AI

I wondered what teams need to get right before AI tooling can actually help them. Trisha’s first answer was essentially: stop adopting AI the way you have adopted everything else.

“We generally get requirements, write the code, chuck it out there, and then you’re kind of done. That’s not how it should work.”

What she advocates for instead is applying the scientific method to engineering decisions, which sounds obvious but rarely happens in real life.

“Have a hypothesis, do your investigation, measure the results, have a conclusion. Generally speaking, we have not been great at that in our industry.”

Applied to AI adoption specifically, that means being precise about what you are actually trying to achieve. “What are we trying to achieve with AI? Do we want to deliver more features more quickly to the customer, or do we want to perhaps deliver higher quality features? Because those two things are not necessarily the same thing,” Trisha concluded.

The practical instruction she gives is to run short experiments, measure one change at a time, and iterate. In her words: “But have a hypothesis, figure out how to measure it, measure it, get feedback, and iterate over that.”
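One way to keep such an experiment honest is to write the hypothesis, the metric, and the target down before measuring anything. The Python sketch below shows what that could look like. The `Experiment` class, the metric name, and every number in it are hypothetical, chosen only to illustrate the loop of hypothesis, measurement, and conclusion.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    # Hypothetical structure: one hypothesis, one metric, one change at a time.
    hypothesis: str
    metric: str
    baseline: float               # value measured before the change
    target: float                 # value that would support the hypothesis
    lower_is_better: bool = True

    def conclude(self, measured: float) -> str:
        """Compare a measurement with the target agreed on up front."""
        if self.lower_is_better:
            supported = measured <= self.target
            improved = measured < self.baseline
        else:
            supported = measured >= self.target
            improved = measured > self.baseline

        if supported:
            return "supported"
        # Better than baseline but short of the target: keep iterating.
        return "inconclusive" if improved else "not supported"


# Made-up example values, not real data.
experiment = Experiment(
    hypothesis="AI-assisted test generation shortens lead time for changes",
    metric="median_lead_time_hours",
    baseline=48.0,
    target=36.0,
)

print(experiment.conclude(30.0))  # supported
print(experiment.conclude(40.0))  # inconclusive
print(experiment.conclude(50.0))  # not supported
```

Fixing the target before the measurement is the point of the exercise: it prevents a team from deciding afterwards that whatever happened counts as success.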