CTOs Agree: Cognitive Debt Is the New Technical Debt


At a CTO Craft Dinner in Toronto, I sat down with engineering leaders from more than a dozen tech companies and asked where AI has actually landed. The free-for-all is over and we need to be realistic.

Our Shift CTO Craft Dinner format is built on candor, rather than slides or sponsor pitches. It’s just a group of senior engineering leaders talking about what’s actually happening in their organizations. At our Toronto dinner, held on the sidelines of the CTO Craft conference, the theme was AI adoption in engineering.

Within the first five minutes it was clear we’d spend the evening confirming one uncomfortable truth, similar to the one we heard earlier this spring in London: nobody has truly figured it out yet.

Note: We’ll try to figure it out at our Shift developer conference in September on the beautiful Croatian coast – tickets are on sale!

The free-for-all is over

Two years ago, the mandate was simple: spend on AI, no questions asked. That era is over.

What’s replaced it is a harder conversation, one several participants had clearly been having with their CFOs. The question has shifted from “are you using AI?” to “what are you getting for it?”

The need for an ROI hasn’t changed. If anything, that window of just spending freely is dwindling. For larger organizations, the expectation is return within 12 months, or sooner.

The challenge, as one participant put it, is that the I in ROI is completely unmanaged. Engineering capacity used to mean headcount, something finance could model. Now it means tokens, and nobody controls how many tokens any individual engineer burns on a given day.

The CFO has no control once they’ve signed the contract over what the actual investment is going to be. And if you can’t tell me the investment, what does projected return even mean?

Several people around the table had run into the same wall: organizations that bought the tools, signed the contracts, and then realized there’s no financial model inside the company to manage what comes next. The comparison kept coming up: early cloud adoption, FinOps before FinOps existed. We’re in that same window. Costs are still a fantasy, usage is still undefined, and the metrics to measure it haven’t been invented yet.

One participant’s take: pushing for ROI too hard right now might mean measuring the wrong things entirely. The smarter move is to establish a baseline first.

I think eventually it will get to a more predictable state where we can say, approximately this many tokens leads to this much feature value. But I don’t think we can predict that yet.
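
To make the baseline idea concrete, here is a minimal sketch of what a first pass could look like. It assumes you can export daily token counts per engineer from your vendor; the UsageRecord shape, the median baseline, and the 3x outlier factor are my own illustrative choices, not something anyone at the table described.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class UsageRecord:
    engineer: str  # hypothetical identifier
    day: str       # ISO date string
    tokens: int    # tokens consumed, as reported by the vendor (assumed export)


def daily_totals(records):
    """Sum tokens per (engineer, day)."""
    totals = defaultdict(int)
    for r in records:
        totals[(r.engineer, r.day)] += r.tokens
    return dict(totals)


def baseline(records):
    """Median daily tokens per engineer: a simple, outlier-resistant baseline."""
    per_engineer = defaultdict(list)
    for (engineer, _day), tokens in daily_totals(records).items():
        per_engineer[engineer].append(tokens)
    return {e: median(v) for e, v in per_engineer.items()}


def flag_outliers(records, factor=3.0):
    """Days where usage exceeds `factor` times that engineer's own baseline."""
    base = baseline(records)
    return sorted(
        (e, d, t)
        for (e, d), t in daily_totals(records).items()
        if t > factor * base[e]
    )


records = [
    UsageRecord("ana", "2025-06-02", 100),
    UsageRecord("ana", "2025-06-03", 120),
    UsageRecord("ana", "2025-06-04", 1000),
    UsageRecord("ben", "2025-06-02", 50),
    UsageRecord("ben", "2025-06-02", 50),
]
print(baseline(records))
print(flag_outliers(records))
```

A baseline like this says nothing about value delivered. It only gives finance and engineering a shared number to argue about, which is roughly where cloud cost tracking started too.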

The more pragmatic response, shared by more than one team: stop the free-for-all, start standardizing. That doesn’t mean telling people to use AI less. It means nudging them from “use everything” to “use the same things, smarter.”

What you’re actually hiring for now

The hiring discussion exposed a split in how people in the room think about what an engineer actually is.

One participant drew a clean line between two different roles that often get conflated: the AI engineer who ships product, and the engineer who owns system design. Language doesn’t matter anymore: Python, Go, Rust, Node. But system design hasn’t changed. Someone still has to think about availability, budgets, and the architectural decisions that AI can’t make for you.

The new software engineer is a product leader. Someone thinking about what the product is, not just how it works. But we still need technical people who think about the design. Those are two different things.

On the interview side, the consensus leaned toward keeping technical fundamentals, but with caveats. One team hadn’t changed their process yet, still testing for systematic thinking, for the ability to break down ambiguous problems. Others were actively rethinking it. The most interesting take came from someone who’d shifted their interviews toward code review rather than coding, precisely because that’s what engineers actually do now.

I changed our interview process to focus on code review, because that’s what we’re actually doing. And implementation is now AI-assisted, however you choose to use your agents.

If your team can generate code faster than it can review it, you have a bottleneck. The constraint is human judgment, not output.

Team reactions and ‘recalibration moments’

Across the table, nobody described a team that was uniformly enthusiastic or uniformly resistant. The reality was messier.

One leader talked about late adopters at his company who finally jumped in after being gently pushed, then hit what he called “recalibration moments”: realizing that whole categories of work that used to take days now take hours, and having to rethink how their schedule works.

Another described something more complicated: an engineer who is highly productive with AI, genuinely good at using it, but also deeply skeptical of AI-generated output.

There’s one person who’s really good at using AI, very productive, but also highly sensitive to anything AI has produced. “This is written by AI.” “Okay, but is it good?” “Yes, it’s good. But it’s written by AI.”

The concern about AI making people stop thinking came up more than once. The counterargument wasn’t a dismissal. It was a reframe: this is a management problem, not a technology problem. The tools make it easy to be lazy. The job of a leader is to make laziness not worth it.

You can choose to make something fast and long and not that good. Or you can use it to iterate and really drill it down to something short. We can all write really long letters now. Whether we should is a different question.

One leader’s company ran an anonymous survey and found that 90% of its engineers actually want to use AI, higher than expected. What surprised him wasn’t the enthusiasm but what came next: questions about performance management, about promotion criteria, about how individual contribution gets recognized when anyone can now generate code. The adoption had outrun the enablement.

We’re applying AI adoption rapidly, but we haven’t updated the career pathway or rethought the competency matrix. They have every right to ask those questions.

The feature debt problem nobody wants to talk about

AI makes it cheap to write code. That is not the same as it being cheap to ship it, or to maintain it. One participant put it cleanly: cognitive debt is the new technical debt.

The room had seen the same pattern: teams adding features at a pace that would have been impossible two years ago, now dealing with the maintenance overhead that comes with it. Legacy code that was already hard to understand is now harder, because the people changing it aren’t being careful. They’re being fast. And internal tools that were never meant to be permanent are now permanent because someone shipped them with three prompts.

Those processes, build or buy, is this worth maintaining long term, were there for a reason. But if you can spin something up in an afternoon, it’s easy to skip them. The problem comes later.

One participant’s response: write whatever you want, but writing it doesn’t mean you’re shipping it. Code is cheap, but launching it isn’t. Keeping that distinction alive in a team is harder than it sounds when management is celebrating every PR.

The concept of a dedicated “technical health team” came up as a structural answer: engineers who ship features and engineers who delete or refactor them, treated as equally valuable work. Getting buy-in for that from a business incentivized by velocity is the actual challenge.

Who should be shipping PRs?

The most spirited part of the dinner: if AI makes it easy for anyone to write code, should product managers be shipping PRs?

The room was divided. Not on the possibility (everyone agreed the tools make it technically feasible) but on whether it’s the right thing to optimize for.

One participant pushed back hard on the narrative of non-engineers shipping to production as a win:

If you’re effectively making engineers into code review monitors, protecting the company from PRs they didn’t write, while the PM gets the credit for shipping, check in on your engineers’ mental health.

Others were more pragmatic. Enabling non-engineers to contribute on smaller, lower-risk tasks shortens the feedback loop for everyone. Designers who can write a component to spec without waiting for an engineer. Product managers who can fix a copy bug without opening a Jira ticket. That has value, as long as the guardrails are right.

The self-driving car analogy landed well: an average AI-assisted non-engineer is probably better than an average unassisted one. But nobody’s comparing them to the engineers who spent years developing the expertise. The comparison only makes sense within a specific complexity range.

If a feature is 80% engineering and 20% product thinking, an engineer can do it. If it’s the reverse, maybe a PM can handle it. What I think is actually happening is that we’re becoming value creators who pick up whatever slice of the work makes sense, regardless of title.

The harder structural question got no clean answer: if anyone can ship, whose job is it to hold everything together? Several people in the room said the only realistic response is radical team autonomy: small groups of three to five people who own their own decisions, with management’s job shifting to alignment rather than gatekeeping.

Code review is the bottleneck. AI might fix it.

One participant had been thinking about this from a CI/CD angle: the problem isn’t that teams can’t generate code, it’s that they can’t review it fast enough. Human review is now the bottleneck, and the solution isn’t more reviewers. It’s smarter triage.

His team had been experimenting with confidence scoring on PRs: using AI to assess the risk of a change and surface only the parts that actually need human eyes. A 15,000-line PR with three lines that need human review isn’t a 15,000-line review problem. It’s a three-line problem, if you can trust the rest.

I honestly think you can have a fifteen thousand line PR and say, I need a human to review these three lines. Everything else here is fine. I don’t know how to do that yet, but I know it has to happen.
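
Nobody at the table had a working version of this, so treat the following as a sketch of the triage step only. It assumes some scorer already assigns each changed hunk a risk value between 0 and 1; that scorer, the Hunk shape, and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    path: str
    start_line: int
    line_count: int
    risk: float  # 0.0 (safe) to 1.0 (risky), from a hypothetical scorer


def triage(hunks, threshold=0.7):
    """Split hunks into those needing human eyes and those that can be auto-accepted."""
    needs_human = [h for h in hunks if h.risk >= threshold]
    auto_ok = [h for h in hunks if h.risk < threshold]
    return needs_human, auto_ok


def review_load(hunks, threshold=0.7):
    """Return (lines a human must read, total lines changed)."""
    needs_human, _ = triage(hunks, threshold)
    flagged = sum(h.line_count for h in needs_human)
    total = sum(h.line_count for h in hunks)
    return flagged, total


pr = [
    Hunk("generated/client.py", 1, 14997, 0.05),  # bulk generated code
    Hunk("auth/session.py", 40, 3, 0.92),         # small change to a sensitive path
]
print(review_load(pr))  # (3, 15000)
```

The hard part is the scorer, not the filter. The filter is only trustworthy if a low score reliably means low risk, which is exactly the validation problem the rest of this section is about.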

The deeper point was about abstraction layers. We don’t read assembly, we trust compilers. The question is whether you can build enough validation infrastructure, feature flags, observability, acceptance tests, mutation testing, to make a similar trust relationship work with AI-generated code.

Nobody in the room claimed they’d solved it. Several said they were pushing as hard as they could in that direction.

One practical suggestion that came out of this: invest in evals now, not later. The cost of building AI features isn’t the hard part. The cost of verification is. If you build a solid eval suite today, you can swap providers, survive model deprecations, or move to open source without starting from scratch.

The cost of building isn’t as high. The cost of verification is high. When the vendor you’re partnering with changes their model, which they do often, you just run that suite of evals and you’re okay.
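
For anyone who hasn’t built one, an eval suite can start very small. The sketch below assumes every provider can be wrapped as a plain function from prompt to text; the EvalCase shape, the example checks, and the stand-in models are illustrative assumptions rather than anyone’s production setup.

```python
from dataclasses import dataclass
from typing import Callable

# Any provider, wrapped as prompt -> completion (an assumption of this sketch).
Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the output is acceptable


def run_evals(model: Model, cases: list) -> dict:
    """Run every case against a model; a crash counts as a failure."""
    results = {}
    for case in cases:
        try:
            results[case.name] = bool(case.check(model(case.prompt)))
        except Exception:
            results[case.name] = False
    return results


def pass_rate(results: dict) -> float:
    return sum(results.values()) / len(results) if results else 0.0


cases = [
    EvalCase("refund_window", "What is our refund window?", lambda out: "30 days" in out),
    EvalCase("non_empty", "Summarize the ticket.", lambda out: out.strip() != ""),
]


# Stand-ins for the current model and a candidate replacement.
def current_model(prompt: str) -> str:
    return "Refunds are accepted within 30 days."


def candidate_model(prompt: str) -> str:
    return ""


print(pass_rate(run_evals(current_model, cases)))    # 1.0
print(pass_rate(run_evals(candidate_model, cases)))  # 0.0
```

Because the suite only depends on the prompt-to-text function, swapping vendors or absorbing a model update means rerunning the same cases and comparing pass rates.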

The vendor you’re betting on

What happens when Anthropic or OpenAI raises prices, goes down, or gets disrupted?

One participant said it plainly: her entire company has a single point of failure on Anthropic. Engineering understands single points of failure. The people in marketing and finance building on top of Claude do not.

So many teams outside engineering are building things, and I don’t think they truly understand what’s underneath it. If all of a sudden we can’t get inference, what happens to marketing? What happens to finance? They’re gonna call engineering.

The counterpoint was that competition will keep pricing in check. Open source models are no longer years behind; they’re a couple of versions back at most. You can’t double your prices when customers have alternatives. But the lock-in concern isn’t really about the model itself. It’s about everything built around it. Skills, tooling, internal workflows: those are much harder to migrate than a model endpoint.

The practical advice: build internal UI wrappers over generic model APIs now, before your teams are locked into specific product interfaces. It’s cheap to do, and it means you can swap the model underneath without rebuilding the interface your teams depend on.
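
The layer underneath such a wrapper can be a few lines. This is a minimal sketch assuming your internal tools only ever talk to one small interface; TextModel, Assistant, and the stand-in backends are hypothetical names, and a real backend would call a vendor SDK where the stand-ins just return strings.

```python
from typing import Optional, Protocol


class TextModel(Protocol):
    """The only surface internal tools are allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for illustration; a real one would call a vendor SDK."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


class FailingBackend:
    """Simulates a vendor outage."""

    def complete(self, prompt: str) -> str:
        raise ConnectionError("inference unavailable")


class Assistant:
    """What marketing and finance tools call; the backend can change underneath."""

    def __init__(self, primary: TextModel, fallback: Optional[TextModel] = None) -> None:
        self.primary = primary
        self.fallback = fallback

    def ask(self, prompt: str) -> str:
        try:
            return self.primary.complete(prompt)
        except Exception:
            if self.fallback is None:
                raise
            return self.fallback.complete(prompt)


print(Assistant(EchoBackend("vendor-a")).ask("hi"))                     # [vendor-a] hi
print(Assistant(FailingBackend(), EchoBackend("vendor-b")).ask("hi"))  # [vendor-b] hi
```

This covers the endpoint, which the room agreed is the easy part. Prompts, skills, and workflows tuned to one vendor still need migrating, but at least the interface your teams use stays put.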

What’s next? Let’s talk about it at Shift

These reflections, reactions and conversations came from just one of the Shift CTO dinners we’re organizing in the lead-up to our Shift engineering and AI conference in September. All of our dinner participants have been invited, and so are you!

We’re also planning some engineering leadership programming, and topics just like the ones in this article (building AI-native engineering teams, growing your developer career, scaling systems) will run across the conference agenda. I’d love for you to join us: get your tickets and see you in Zadar!

The dinner was held under the Chatham House Rule. Quotes are used without attribution.