Which Ceremonies Survive · Extreme AI Programming #5

Coming out, again. Estimating on the fire, and Scrum with it. Definition of Done as the new gate. Kanban surviving, surprisingly. A discipline from before Agile, and one from inside XP, both worth bringing back. The front of the board, where the work is now.

When I embarked on this series of twenty essays, I knew this would be the one that caused me the most angst. This is where I come out, again. Where there is nowhere left to hide, and where I am not going to. I have to declare what I truly believe, and I know that, in doing so, I will offend some people. Some will call me a heretic, some will wish me burnt at the stake, some will think me an imposter. But whatever you think, know that I care deeply about our craft, and that I speak from within. Within our profession. I am not trying to provoke. I am trying to find a way forward.

Come with me through this essay, where we will throw a few religious artefacts on a great big digital fire and sing Kumbaya.

We have to look at which practices still make sense and which do not, and I have to name them. Not everyone will agree. Here we go.

Estimating goes in the bin, and Scrum with it

I would start the audit with estimating, which I think is unequivocally the first thing to go.

Estimating was always a more compromised practice than the rituals around it admitted. At its best, a senior engineer estimating a piece of work made a careful guess based on the amount of effort they thought it would require. That guess was biased by two things in particular: how much they already knew about the work, and how much they actually wanted to do it. The first bias produced underestimates on familiar work and overestimates on novel work. The second produced underestimates on work people fancied and overestimates on work they were quietly trying to avoid. The discipline around estimating (planning poker, team averaging, velocity normalisation over time) was an attempt to debias a fundamentally biased act. It worked, occasionally.

In an AI-first development world, neither bias is the active constraint any more. An agent has no preference about which work it gets next. An agent is no less familiar with the codebase than a human; if anything, it is more uniformly familiar, having read every file with the same attention. The biases are gone, but the thing they were biasing, the human cadence of execution, is gone with them.

And if you ask the AI directly, it will tell you the work takes a couple of weeks. In practice, that means go and have a nice dog walk, and by the time you get back, it will be done. The agent gives estimates in human idiom because its training data was full of humans estimating. It has no internal model of agent cadence. The estimate is a polite social act, not a forecast.

Story points, velocity, sprint commitments. They are casualties of the same break. They were Agile’s most concrete tools, and they were always the most calibrated to a particular kind of team. Most of the teams I know have already let them go, often without admitting it to themselves.

Scrum follows estimating into the bin because it is a methodology built on careful estimates. The two-week iteration assumes a team can commit to a set of work for a fortnight with reasonable confidence about what it will deliver. The sprint review assumes the delivery roughly matches the commitment. The retrospective on the sprint asks what got in the way of the estimate being right. Take the estimate out from underneath, and the whole edifice loses its load-bearing wall. Scrum is not so much binned by AI as revealed by it. It was a coordination protocol for a particular kind of human team, and that team is increasingly not the one writing the software. The kindest thing the industry can do for Scrum is acknowledge that it served honourably for twenty years and is not the right tool for an AI-first delivery cadence.

What replaces it: the brief

If estimating goes, the question is what we plan around instead. The answer we have settled on at Mindset is to scope the work so that it can be done end-to-end in a single day: the discussion, the specification, the collaboration with the team, the agent execution, the testing and the deployment, if it comes to that. Bigger than a story, smaller than an epic. We call it a brief, deliberately a word with no Agile baggage attached. We picked it because it was unclaimed, and we wanted a primitive that did not arrive carrying the assumptions of either Beck’s era or what came after it.

The reversal in the approach matters more than the name. Instead of choosing a piece of work and then estimating how long it will take, you choose one you can see fitting into a day and do not estimate it at all. A brief is close to what XP called an ideal day’s work, brought back here as the unit of scope rather than the unit of estimate. There is something pleasingly circular about this: in solving the problem AI has handed us, we end up reaching for one of the oldest ideas in early Agile. The choice of which piece of work the brief contains matters less than the fact that the whole arc, from conversation to shipped, fits inside the day.

During the day, most of a human’s time is spent on specification and verification, not on code. The agent execution itself is one or two continuous hours of work, often less. The brief is the unit we plan around, write to, ship from and verify against, and it is sized by the day, not by the agent.

Code review, in the bin

The next one is the controversial one, and I know many will be holding their heads at this point. Hear me out, because the conclusion is the opposite of where it first seems to be going.

To do AI-first development properly, we have to stop caring about the code and start caring enormously about the quality of it. Those are not the same thing. Caring about the code is the practice of reading every line, sitting with every diff, holding the artefact in our heads. Caring about the quality of the code is the practice of refusing to accept anything that does not meet the standards we have set. The first is what humans used to do. The second is what we now have to ask automation to do, because the volume of code an AI-first team produces makes the first practice physically impossible.

Put more bluntly: if you are looking at the code, you are looking in the wrong place. An LLM could be producing WASM for all we should care. What is worth reviewing is the architecture, the security posture and the drift from the standards we have set. The git diff is not. If you find yourself reviewing diffs in an AI-first team, you are not doing quality control. You are the new bottleneck.

Historically, code review did two jobs at once. The first was a quality gate: a senior reading a junior’s change, catching the bug, refusing the merge until it was right. The second was an apprenticeship: the senior explained why the choice was wrong, the junior absorbed the codebase’s standards one MR at a time, and the team’s house style was replicated through teaching. Both jobs were doing real work, and the practice mattered because both were worth doing.

The apprenticeship half has already gone. When the author of the diff is an agent, the senior has nobody to teach. The comment thread the senior used to use as a teaching surface is now a comment thread the agent does not learn from. So in practice, that half has collapsed to nothing in most of the teams I have taken an honest look at.

The quality-gate half is the one I think is fundamentally broken as well, though the conclusion is uncomfortable. Agents produce code at volumes humans cannot meaningfully read. A senior engineer cannot honestly read every line of agent-produced work on a fast-moving team. What most teams have settled into is a theatrical review where the human approves with a glance and trusts that the worst has been caught somewhere else. That is not quality control. It is the appearance of quality control, and the appearance is what we have to put in the bin.

If we accept the belief that humans must read the code to validate it, we are accepting that the team’s quality ceiling is set by the rate at which humans can read agent-produced output. That ceiling is much lower than the rate at which an AI-first team needs to ship. The belief is the bottleneck, dressed up as discipline.

So code review goes in the bin, on the explicit condition that the quality work it nominally did has to happen somewhere else, and more rigorously than human review ever did.

Definition of Done is the gate

If code review goes, the question is what replaces it. The answer is the practice that has always sat alongside code review and rarely been treated with the seriousness it deserves: a real, measurable Definition of Done.

A Definition of Done in an AI-first world has to do all the work that human review was nominally doing, and it has to do so before the agent writes a line of code. Acceptance criteria, agreed up front, written into the brief. Full test coverage as a default, not an aspiration. End-to-end tests, integration tests, smoke tests, the boring ones nobody used to want to write. Human tests, too, in the places where humans are the only thing that can decide whether an experience feels right. Architecture conformance and security posture gated by automation. Coding standard adherence enforced by tools. Performance bars set in advance, not discovered in production.

Almost none of that is new. What is new is the cost of getting it wrong, and therefore the seriousness with which the gate now has to be treated. In a human-cadence world, a slightly fuzzy Definition of Done was forgivable because the human writing the code would push back on a missing acceptance criterion, and a reviewer would catch the rest. Neither safety net exists with an agent. The brief that goes to the agent is what the agent will build; if the brief does not say “and it must pass these tests, and meet this performance bar, and stay within this security boundary”, the agent will produce something that runs and looks reasonable and does not necessarily meet any of those bars.

This is what separates the practice we are advocating from vibe coding. In vibe coding, you do not know where you will end up, and you find out by looking at what the agent produced. The functional output might be fine; the architectural, security and quality-of-code outputs almost certainly are not, because nobody specified them. The disciplined version flips that: the Definition of Done is written first, the agent’s output is gated against it by automation, and the human’s attention is freed from reading the code to focus on whether the definition itself was right.

The Definition of Done is not a checklist tacked onto a ticket. It is the contract between the team and its agents.
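
To make the idea of a contract concrete, here is a minimal sketch of a Definition of Done expressed as data and checked by automation. It is an illustration under stated assumptions, not a description of any particular tool: the names DefinitionOfDone, BuildReport and gate, and every threshold, are hypothetical. The human tests mentioned above stay outside it, because they cannot be reduced to a number.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class DefinitionOfDone:
    """Bars agreed before the agent writes any code (all values hypothetical)."""
    min_coverage: float = 1.0            # 1.0 means full test coverage
    max_p95_latency_ms: float = 200.0    # performance bar, set in advance
    required_suites: Tuple[str, ...] = ("unit", "integration", "e2e", "smoke")
    max_lint_errors: int = 0             # coding standard adherence
    max_security_findings: int = 0       # security posture


@dataclass
class BuildReport:
    """What automation measured on the agent's output (hypothetical shape)."""
    coverage: float
    p95_latency_ms: float
    suites_passed: Set[str] = field(default_factory=set)
    lint_errors: int = 0
    security_findings: int = 0


def gate(dod: DefinitionOfDone, report: BuildReport) -> List[str]:
    """Return every bar the output failed; an empty list means done."""
    failures = []
    missing = [s for s in dod.required_suites if s not in report.suites_passed]
    if missing:
        failures.append("suites not passing: " + ", ".join(missing))
    if report.coverage < dod.min_coverage:
        failures.append(
            f"coverage {report.coverage:.0%} below {dod.min_coverage:.0%}"
        )
    if report.p95_latency_ms > dod.max_p95_latency_ms:
        failures.append(
            f"p95 latency {report.p95_latency_ms}ms over {dod.max_p95_latency_ms}ms"
        )
    if report.lint_errors > dod.max_lint_errors:
        failures.append(f"{report.lint_errors} lint errors")
    if report.security_findings > dod.max_security_findings:
        failures.append(f"{report.security_findings} security findings")
    return failures


# Example: every suite passes and the build is fast, but coverage falls short.
dod = DefinitionOfDone()
report = BuildReport(
    coverage=0.97,
    p95_latency_ms=150.0,
    suites_passed={"unit", "integration", "e2e", "smoke"},
)
failures = gate(dod, report)
print(failures or "done")
```

In this example the gate refuses the work on coverage alone, and nobody had to read a diff to find that out. The design choice worth noticing is that the bars are plain data written before execution, so the only thing left for a human to review is whether the bars themselves were the right ones.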

Waterfall, brought back

Waterfall was pushed out of the mainstream by Agile from the late nineties because the assumptions it relied on, that requirements could be fully specified before implementation began and that the cost of changing them later was higher than the cost of getting them right up front, had stopped being true. Implementation was the slow part. Iteration was cheaper than specification. The discipline rightly moved toward fast feedback loops, short cycles and working software over comprehensive documentation.

Twenty-five years later, the assumption has inverted. Implementation is the cheap part. Specification is the expensive part: what we want, why we want it and how we will know when it is right. The two costs have changed places, and the practice should change with them. We are back to needing more time at the front of the board, in the careful work of writing down what we mean precisely enough that an agent can build the right thing.

The honest version of waterfall, the one worth bringing back, is not the strawman six months of requirements documents and no working software. It is the disciplined version: clear specifications written before execution begins, decisions surfaced and resolved before they bite, the brief sharpened until the words mean only one thing. All forms of planning belong to this work. Design thinking belongs to this work. The careful conversations a team has before any code is written belong to this work. None of those practices were ever wrong; they were under-weighted because execution was the expensive part. Now that execution is cheap, the front-of-board practices are the work.

Kanban deserves a mention in the same breath, because Kanban was always the disciplined way of managing a waterfall-shaped flow. It does not depend on estimates. It depends on the work being visualised, the work-in-progress limits being respected, and the bottleneck being known. The board is still the right artefact for an AI-first team. The pull-when-ready cadence is the closest thing the industry has to an honest model of how agent work arrives. What changes is which column on the board the constraint sits in.

Kanban manages the flow; waterfall does the specifying. Estimating and Scrum, with respect, are surplus to requirements.
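
The three Kanban mechanics named above (visualised work, respected work-in-progress limits, a known bottleneck) are small enough to sketch in code. This is a toy model under assumptions of my own: the Board class, the column names and the limits are all hypothetical, and the bottleneck is approximated here as the column fullest relative to its limit.

```python
from typing import Dict, List, Optional


class Board:
    """A toy Kanban board: ordered columns, each with a WIP limit."""

    def __init__(self, wip_limits: Dict[str, int]):
        # Column order follows the order of the keys; limits must be positive.
        self.order: List[str] = list(wip_limits)
        self.limits: Dict[str, int] = dict(wip_limits)
        self.columns: Dict[str, List[str]] = {name: [] for name in self.order}

    def add(self, brief: str) -> bool:
        """Place new work in the first column, unless its WIP limit is reached."""
        first = self.order[0]
        if len(self.columns[first]) >= self.limits[first]:
            return False
        self.columns[first].append(brief)
        return True

    def pull(self, into: str) -> Optional[str]:
        """Pull the oldest item from the previous column if `into` has capacity."""
        idx = self.order.index(into)
        if idx == 0:
            raise ValueError("the first column is filled with add(), not pull()")
        source = self.order[idx - 1]
        if not self.columns[source]:
            return None  # nothing is ready upstream
        if len(self.columns[into]) >= self.limits[into]:
            return None  # WIP limit respected: do not start more work
        item = self.columns[source].pop(0)
        self.columns[into].append(item)
        return item

    def bottleneck(self) -> str:
        """The column that is fullest relative to its WIP limit."""
        return max(
            self.order, key=lambda c: len(self.columns[c]) / self.limits[c]
        )


# Example: three briefs flow through specify -> execute -> verify.
board = Board({"specify": 3, "execute": 2, "verify": 1})
for name in ("brief-1", "brief-2", "brief-3"):
    board.add(name)
board.pull("execute")
board.pull("execute")
board.pull("verify")
blocked = board.pull("verify")  # verify is already at its limit
print(board.bottleneck(), blocked)
```

Nothing in this model asks how long a brief will take. Work moves only when the next column has room, and in the example the constraint shows up in verification rather than execution, which is the shift the board is there to make visible.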

Pair programming, brought back

The other practice I would argue for bringing back is one I would have nominated as the most archaic of all the early Agile rituals, and the most quietly forgotten over the last decade: pair programming.

Pair programming in its original XP form was two engineers at one keyboard. One driving, one navigating. The two of them talking through the work as it was being written. The discipline was about catching errors in real time and forcing both engineers to articulate what they were doing as they did it.

Working with an AI coding agent is pair programming, in a way that surprised me the first time I saw it. There are two of you in the conversation. The two of you are discussing the work in plain language. One of you is doing the typing. The agent is the driver, the human is the navigator, and the back-and-forth is exactly the kind of articulate-as-you-go practice the original XP authors were arguing for.

It is genuinely funny that the practice I would have nominated as the most outdated piece of XP turns out to be the most apt frame for what we are doing now. Kent Beck’s original instinct, that good code emerges from two minds talking through the work in real time, was right all along. The only thing that has changed is which mind is doing the typing.

Hello little human. Meet Claudette, your new pair programming partner. She is relentless, she is brilliant, she is a maverick without an ego. She already knows more than you have forgotten and will ever know. She is full-stack, hyper-stack, every-stack. Meet Claudette and all the versions that come after her. I hope you get on.

What survives, in the end, is the human teamwork

There is a thread that runs through all of this, and it is the one I want to end on. The practices that go in the bin are almost without exception the ones that were about managing human execution. The practices that survive and the ones I would bring back are, almost without exception, about human organisation, human articulation and human collaboration.

The standup survives because the humans still need to know where one another stand, particularly in the remote world most teams now work in. The retrospective survives because it is still about us; agents cannot do that reflection on our behalf, and the cadence of decisions a team now makes per week probably means a retro should happen more often, not less. Then there is planning, design thinking, the careful agreement of acceptance criteria, and the conversations a team needs in order to know what it is building. None of those is getting easier. All of them are getting more critical because the part of the day the team used to fill with typing and reviewing is moving elsewhere.

The work that is left, the work the calendar should now fill with, is the human work. How we organise ourselves, how we decide what good looks like, how we talk to each other about what we are building. That is the discipline an AI-first team is being asked to develop, and it is, as it has always been, the hardest part of the job.

My friend Matt Brown put this neatly to me recently:

“It is interesting how the shift has gone from a focus on offering individual productivity to team productivity.”

He is exactly right. The unit of speed used to be the engineer; the unit now is the team that knows what it is building. The agents will keep getting faster than any individual could be. The team that knows itself well enough to direct them will outpace any agent on its own.

The constraint moved. The discipline has to move with it.

— Barrie

I am co-founder and CEO of Mindset AI, where we are building Memex AI, a decision and knowledge layer for AI-native engineering teams. This series is the thinking that shapes our product. I will flag it explicitly when an article touches something we build. Most of it is simply where the industry is going, with or without us.