Extreme Programming in the Age of AI
When the Bottleneck Shifts
Software development is undergoing a fundamental transformation.
A helpful perspective on this comes from the Theory of Constraints, developed by Eliyahu M. Goldratt.
Theory of Constraints at its Core
The basic idea is surprisingly simple: In every system, there is usually one dominant bottleneck that determines throughput.
If you improve this bottleneck, the entire system improves. And once it is resolved, a new bottleneck emerges elsewhere. Progress, therefore, doesn’t simply mean making systems faster. Progress means shifting the bottleneck.
Goldratt formulated an iterative process from this:
- Identify the constraint
- Exploit the constraint
- Subordinate everything else to it
- Elevate the constraint
- Return to step 1 (because the bottleneck shifts)
Ultimately, any serious improvement to a system consists of deliberately shifting this bottleneck. In knowledge work, it can change—or fragment—quickly.
Interestingly, this perspective also helps reframe one of the most influential ideas in software development.
XP as Bottleneck Shifting
When Extreme Programming emerged in the late 1990s, its radical aspect wasn’t individual practices like pair programming, test-driven development, or simple design.
The radical aspect was its systems thinking. XP worked because its practices reinforced each other.
XP wasn’t a toolbox. It was a cohesive system of practices designed to maximize a team’s throughput under uncertainty.
At its core, XP was an attempt to systematically stabilize the dominant bottlenecks in software development. The result was a development process designed to keep change continuously possible.
The Bottleneck Shifts Again
With generative AI, this landscape is changing once again. And radically so.
Large language models dramatically reduce the cost of many tasks that have long made up a significant portion of development work. They can generate code, suggest variations, and explore solution spaces.
Coding agents go a step further. They can autonomously execute entire development cycles.
This doesn’t just change the speed. It changes the structure of the entire system.
Technical implementation is no longer automatically the limiting factor. The bottleneck increasingly lies elsewhere:
- Problem formulation – What are we actually building?
- Design decisions – Which structure will hold up over time?
- Evaluation – Is a generated solution actually good?
- Understanding – Can the team still explain the code?
The limiting factor is increasingly cognitive and coordinative.
What this means for XP
XP is shifting its focus from coding to collective cognition.
As the bottleneck shifts, so does the role of the practices.
Some become more central.
Some change their role.
And some are emerging for the first time.
The new bottleneck: human understanding
Pair Programming
Cognitive Triad
Pair programming combines two perspectives: The “driver” writes code, while the “navigator” keeps an eye on structure and design.
Large language models introduce new and intriguing constellations.
They are powerful sparring partners. They can suggest code, discuss variations, and explore solution spaces.
But they also amplify confirmation bias. Critical reflection can decrease. False assumptions scale faster than ever.
With coding agents, this dynamic shifts further. Tools can autonomously write code, run tests, and propose changes as pull requests.
In its most extreme form, pair programming evolves into a cognitive triad: Customer — Developer — Model.
Coding agents suddenly make this tightest form of coupling operational.
The customer provides problem understanding. The developer is responsible for system structure and context. The model handles implementation and exploration.
The developer becomes less of a “driver” and more of a “navigator” of a system that explores and materializes entire solution spaces.
The tension between these three roles increases collective intelligence—precisely the original goal of pair programming.
Collective Code Ownership
Radical Code Transparency
Collective ownership means: Everyone can change anything.
With coding agents, this idea becomes more radical.
An agent with project access can change anything—within system constraints.
Code is increasingly created through orchestrated processes. Collective ownership now includes machines.
Authorship becomes meaningless.
Instead, collective understanding becomes critical. The goal: Maximize the reduction of black-box risk.
Collective ownership becomes radical code transparency.
If code can be changed at any time—by humans or agents—its structure and intent must remain understandable to everyone.
Within limits, code loses its status as a long-lived asset. In many cases, it is more efficient to generate new code than to modify existing code.
This makes code interchangeable. Two implementations are equivalent as long as they arise from the same context and pass the same tests.
The system no longer “owns” code—it owns the ability to regenerate it at any time.
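This equivalence can be made concrete. The sketch below, with two hypothetical `slugify` implementations, treats a shared behavioral test suite as the sole criterion for interchangeability:

```python
# Sketch: code as a regenerable artifact. Two implementations count as
# equivalent when they pass the same behavioral test suite.

def slugify_v1(title: str) -> str:
    # Human-written original: lowercase, whitespace to dashes.
    return "-".join(title.lower().split())

def slugify_v2(title: str) -> str:
    # Regenerated variant with a different internal approach.
    result = []
    for word in title.strip().lower().split():
        result.append(word)
    return "-".join(result)

def behavioral_suite(impl) -> bool:
    """The shared contract: any implementation passing these is acceptable."""
    cases = [
        ("Hello World", "hello-world"),
        ("  Extreme   Programming ", "extreme-programming"),
        ("XP", "xp"),
    ]
    return all(impl(given) == expected for given, expected in cases)

# Both implementations are interchangeable under the contract.
assert behavioral_suite(slugify_v1)
assert behavioral_suite(slugify_v2)
```

The suite, not the source text, defines identity: regenerating `slugify` from the same context is safe as long as the contract still holds.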
Control over a drifting system that generates code
Test-Driven Development
Behavior-First Engineering
Tests have always been about specification.
With generative programming, they become the control interface.
Coding agents can generate, modify, and restructure code. But they need clear boundaries for acceptable behavior.
Test-first development provides exactly that. Tests define desired behavior before implementation and become the contract between humans and machines.
In an agent-based environment, tests become the programming language used to control coding agents.
Agents can read, write, and execute tests—and adapt implementations until tests pass. The classic TDD loop, partially automated.
This shifts the hierarchy: Tests become the primary artifact. Code becomes the derivative.
TDD becomes even more central—evolving into behavior-first engineering.
It stabilizes a probabilistic system against drift—the gradual divergence of behavior under seemingly identical conditions.
Continuous Integration
Continuous System Verification
Continuous Integration ensures that integration problems surface early.
With generative AI, however, the rate of change increases significantly.
CI becomes the immune system.
Beyond classic tests, pipelines increasingly verify whether fundamental system properties remain stable.
In systems modified by generative models, a new risk emerges: probabilistic drift. Small, locally correct changes can gradually shift the system away from its original structure and semantics.
CI becomes the mechanism that makes this drift visible—and controllable.
As changes are implemented faster and in larger chunks, their potential impact grows. Limiting the blast radius becomes critical.
XP has always used acceptance tests as a customer interface. In agent-based environments, a similar structure emerges within the team:
Continuous Integration becomes the acceptance layer for generated code—continuous system verification.
The development team defines the criteria a system must meet before it is considered understood and ready for integration.
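Such criteria can go beyond unit tests. A minimal sketch, assuming a layering rule that `domain` modules must not import from `infrastructure` (both names are illustrative), shows how a structural invariant might be verified in a pipeline step:

```python
# Sketch: a CI step that checks a structural invariant, not just behavior.
# The layering rule and module names are assumptions for illustration.

import ast

def forbidden_imports(source: str, banned_prefix: str) -> list[str]:
    """Return the imported names that violate the layering rule."""
    tree = ast.parse(source)
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            violations += [a.name for a in node.names
                           if a.name.startswith(banned_prefix)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.startswith(banned_prefix):
                violations.append(node.module)
    return violations

# A generated "domain" module that quietly reaches into infrastructure:
domain_module = "import infrastructure.db\nfrom domain.model import Order\n"
assert forbidden_imports(domain_module, "infrastructure") == ["infrastructure.db"]
```

Run over every generated change, a check like this turns probabilistic drift from an invisible accumulation into a failed build.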
Design Strategies for a World of Extremely Cheap Code
Refactoring
Continuous Code Assimilation
Refactoring keeps software soft—over time.
In the age of generative code, it’s no longer just cleanup. It’s entropy control.
Models generate working code—but rarely coherent design. Different patterns, redundant abstractions, and inconsistent structures emerge in parallel.
Refactoring becomes a continuous assimilation process.
Generated code isn’t simply accepted—it’s actively integrated into the architecture.
Only through refactoring does the team translate machine-generated output into shared understanding. Refactoring helps digest generated code.
Coding agents amplify this dynamic further. Structural changes that once took days can now happen in minutes.
Refactoring shifts from a scarce resource to an inexpensive, near-continuous activity.
For such systems to remain stable, clear context boundaries become critical. Concepts like core models and bounded contexts help structure systems into units that can be independently understood, generated, and recombined. They create islands where generative code remains manageable—with limited blast radius and bounded complexity for both humans and models.
At the same time, a new risk emerges: When change becomes too cheap, the temptation grows to continuously shift structure without truly understanding it. With generated code, the rate of change increases, and structural consistency becomes more fragile.
Design becomes less predictive and more evolutionary. XP always claimed this. Agentic coding makes it real.
Simple Design
Radical Simplicity
Simple Design means: only as much design as necessary.
In the LLM era, this becomes even more important.
Generative systems tend toward overcomplexity.
They produce extra abstractions, defensive structures, and overly generic architectures.
Generated code is fundamentally different: it is no longer purely intentional, but the result of a probabilistic process.
The team’s task shifts.
Less about inventing solutions. More about consistently removing complexity.
Simple Design becomes radical simplicity—a filter against generative complexity and one of the most critical skills.
System Metaphor
Shared Mental Model
The System Metaphor was itself a form of generative model.
It addressed a central problem: How do you create shared understanding with enough bandwidth to support a growing system?
This function is becoming important again in the age of generated code.
LLM-generated code can appear locally meaningful without being globally coherent. Different parts of the system may implicitly follow different architectural ideas.
Teams therefore increasingly need a shared mental model of their system—narratable structures such as metaphors, domain models, and concise system descriptions.
A good system metaphor acts as a form of cognitive compression: It makes complexity narratable—and therefore collectively manageable.
In a world with coding agents, it becomes the generative architecture.
Coding Standards
Machine-Readable Design
Coding standards are no longer just social conventions.
With AI-generated code, their role changes significantly.
Code is created faster and in greater volume than ever before.
Alongside implementations, many intermediate artifacts are generated as well. Teams must preserve structural stability—even as new material is constantly produced.
This creates a new risk: structural drift.
Coding standards evolve into a form of machine-enforceable design discipline—preventing continuously generated code from losing coherence.
They become the system prompt of the repository: control signals for generative code.
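As an illustration, a standard can be encoded as an executable check. The rule here (public functions must carry docstrings) is an assumption chosen for brevity; real teams would encode their own conventions and run the check in CI:

```python
# Sketch: a coding standard enforced mechanically rather than socially.
# The specific rule is an assumption; the pattern is what matters.

import ast

def missing_docstrings(source: str) -> list[str]:
    """Names of public functions defined without a docstring."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
        and not node.name.startswith("_")
        and ast.get_docstring(node) is None
    ]

generated = (
    "def charge(order):\n"
    "    return order.total\n"
    "def _helper():\n"
    "    pass\n"
)
assert missing_docstrings(generated) == ["charge"]
```

Because such rules are machine-checkable, they constrain generated code exactly as they constrain human code: violations never merge.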
Outer Value Loop: Earn or Learn
On-Site Customer
Customer-Driven Discovery
In the original XP model, the on-site customer brought the requirements dialogue directly into the team.
In the age of generative AI, this role shifts more than any other.
Today, customers can begin exploring both problem and solution spaces themselves. With generative models, they can simulate scenarios, test interactions, and outline potential solutions.
Acceptance tests evolve as well. What used to be a bottleneck can now be prepared test-first by the customer. With AI support, examples, scenarios, and edge cases can be created and iterated quickly.
The customer (or product person) becomes a co-discoverer in the search for the right solution. XP’s original driving metaphor, with the customer steering development through constant small corrections, gains new momentum.
In combination with coding agents, an even deeper shift occurs. When models actively participate in exploration and implementation, the fundamental unit of collaboration changes.
The classic XP pair expands into the smallest functional unit of modern product development: Customer — Developer — Model.
What appears as a cognitive triad in pair programming becomes the structure of the entire team.
The customer contributes problem understanding and hypotheses. The developer ensures system structure, context, and integration. The model enables rapid exploration and implementation.
The on-site customer evolves into customer-driven discovery within a cognitive triad.
The whole-team concept becomes more radical: a tightly coupled system of humans and generative models exploring problem and solution spaces together.
The on-site customer thus becomes something XP always hinted at, but rarely achieved: a co-designer of a socio-technical system where thinking and implementation are deeply intertwined.
Planning Game
Continuous Discovery Loop
The planning game synchronizes development and business.
With LLMs, the bottleneck shifts fundamentally.
Implementation becomes cheap. Exploration becomes scalable. Teams can generate more solutions than they can realistically evaluate.
The real risk becomes building too early.
The planning game evolves into a continuous discovery loop—hypothesis generation, exploration, and learning.
Coding agents amplify this dynamic further. They can prepare experiments, run tests, and analyze results.
The planning game becomes partially automated. Developers spend less time planning work—and more time orchestrating learning.
Value is no longer created by writing code, but by discarding possibilities.
When options explode, selection becomes the bottleneck. Enforcing constraints becomes a key skill.
Saying no becomes more important than saying yes.
Small Releases
Micro-Experiments
Small releases reduce risk by delivering software in small increments.
With generative AI, code production becomes extremely cheap. Features can be generated faster than ever.
The new risk is feature overproduction.
Small releases become micro-experiments. Each change tests a hypothesis about user behavior, system impact, and product value.
A release becomes, above all, a learning step.
When experiments become cheap enough, they begin to replace discussions. A release can create clarity faster than a meeting.
XP always implied this idea. Generative systems now push it to the extreme.
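One minimal building block for such experiments is deterministic variant assignment. The sketch below is illustrative; the experiment name and the 50/50 split are assumptions:

```python
# Sketch: a release as an experiment. Hash-based bucketing gives each
# user a stable variant without any stored assignment state.

import hashlib

def variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Stable assignment: same user + experiment always yields the same bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map hash prefix to [0, 1]
    return "treatment" if bucket < split else "control"

# Assignment is deterministic across calls, processes, and deployments.
assert variant("user-42", "new-checkout") == variant("user-42", "new-checkout")
assert variant("user-42", "new-checkout") in ("treatment", "control")
```

Determinism is the design choice worth noting: the experiment needs no database, so even a tiny generated feature can ship behind a hypothesis.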
Sustainability as a System Invariant
Sustainable Pace
Sustainable Cognitive Load
XP always aimed for a sustainable pace of work.
AI introduces a new temptation: artificially increasing speed.
The real bottleneck, however, remains human judgment.
AI shifts the constraint into the human mind.
As iteration speed increases, so do decision pressure, context switching, and mental load.
Sustainable pace evolves into sustainable cognitive load.
XP thus protects its most critical resource: human attention.
New Patterns
Beyond these shifts, new patterns are emerging that didn’t exist in original XP.
One particularly interesting one is Disposable Code.
When LLMs can generate prototypes extremely quickly, code becomes something to explore with—and then discard. The classic spike solution becomes a core discipline.
This leads to another pattern: Reconstructive Programming.
A generated system first serves as an exploration tool. The actual implementation is then deliberately rebuilt—test-driven and grounded in understanding.
A third pattern is Exploration at Scale.
LLMs allow teams to explore many variations of a solution space. The human role shifts from production to selection and decision-making.
XP Shift through Generative AI
XP systematically shifts along the new bottleneck—from code production toward understanding, decision-making, and system coherence.
| XP Shift | Stabilized Bottleneck |
|---|---|
| Cognitive Triad | Quality of Decisions |
| Radical Code Transparency | Collective Understanding of Code |
| Behavior-First Engineering | Stable System Behavior |
| Continuous System Verification | System Integrity |
| Continuous Code Assimilation | Structural Coherence |
| Radical Simplicity | Cognitive Complexity |
| Shared Mental Model | Architectural Understanding |
| Machine-Readable Design | Structural Consistency |
| Customer-Driven Discovery | Problem–Solution Fit |
| Continuous Discovery Loop | Problem Understanding & Selection |
| Micro-Experiments | Learning from User Feedback |
| Sustainable Cognitive Load | Human Judgment |
Why XP is Relevant Right Now
XP never primarily aimed for speed, but for stability under uncertainty. This characteristic becomes critical when working with generative systems.
We are no longer just programming a system. We are programming the conditions under which a system emerges. Generative AI brings some of XP’s most radical ideas back to the center.
XP has always been about maximizing the rate of learning—a system of tight feedback loops and deliberately shifted bottlenecks.
In the context of AI, this core becomes more visible. Generative AI accelerates exploration. It expands the solution space dramatically. XP ensures that teams don’t lose orientation within that space.
Models and coding agents generate vast amounts of code. But this very abundance creates new risks. Code can degrade as quickly as it is produced. Without strong feedback loops, complexity grows uncontrollably.
The New Bottlenecks
The bottleneck shifts accordingly. As machines generate code ever faster, XP stabilizes the resources that resist automation:
- Understanding
- Decision-making
- Shared mental models
- Coherent system structure
In a world where code becomes cheap, understanding and judgment become the scarcest resources.
This is exactly where XP practices apply.
The Real XP Shift
Extreme Programming originally aimed for one thing: to keep change cheap at all times.
When code production becomes cheap, the cost shifts. Understanding becomes expensive.
The new mission becomes:
Keep the cost of understanding low.
Code production is no longer the bottleneck.
Meaning production is.
If you found this useful, you can support my work with a donation. I’m exploring these ideas further and plan to turn them into a small series.