Component decomposition and bounded contexts as a foundation for effective AI-assisted coding
AI coding tools produce code faster than humans can type. But speed without structure leads to systems that become harder to work with over time, for humans and for models alike.
If you have been working with AI coding tools on anything beyond a greenfield prototype, you have probably hit the wall: the tool works well on small, isolated tasks, but starts producing worse output as the codebase grows. It misses dependencies. It reimplements logic that exists two directories over. It generates code that technically works but does not fit the patterns the rest of the project uses.
The usual response is to blame the tool or wait for the next model release. But in most cases, the root cause is not the model. It is the codebase. Specifically, it is how much of the codebase the model needs to see to do its job, and how much noise comes along for the ride. This article looks at the mechanics behind that problem and what you can do about it at the architecture level.
The context window constraint
Every language model has a hard limit called the context window: the total number of tokens it can process in a single request. In code, tokens map roughly to variable names, operators, brackets, keywords, and string fragments. A typical 500-line file might consume 3,000 to 5,000 tokens. A large enterprise codebase will never fit in one window, not even close.
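To get an intuition for these numbers, a common rule of thumb is roughly four characters per token for English-like text and code. The sketch below uses that heuristic; real tokenizers vary by model and content, so treat the result as a ballpark, not a measurement.

```python
# Rough token estimate for source text. The 4-characters-per-token
# ratio is a widely used rule of thumb; actual tokenizer output
# differs per model, so this is only a back-of-envelope figure.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token count: total characters divided by an average."""
    return int(len(text) / chars_per_token)

# A 500-line file at ~30 characters per line lands in the same
# ballpark as the figures above.
source = ("x" * 30 + "\n") * 500   # stand-in for a 500-line file
print(estimate_tokens(source))     # roughly 3,900 tokens
```

Run the same arithmetic against a whole repository and it becomes obvious why a large codebase cannot fit in one window.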
Go over the limit and the request fails, or the model silently drops older parts of the input. But the practical problems start well before the hard limit. Longer contexts cost more compute, and the model's ability to use information degrades as context grows. Liu et al. (2024) showed this empirically in "Lost in the Middle": model performance on retrieval tasks follows a U-shaped curve, with strong attention to the beginning and end of the input and significantly weaker performance on content in the middle. Add irrelevant code to the context (what the research calls "distractors") and accuracy drops further. Put concretely: when the model sees your authentication middleware, three unrelated utility files, and half a test suite alongside the service you actually want changed, the quality of its output suffers.
Modern AI coding tools address this with indexing. They parse the repository, chunk the code along syntactic boundaries (functions, classes, modules; typically using a parser like Tree-sitter), embed those chunks into a vector space, and store them in a searchable index. At query time, the tool embeds your request, runs a similarity search, and retrieves the most relevant chunks to build the prompt. This is Retrieval-Augmented Generation (RAG) applied to code.
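The retrieval step can be sketched in a few lines. This is a toy: real tools use learned embeddings and a vector index, while here a bag-of-words vector and brute-force cosine similarity stand in for both, just to show the shape of the workflow. The chunk paths and contents are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Index": chunks produced by splitting the repo on syntactic boundaries.
chunks = {
    "billing/invoice.py::create_invoice": "def create_invoice(order): compute tax total invoice",
    "auth/middleware.py::check_token":    "def check_token(request): validate jwt token expiry",
    "billing/tax.py::compute_tax":        "def compute_tax(amount): tax rate lookup compute tax",
}
index = {path: embed(body) for path, body in chunks.items()}

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda p: cosine(q, index[p]), reverse=True)
    return ranked[:k]

print(retrieve("change how tax is computed on an invoice"))
# -> the two billing chunks; the auth chunk stays out of the prompt
```

Notice that the query pulls in both billing chunks and leaves authentication out: exactly the behaviour you want, and exactly what clean boundaries make more likely.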
RAG helps significantly, but it does not eliminate the constraint of the model's limited "attention span". The model still only sees what was retrieved. And here is where architecture starts to matter directly: if your code has clean boundaries and high cohesion, retrieval can pull in a small number of focused chunks that cover the change. If your code mixes concerns (business logic tangled with infrastructure, shared mutable state across modules, domain concepts scattered across multiple packages), the relevant chunks get larger, noisier, and more numerous. The retrieval system either misses something important or drags in too much, and you are back to fighting the context window.
The quality of your architecture is, in a very literal sense, an input to how well your AI tools perform.
Why this changes the architecture conversation
You already know the fundamentals of software architecture if you developed software before AI: high cohesion, low coupling, separation of concerns. You know the principle of information hiding. You have used bounded contexts in practice. None of this is new if you have worked in good old software engineering and understood the basics of OOP.
What is new is that these principles now have a second, very concrete payoff. A well-decomposed codebase is not just easier for your team to maintain, it is directly easier for an AI model to reason about. When you draw a clean boundary around a bounded context with a documented interface, you are also defining the maximum amount of code the model needs to see for changes within that context. The interface contract tells the model what it can assume about the outside world without loading the outside world into the prompt.
Think about what happens when you ask an AI tool to modify a service in a well-structured system. The tool loads the service code and its interface definitions. It can see the contracts with adjacent services, what data comes in, what goes out, what invariants hold. It does not need the implementation of those adjacent services. The prompt stays small. The model stays focused. The output is usually solid and reproducible.
Now think about what happens in a codebase where the same service reaches into a shared database, calls three other services through internal methods rather than interfaces, and has logging and error handling woven through every function. To make the same change safely, the tool needs to load all of that. The prompt balloons. Retrieval pulls in mixed-concern chunks where payment logic sits next to user management. The model misses a dependency or generates something that conflicts with a pattern defined elsewhere. You spend the time you saved on AI generation debugging the result.
If you have worked with AI tools on a codebase older than a year or two, you have seen this pattern.
The vibe coding trap
"Vibe coding", or letting the AI generate structure along with implementation and iterating when things break, works for prototypes and throwaway projects. It falls apart at scale, once you work in a dev team and the codebase grows larger and more complex.
Without explicit boundaries, AI agents solve the same problem in different ways across the codebase. Without contracts, they reimplement logic that already exists because they cannot see it in their context window. The codebase develops logic drift: duplicated behaviour with subtle differences, inconsistent naming, patterns that vary by directory. Every experienced developer has seen this in human-written code; with AI generation, it happens faster because the code arrives faster and nobody has a mental model of the whole system.
The feedback loop makes it worse every time you change a feature or add a new one. As the codebase grows more tangled, more code must be loaded for each change. The context gets noisier. Output quality drops. Developers respond by working on smaller, more isolated pieces, which increases fragmentation and duplication in places where, by any reasonable software quality standard, it does not belong. You end up spending more time reviewing and fixing AI output than you saved generating it.
The DORA research (DevOps Research and Assessment, now part of Google Cloud) consistently shows that high technical debt correlates with more unplanned work and rework. Industry data on AI-heavy workflows puts maintenance costs at 30-50% of initial build cost in poorly structured codebases, versus 20-25% for disciplined approaches. The difference is often the architecture.
Bounded contexts as context window boundaries
If you are already using domain-driven design, you have the main tool you need. A bounded context is, almost by definition, the right unit for AI-assisted work.
Take a drone delivery system. The maintenance team works with repair history, component wear, flight hours, performance metrics. The scheduling team works with availability, ETAs, route capacity. These are two bounded contexts, each with a different model of what a "drone" is. The maintenance team's drone has a parts list and a service log. The scheduling team's drone has a location, a status, and a time window.
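The two models can be sketched side by side. This is a hypothetical illustration: in a real codebase each class would simply be called Drone inside its own package (maintenance/ and scheduling/); they are prefixed here only so the sketch fits in one file.

```python
from dataclasses import dataclass, field

# One real-world drone, two bounded-context models.
# Neither model imports or knows about the other.

@dataclass
class MaintenanceDrone:
    serial_number: str
    flight_hours: float
    parts: list[str] = field(default_factory=list)        # parts list
    service_log: list[str] = field(default_factory=list)  # repair history

@dataclass
class SchedulingDrone:
    serial_number: str
    location: tuple[float, float]   # (lat, lon)
    status: str                     # e.g. "available", "in_flight"
    available_from: str             # start of the next free time window

# A change to the scheduling algorithm touches SchedulingDrone and its
# contracts; the maintenance model never enters the prompt.
d = SchedulingDrone("DR-042", (52.52, 13.40), "available", "2025-01-01T09:00")
print(d.status)  # available
```

The duplication is deliberate: each context pays for exactly the fields its own use cases need, and nothing more.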
When you modify the scheduling algorithm, the AI tool loads the scheduling code and the contracts it uses: an availability check, an ETA calculation, a status update endpoint. It does not need the maintenance domain's complexity. The prompt is relatively small. The retrieval is manageable. The context window has room to spare.
This is the architectural insight that maps directly to AI tool performance: a bounded context boundary is also a context window boundary. The smaller and more self-contained you make each context, the less the model needs to see, and the better it performs.
The same logic applies regardless of deployment model. Microservices enforce this through network boundaries. A modular monolith can enforce it through module visibility rules and dependency constraints (think Java's module system, .NET's internal access, or even disciplined package structure with tools like ArchUnit or NetArchTest). The enforcement mechanism matters less than the boundary itself.
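In a modular monolith, a boundary that is not enforced erodes. Below is a toy checker in the spirit of ArchUnit or import-linter: it flags imports that cross from one top-level module into another's internals. File contents are inlined and the rule set is hypothetical; a real check would walk the repository and load rules from configuration.

```python
import ast

# Forbidden edges between top-level modules (illustrative rule).
FORBIDDEN = {("scheduling", "maintenance")}  # scheduling must not import maintenance

# Inlined stand-ins for source files; a real checker would read them from disk.
files = {
    "scheduling/planner.py": "from maintenance.repairs import wear_report\n",
    "scheduling/routes.py":  "import math\n",
}

def violations(files: dict[str, str]) -> list[str]:
    found = []
    for path, source in files.items():
        src_module = path.split("/")[0]
        for node in ast.walk(ast.parse(source)):
            targets = []
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            for target in targets:
                dst_module = target.split(".")[0]
                if (src_module, dst_module) in FORBIDDEN:
                    found.append(f"{path} imports {target}")
    return found

print(violations(files))  # flags scheduling/planner.py
```

Wired into CI, a check like this turns an architectural intention into a failing build, which is the only form of intention that survives contact with deadlines.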
Contracts and events as retrieval anchors
In a RAG-based workflow, the items that get indexed and retrieved are your architecture's structural units: functions, classes, interfaces, schemas. This means your contracts do double duty. They define the agreement between components and they serve as high-quality retrieval targets.
An OpenAPI spec, an AsyncAPI event schema, a well-defined interface in code; these are compact, semantically rich documents that tell the model exactly what a component expects and provides. When the model retrieves an interface definition alongside the code it is modifying, it can generate code that conforms to the contract without needing to see the other side's implementation.
Event-driven architectures amplify this. When components communicate through events rather than direct calls, each component only needs the event schema to understand the interaction. An OrderCreated event can tell Inventory and Billing everything they need to know about what happened, without any knowledge of how the Order service works internally. For AI tools, this means retrieval stays focused: the event schema plus the subscribing handler is usually enough context for a change.
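The decoupling can be made concrete with a sketch (all names illustrative). The subscribers depend only on the event schema, not on the Order service's internals; for an AI tool, the schema plus one handler is usually enough context for a change.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderCreated:
    """The contract: this is everything subscribers ever see."""
    order_id: str
    items: list[tuple[str, int]]   # (sku, quantity)
    total_cents: int

def inventory_handler(event: OrderCreated) -> list[str]:
    # Inventory needs only the items to reserve stock.
    return [f"reserve {qty}x {sku}" for sku, qty in event.items]

def billing_handler(event: OrderCreated) -> str:
    # Billing needs only the id and total to raise an invoice.
    return f"invoice {event.order_id} for {event.total_cents} cents"

event = OrderCreated("o-1", [("widget", 2)], 1999)
print(inventory_handler(event))  # ['reserve 2x widget']
print(billing_handler(event))    # invoice o-1 for 1999 cents
```

Neither handler imports anything from the Order service, so a change to either one retrieves a small, self-contained slice of the system.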
Gregor Hohpe and Bobby Woolf documented these patterns in Enterprise Integration Patterns back in 2003. Newman's Building Microservices (2021) covers the modern application. The patterns are well established. What is worth recognising is that they now have a direct, measurable impact on AI tool effectiveness in addition to their traditional benefits.
This applies to multi-agent workflows too. If you are using specialised AI agents for different parts of the system (one for API design, one for payment refactoring, one for test generation), events and contracts are how those agents coordinate without stepping on each other. Each agent works inside one bounded context, communicates through contracts, and stays within its context window.
What actually helps in practice
You know the patterns, and there is plenty of information out there on the basics of software architecture (and of course, as a certified ISAQB Software Architecture instructor, I am always happy to help 😎). The question is where to focus effort given that AI tools are now part of the workflow. Based on what actually makes a difference:
Treat your boundaries as context window budgets. When defining or reviewing bounded contexts, think explicitly about what the AI tool would need to load for a typical change in that context. If the answer is "half the system," the boundary is wrong or the coupling is too high. A good bounded context should fit comfortably in a model's context window: the code, its tests, and its interface contracts.
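A budget check can be as crude as summing file sizes. The sketch below inlines hypothetical file sizes and uses the rough 4-characters-per-token heuristic; a real version would walk the package directory and count actual bytes.

```python
# Back-of-envelope check: does a bounded context fit in the window
# with room to spare? Budget and file sizes are illustrative.

CONTEXT_BUDGET_TOKENS = 60_000   # leave headroom for instructions and output

scheduling_context = {            # file -> size in characters (hypothetical)
    "scheduling/planner.py": 18_000,
    "scheduling/routes.py": 9_000,
    "scheduling/contracts.py": 3_000,
    "scheduling/test_planner.py": 14_000,
}

def fits_in_budget(files: dict[str, int], budget: int = CONTEXT_BUDGET_TOKENS) -> bool:
    estimated_tokens = sum(size // 4 for size in files.values())  # ~4 chars/token
    return estimated_tokens <= budget

print(fits_in_budget(scheduling_context))  # True: roughly 11,000 tokens
```

If a context fails a check like this by a wide margin, that is a signal to split it before asking a model to work inside it.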
Make contracts the source of truth, not just documentation. Contracts that exist only in a wiki do not help RAG. Contracts expressed as OpenAPI specs, interface definitions in code, event schemas in the repository, or Pact contract test definitions: those get indexed and retrieved. Put them where the tools can find them.
Watch for cross-cutting concern leakage. Logging, auth, error handling, and observability code scattered through your business logic inflates every chunk that gets retrieved. If your service handler has 15 lines of business logic and 40 lines of middleware concerns, the model is spending most of its context budget on things that are not relevant to the change. Extract cross-cutting concerns aggressively: this helps not only with clean code, but also with clean context.
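One standard way to do the extraction is a decorator (names here are illustrative): the logging and timing live in one place, and the handler body stays pure domain code. The retrieved chunk for "how is an ETA computed" is then four lines, not forty.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)

def observed(fn):
    """Cross-cutting concern (timing + logging) pulled out of the handler."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            logger.info("%s took %.1f ms", fn.__name__,
                        (time.perf_counter() - start) * 1000)
    return wrapper

@observed
def estimate_eta(distance_km: float, speed_kmh: float) -> float:
    # Pure business logic: nothing here but the domain rule.
    return distance_km / speed_kmh * 60  # minutes

print(estimate_eta(30, 60))  # 30.0
```

The same idea generalises to middleware, aspects, or interceptors in other stacks; the point is that the concern lives once, outside the chunks your tools retrieve for domain changes.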
Use dependency injection as a coupling detector. If a constructor takes twelve dependencies, that component is coupled to twelve other parts of the system, and the AI tool will need context from all of them to make safe changes. DI does not just help with testing; it makes coupling visible. Use that visibility.
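The constructor makes coupling countable, which means you can even automate the smell test. A toy sketch (the service and its twelve dependencies are hypothetical):

```python
import inspect

class OrderService:
    # Twelve constructor parameters: twelve parts of the system this
    # component is coupled to, and twelve sources of required context.
    def __init__(self, repo, payment_gateway, tax_calculator,
                 inventory, notifier, audit_log, feature_flags,
                 cache, clock, metrics, retry_policy, i18n):
        ...

def dependency_count(cls) -> int:
    """Count constructor parameters, excluding self."""
    return len(inspect.signature(cls.__init__).parameters) - 1

print(dependency_count(OrderService))  # 12: a sign the context is too wide
```

A review heuristic, not a hard rule; but a number this size is usually a cue to split the service or group dependencies behind a facade.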
Refactor before you generate. The temptation with AI tools is to generate first, clean up later. In practice, "later" rarely comes. If you are about to ask the tool to modify a tangled module, spend fifteen minutes extracting an interface or splitting a class first. The AI output will be better, and you avoid compounding the mess.
Apply the Strangler Fig incrementally. You do not need to rewrite the system. When you touch a tightly coupled area, wrap it in an interface, write the contract test, and move on. Over time, the codebase develops the boundaries it needs. Fowler's Strangler Fig approach was designed for exactly this: gradual, safe migration toward better structure.
Test contracts, not just implementations. Contract tests, as well as well-structured integration tests, verify that components honour their interfaces. When a contract breaks, the test catches it before deployment, but it also catches it before the AI tool generates code against an outdated assumption. Tested contracts are reliable context for the model.
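A minimal consumer-side contract test can look like this sketch (all names illustrative, using Python's structural typing as the contract mechanism). The Protocol is the contract; the test asserts a provider honours it without importing the provider's implementation details.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class AvailabilityCheck(Protocol):
    """The contract the scheduling context relies on."""
    def is_available(self, drone_id: str) -> bool: ...

class FleetService:
    """Provider; the test only cares that it satisfies the contract."""
    def is_available(self, drone_id: str) -> bool:
        return drone_id.startswith("DR-")

def test_fleet_service_honours_contract():
    provider = FleetService()
    # Structural check: the expected method is present.
    assert isinstance(provider, AvailabilityCheck)
    # Behavioural check: the method does what consumers assume.
    assert provider.is_available("DR-042") is True

test_fleet_service_honours_contract()
print("contract holds")
```

In richer setups the same role is played by Pact pacts or schema-validated integration tests; the principle is identical: the contract, not the implementation, is what gets verified and what the model can safely rely on.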
Conclusion
AI tools do not remove the need for architecture. They make the consequences of bad architecture visible faster. In a poorly structured codebase, model performance degrades, generated code gets worse, and the time saved on generation gets spent on debugging and rework.
The fix is the same set of practices the discipline has always recommended: high cohesion, low coupling, bounded contexts, explicit contracts, separation of concerns. What changes is the framing. These are no longer just maintainability concerns. They are direct inputs to how well your tools perform. A bounded context boundary is a context window boundary. A clean interface contract is a retrieval target. A well-separated concern is a smaller, more focused prompt.
The practices that make a codebase understandable to a new team member are the same practices that make it workable for a language model with a finite context window. If your architecture is good enough that a senior developer can make a safe change by reading one module and its interfaces, your AI tools can do the same. If it is not, no amount of model capability will compensate.
See you soon!