This 6-layer constraint framework will make your design system actually work for AI agents
Most design systems were built for humans. A designer opens Figma, reads documentation, understands intent, and makes judgment calls. That worked fine for a decade.
But now AI agents are generating UI. Claude Code, Cursor, v0, Windsurf. They read your design system too. And they are terrible at judgment calls.
So the question becomes: what does a design system look like when your primary consumer is not a human designer but a coding agent?
I have been thinking about this for months. Here is what I landed on. Six layers, each with a specific job, organized so agents load only what they need and never guess. Think of it as a 6-layer coffee brew: you pour messy design decisions in at the top, each layer filters out more ambiguity, and what drips out at the bottom is consistent UI.
AI-native design system
├── 1️⃣ Foundation layer [always loaded, every session]
│ ├── skill.md How the agent should operate
│ └── governor-version One-line version pin
│
├── 2️⃣ Context layer [fetched on demand, never fully loaded]
│ ├── design.md Visual system (colors, type, spacing)
│ ├── brand.md Voice, positioning, guardrails
│ ├── component-index.md Available components + locations
│ ├── pattern-library.md Layout + composition patterns
│ ├── ui-states.md Loading, empty, error, populated
│ └── changelog.md What changed between versions
│
├── 3️⃣ Governance layer [constraint-based, makes wrong output impossible]
│ ├── tokens.json Valid token values only
│ ├── component-registry.json Valid prop combinations
│ ├── output-contracts.md What correct code looks like
│ ├── anti-patterns.md 30+ banned agent behaviors
│ ├── composition-rules.md How components can combine
│ └── decision-trees.md When to use what
│
├── 4️⃣ Agent instruction layer [one entrypoint per tool]
│ ├── .cursorrules Cursor example
│ ├── v0-system.md v0 example
│ └── system-prompt.md Generic agent fragment
│
├── 5️⃣ Source of truth [one canonical source per artifact]
│ ├── package.json Semver on governance layer
│ ├── version-manifest.json Version + file hashes
│ └── tokens.json Canonical token values
│
├── 6️⃣ Enforcement layer [runs on every code push, catches drift]
│ ├── lint-tokens.js Valid token references check
│ ├── token-budget.js Context cost per file
│ └── drift-check.js Figma vs code drift detection
│
└── Figma file [visual companion, downstream from code]
    ├── 00_readme System map
    ├── 01_tokens Annotated token layer
    ├── 02_components API-matched to code library
    ├── 03_patterns Assembled patterns
    └── 04_agent_output Before / after proof
1️⃣ Foundation
What it does: Gives the agent its identity and rules before it writes a single line of code.
Think of this as the agent's onboarding. When a new designer joins your team, you don't hand them the entire component library on day one. You say: here is how we work, here is what matters, here are the things you should never do. That is what the foundation layer does.
It contains two files. A behavior file that tells the agent how to operate (how to name things, what patterns to follow, when to ask for clarification). And a version pin so the agent knows exactly which version of the system it is working with.
This layer is tiny. It loads every single session. The agent reads it first, every time.
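As a concrete sketch, the two foundation files can be as small as this (the file names match the tree above; the specific rules and version string are illustrative, not a standard):

```md
<!-- skill.md — behavior file, read at the start of every session -->
# How to operate in this codebase
- Use only values defined in tokens.json. Never invent colors, spacing, or radii.
- Match component names and props to component-registry.json exactly.
- When a request is ambiguous, ask for clarification instead of guessing.

<!-- governor-version — one-line version pin -->
design-governance@2.3.1
```

Keeping this layer to a few hundred tokens is the point: it is cheap enough to load unconditionally, every session.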
2️⃣ Context
What it does: Gives the agent the specific knowledge it needs for the current task.
This is your visual system, your brand voice, your component index, your layout patterns, your UI states. Everything a designer would normally absorb by working in the file for weeks.
The critical detail: this layer is never fully loaded. The agent fetches only the section it needs. Building a form? It pulls the component index and the form pattern. Working on empty states? It pulls the UI states file. This is how you keep token costs low and context quality high.
If the foundation layer is onboarding, the context layer is the team wiki. You don't memorize the wiki. You search it when you need something specific.
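The fetch-on-demand idea can be sketched as a simple lookup from task type to the minimal set of context files (the task names and file paths here are illustrative):

```javascript
// Sketch of fetch-on-demand context loading. Instead of concatenating
// every doc into the prompt, map the task type to the minimal set of
// context files the agent actually needs for this one task.
const contextMap = {
  form: ["component-index.md", "pattern-library.md#forms"],
  "empty-state": ["ui-states.md"],
  branding: ["brand.md", "design.md#type"],
};

function contextFor(task) {
  // Unknown task? Fall back to the component index alone
  // rather than loading the entire context layer.
  return contextMap[task] ?? ["component-index.md"];
}

console.log(contextFor("form")); // only two files enter the context window
```

The exact mapping matters less than the invariant: no task ever pulls the whole layer.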
3️⃣ Governance
What it does: Makes wrong output impossible instead of just discouraged.
This is the layer most teams skip, and it is the most important one.
A governance layer does not tell the agent "please use the correct border radius." It gives the agent a file where the only border radius values that exist are the correct ones. The agent cannot hallucinate a value that is not in the file.
It includes a token file with every valid value. A component registry with every valid prop combination. Output contracts that define what correct generated code looks like. An anti-patterns list with 30+ specific things the agent must never do. Composition rules that define which components can combine and how. And decision trees that tell the agent when to use what.
The philosophy is constraint over instruction. Do not write "use 8px spacing." Instead, make 8px the only spacing value available. Agents follow constraints perfectly. They follow instructions inconsistently.
4️⃣ Agent instructions
What it does: Translates everything above into the specific format each tool expects.
Cursor reads .cursorrules. Other tools read their own config files. Every tool has its own entrypoint, but the filename does not matter. What matters is that each one points to the same governance layer, the same tokens, the same rules. One source of truth, multiple entrypoints.
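A minimal entrypoint can be a few lines that delegate everything to the shared governance files (the wording below is illustrative; only the delegation pattern matters):

```md
# .cursorrules — Cursor entrypoint, deliberately thin
Read skill.md before generating any UI code.
All design values must come from tokens.json.
All component usage must match component-registry.json.
Never introduce values or components that are not in those files.
```

The v0 and generic system-prompt entrypoints say the same thing in their own formats. The rules live once, in the governance layer.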
5️⃣ Source of truth and versioning
What it does: Makes sure every artifact traces back to one canonical source.
Tokens live in one place: tokens.json. Not in Figma, not in a Notion doc, not in a Storybook config. One file. Everything else is downstream.
The governance layer gets a version number, just like a code package. When you update a token or add a component, the version bumps. Agents can check whether they are working with the latest version. A manifest file tracks hashes of every file so you can detect if something changed without diffing the whole repo.
Figma is explicitly downstream. The Figma file is a visual companion, not the source. This is a hard shift for most design teams, but it is the only setup that works when agents are doing the building.
6️⃣ Enforcement
What it does: Catches violations after the agent generates code.
Even with perfect constraints, agents occasionally produce output that drifts. A lint script checks that every token reference in generated code points to a real token. A budget analyzer tracks how many tokens each file costs so your context stays lean. A drift checker compares what is in Figma against what is in code and flags mismatches.
This layer runs in CI (continuous integration, the automated checks that run every time someone pushes code). It is the automated reviewer that catches what the governance layer could not prevent.
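The token lint can be sketched in a few lines. The `var(--token-*)` naming convention below is an assumption; adapt the pattern to however your codebase references tokens:

```javascript
// Sketch of lint-tokens.js: flag any token reference in generated code
// that does not exist in tokens.json. The var(--token-*) convention is
// an assumption about how this codebase references tokens.
const validTokens = new Set(["spacing-sm", "spacing-md", "color-primary"]);

function lintTokens(source) {
  const refs = source.match(/var\(--token-([\w-]+)\)/g) ?? [];
  return refs
    .map((ref) => ref.slice("var(--token-".length, -1))
    .filter((name) => !validTokens.has(name));
}

const generated =
  ".card { padding: var(--token-spacing-sm); color: var(--token-color-primry); }";
console.log(lintTokens(generated)); // ["color-primry"] — a hallucinated token
```

Wired into CI, this turns "the agent invented a token" from a design-review finding into a failed check.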
🎨 The Figma file
The Figma file still exists. It has five pages: a system map (README), an annotated token layer, components matched to the code library's API, assembled patterns, and before/after proof showing agent output with and without the governance layer.
But the Figma file is the visual companion. It documents the system for human understanding. The code files are the system.
🧠 Why this structure matters
The whole setup solves one problem: agents do not understand intent.
A human designer sees a card component and understands from context that it needs 16px padding, a subtle border, and medium-weight text. An agent sees a card component and picks whatever values seem reasonable based on its training data. Sometimes it is right. Often it is close but wrong in ways that compound across a full interface.
Six layers fix this by removing ambiguity at every level. The agent always knows how to behave (foundation), what the system looks like (context), what values are valid (governance), how to receive instructions (agent layer), where truth lives (versioning), and whether its output passed (enforcement).
No guessing. No judgment calls. No "close enough."
🚀 Scenario A: Starting a new product with an AI design tool
Say you are building a new app from scratch using something like Google Stitch. You describe your interface in plain language, Stitch generates five screens in minutes, and you have a working prototype before lunch. It feels like magic.
Then you look closer. The first screen uses 14px body text. The third screen uses 16px. The button radii are 8px on the login page and 12px on the settings page. The spacing between cards is different on every screen. None of it is wrong enough to notice at a glance, but all of it adds up to an interface that feels slightly off.
This is the core problem with AI design tools when used without a system. They generate each screen in isolation. There is no memory between generations. Every prompt starts fresh, and the AI makes slightly different micro-decisions every time.
Here is how the six layers fix this:
Before you open Stitch, you write a tokens.json with your spacing scale, color palette, type scale, and radii. That is your governance layer. Then you write a short behavior file: always use these exact values, never invent new ones, always apply this specific font stack. That is your foundation layer.
Now when you prompt Stitch (or export from Stitch into Claude Code or Cursor for production code), the agent has constraints. It cannot drift because the only valid values are the ones you defined. The five screens come out consistent, not because the AI got lucky, but because you removed every opportunity for it to guess wrong.
As the product grows, you add a component registry so the agent knows which components exist and how they combine. You add patterns for recurring layouts like list views, detail pages, and onboarding flows. You add output contracts so every generated screen meets the same structural standard.
The key insight: with AI design tools, you do not need less system. You need more system, earlier. The speed of generation means inconsistency compounds faster than it ever did with manual design. A human designer building one screen per day catches drift naturally. An AI generating ten screens per hour does not.
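A day-zero tokens.json does not need to be elaborate. A handful of values per scale is enough to remove the guessing (the values below are placeholders, not recommendations):

```json
{
  "spacing": { "xs": "4px", "sm": "8px", "md": "16px", "lg": "24px" },
  "radius": { "control": "8px", "card": "12px" },
  "type": { "body": "16px", "caption": "14px", "heading": "24px" },
  "color": { "bg": "#ffffff", "text": "#1a1a1a", "primary": "#2f6fed" }
}
```

With this file in the prompt, the 14px-versus-16px body text problem from above cannot happen: body text is whatever `type.body` says, on every screen.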
Scenario A workflow: new product with AI design tools
Day 0 Write tokens.json (spacing, colors, type, radii)
Write behavior file (3 rules, font stack, token mandate)
↓
Day 1 Prompt Stitch / v0 for first screens
Export to Claude Code or Cursor for production code
Agent reads foundation + governance = consistent output
↓
Week 2 Add component-registry.json (valid props per component)
Add pattern-library.md (list views, detail pages, forms)
↓
Week 4 Add output-contracts.md (structural standard per screen)
Add anti-patterns.md (things the agent keeps getting wrong)
↓
Month 2 Add lint-tokens.js + drift-check.js in CI
Add agent entrypoints for each tool your team uses
Full six layers running
🔄 Scenario B: Making an existing design system AI-ready
Most teams are not starting from zero. You already have a design system. It lives in Figma, maybe with some Storybook documentation, maybe with a token file somewhere. Designers use it daily. It works.
Now someone on the team starts using Claude Code or Cursor to generate UI. The agent does not read your Figma file. It does not browse your Storybook. It sees your codebase and makes assumptions. Some of those assumptions align with your system. Many do not.
This is a different problem than Scenario A. You are not building constraints from scratch. You are translating an existing system into a format agents can consume.
There is a migration path, but it is not a one-time project. It is an ongoing translation effort. Every time the agent gets something wrong, you tighten the constraints. Over a few weeks, the gap between agent output and human output narrows until the agent is producing work that passes design review on the first try.
The uncomfortable truth for most teams: your existing design system probably has gaps you never noticed because human designers filled them with judgment. The agent will expose every single one of those gaps. That is painful at first, but it makes the system stronger for everyone, humans and agents alike.
Scenario B workflow: migrating an existing design system
Step 1 Extract tokens from Figma into tokens.json
Clean up inconsistencies (e.g. three near-identical grays that should be one token)
↓
Step 2 Write component-registry.json from your existing library
List every component, its valid props, allowed combinations
↓
Step 3 Write behavior file (agent entrypoint for your tool)
Point it to tokens.json + component-registry.json
↓
Step 4 Run agent on a real task, compare to human output
Log every deviation
↓
Repeat Each deviation = one new constraint in governance layer
Gap between agent output and human output shrinks weekly
↓
Ongoing Add CI enforcement (lint, drift check, token budget)
Figma file becomes downstream visual companion
Agent passes design review on first try
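Step 1's cleanup can be partly automated. A sketch of the three-grays consolidation: group extracted color values that are nearly identical so a human can pick one canonical token per cluster (the distance metric and threshold here are simplifications; real color difference is usually measured in a perceptual space):

```javascript
// Group near-duplicate hex colors so a human can merge each cluster
// into one canonical token. Threshold and values are illustrative.
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

function distance(a, b) {
  const [r1, g1, b1] = hexToRgb(a);
  const [r2, g2, b2] = hexToRgb(b);
  return Math.abs(r1 - r2) + Math.abs(g1 - g2) + Math.abs(b1 - b2);
}

function clusterColors(colors, threshold = 24) {
  const clusters = [];
  for (const color of colors) {
    const home = clusters.find((c) => distance(c[0], color) <= threshold);
    if (home) home.push(color);
    else clusters.push([color]);
  }
  return clusters;
}

// Three grays extracted from Figma that should be one token, plus one red:
console.log(clusterColors(["#888888", "#8a8a8a", "#808080", "#ff0000"]));
```

The script only proposes clusters; the merge decision stays with a designer.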
🔗 "But I start everything in Figma"
Fair. Most designers do. And nothing in this setup requires you to stop.
The shift is not about abandoning Figma. It is about what counts as the source of truth. Today, most teams design tokens in Figma, then export them to code. In an AI-native setup, the tokens file is the source, and Figma reflects it.
In practice, the workflow can still start in Figma. You define your color palette, type scale, and spacing in Figma the way you always have. Then you extract those values into a tokens.json. That extraction step is the bridge. Once the tokens file exists, agents can read it. Figma stays your design environment. The tokens file becomes the contract between Figma and every AI tool your team uses.
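The extraction step is mechanically simple. The input shape below is a made-up stand-in for whatever your export plugin or the Figma variables API actually gives you; only the direction of the transform matters:

```javascript
// Sketch of the Figma-to-tokens.json bridge. The input shape is a
// hypothetical stand-in for a real Figma variables export.
const figmaExport = [
  { name: "spacing/sm", value: "8px" },
  { name: "spacing/md", value: "16px" },
  { name: "color/primary", value: "#2f6fed" },
];

function toTokens(variables) {
  const tokens = {};
  for (const { name, value } of variables) {
    // Figma-style "group/key" names become nested token groups.
    const [group, key] = name.split("/");
    (tokens[group] ??= {})[key] = value;
  }
  return tokens;
}

console.log(JSON.stringify(toTokens(figmaExport), null, 2));
```

Run this on every Figma change (manually at first, then as a sync step) and the tokens file stays the contract while Figma stays the design environment.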
The good news: this is no longer a one-way street. Figma's MCP server now supports bidirectional workflows. Agents can read your Figma file to get design context (components, tokens, layout constraints), and they can write back to Figma too. A coding agent can generate a screen, push it to Figma as editable layers, and a designer can refine it on the canvas. Then the agent can pull those refinements back into code.
This means the six layers do not sit outside Figma. They sit between Figma and code, and data flows in both directions. A designer updates a color variable in Figma. A sync step pushes it to tokens.json. The agent picks it up on the next generation. An agent builds a new screen in code. A sync step pushes it back to Figma for visual review. The governance layer validates both directions.
The Figma file does not lose value. It gains a new role: the visual environment where humans verify, refine, and iterate on what agents produce. Designers still design in Figma. They just stop being the only ones who can read or write to the system.
☕ Where to start
Whether you are in Scenario A or B, the entry point is the same two files: tokens.json and a behavior file. Everything else stacks on top as you hit real problems.
The remaining layers come naturally. Agent using the wrong component? Add a component registry. Output drifting from Figma? Add a drift checker. Team using multiple tools? Add agent-specific entrypoints.
Build the system the same way you would build a product. Start with the most painful problem and expand from there.
🔓 "Won't this kill exploration?"
This is the first thing most designers push back on. If you constrain everything, where does creativity go? If the agent can only use values from a token file, how do you explore new directions?
The short answer: constraints do not prevent exploration. They prevent drift during execution.
There is a difference between exploring a new direction and building production screens. Exploration happens before constraints apply. You sketch, you try wild color palettes, you test a completely different layout structure. None of that requires an agent. That is still your job as a designer.
The constraints kick in when the direction is decided and you need consistent execution across dozens of screens. That is exactly where agents shine and exactly where drift causes the most damage. The governance layer does not say "you can only ever use these colors." It says "when building production UI, these are the valid colors." You can update the tokens file anytime. You can add new values, remove old ones, expand the palette. The system evolves with the product.
Think of it this way: a musician does not stop being creative because they chose to play in a specific key. The key is the constraint that makes the song cohesive. They can change keys between songs. They can modulate mid-song. But within a passage, everyone plays the same notes.
Same principle here. The six layers are not a cage. They are the key signature. Change them when the creative direction changes. Enforce them when consistency matters.
☕ My take
None of this exists as a finished product you can download. That is the honest part. I have been building pieces of it, testing layers in real projects, and writing about what works and what does not. Some of these files I have shipped. Others are still theory I am pressure-testing.
What I know for sure: the teams that will move fastest with AI are not the ones with the best prompts. They are the ones with the best constraints. A well-structured tokens file does more for agent output quality than any amount of prompt engineering. A component registry with valid prop combinations prevents more bugs than a code review.
The designer's role does not shrink in this setup. It changes. You stop being the person who manually builds every screen and start being the person who defines the system that agents build from. You set the constraints. You validate the output. You refine what the agent cannot get right on its own. That is more leverage, not less.
I think we are early. The tooling for bidirectional Figma sync is in beta. The governance layer patterns are not standardized. Most teams have not even started thinking about this. But the window to figure it out before it becomes urgent is closing fast.
If you are a designer reading this and thinking "this sounds like a lot of engineering work," you are right. It is. And that is exactly why designers should be the ones leading it. We are the ones who understand why consistency matters, why systems exist, and why close enough is never good enough. Agents just need us to write that understanding down in a format they can read.