From COBOL to Python with Claude Code + GSD: what actually works
AI will not magically modernize your COBOL estate. But used correctly, it can dramatically compress discovery, improve consistency, and accelerate validated migration to Python.
There is a version of this story that keeps circulating in engineering circles: point an AI at a mainframe codebase, describe the target state, and walk away while it modernizes decades of business logic into clean Python.
That is not what happens.
But what does happen can be genuinely valuable — if the work is structured correctly.
The real opportunity is not one-shot code conversion. It is using AI to reduce the time it takes to understand legacy systems well enough to migrate them safely.
The problem nobody talks about
COBOL survives because it is trusted, not because it is pleasant to work with.
These systems often encode 30 to 40 years of business decisions, regulatory edge cases, file contracts, numeric precision rules, and workflows that “just work.” In many cases, the original authors are long gone, documentation is incomplete, and the code itself has become the only source of truth.
That is why the hard part of migration has never been syntax translation.
The hard part is reconstructing intent.
That untouched EVALUATE block may be handling a regulatory exception from 1989. A COMP-3 packed decimal field may be carrying business-critical numeric behavior that cannot be approximated or guessed. A fixed-width input record may look simple until one overlooked position offset causes downstream reconciliation issues.
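A COMP-3 field illustrates why naive translation fails: it cannot be read as text at all. It packs two binary-coded decimal digits per byte, with the sign carried in the final nibble. A minimal Python sketch of the decoding (the field layout and scale below are illustrative assumptions):

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode an IBM COMP-3 (packed decimal) field into a Decimal.

    Each byte holds two BCD digits; the low nibble of the last byte
    is the sign (0xC or 0xF positive, 0xD negative).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(str(byte >> 4))
        digits.append(str(byte & 0x0F))
    last = raw[-1]
    digits.append(str(last >> 4))          # last byte: one digit + sign nibble
    sign = "-" if (last & 0x0F) == 0x0D else ""
    value = int("".join(digits))
    return Decimal(sign + str(value)).scaleb(-scale)

# A PIC S9(5)V99 COMP-3 field holding 12345.67 is stored as 0x12 0x34 0x56 0x7C
print(unpack_comp3(bytes([0x12, 0x34, 0x56, 0x7C]), scale=2))  # 12345.67
```

Note that the decimal point never appears in the data; the scale lives only in the copybook's PIC clause, which is exactly the kind of intent that has to be reconstructed.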
Legacy modernization fails when teams underestimate this reality. The challenge is not converting one language into another. It is preserving behavior that has been embedded in production systems for decades.
What Claude Code actually changes
Used well, Claude Code is not a magic COBOL-to-Python compiler. It is a very strong intent extraction and modernization assistant.
Instead of pasting fragments into a chat window, teams can work directly against real files in a repository: COBOL programs, copybooks, control files, and JCL. That changes the workflow significantly.
Claude Code can help teams explain what a program actually does, map copybook record layouts, trace how data flows between jobs, surface edge cases buried in old conditional logic, and turn undocumented COBOL into structured, reviewable documentation.
That alone is a major shift.
What used to require weeks of manual discovery can often be compressed into days. Not because AI eliminates complexity, but because it accelerates one of the slowest parts of the project: getting from “we do not fully understand this system” to “we have a structured picture of what this program actually does.”
That compression is where the value begins.
The problem at scale: context rot
A single COBOL program is manageable.
A portfolio of 50, 80, or 100 batch jobs is a different problem entirely.
This is where many AI-assisted migration demos stop being representative of reality. Quality often degrades as work scales. Instructions drift. Naming conventions become inconsistent. Later outputs inherit noise and assumptions from earlier tasks. Program number 40 receives a worse experience than program number 1.
That is the context problem.
For large-scale migrations, context management becomes just as important as code generation.
Why GSD matters
This is where GSD becomes interesting. GSD (“Get S*** Done”) is a spec-driven workflow layer on top of Claude Code that breaks large engineering efforts into bounded, verifiable tasks executed with fresh context.
The real value of GSD is not just orchestration. It is discipline.
By structuring migration work into fresh, bounded, spec-driven units, GSD helps preserve consistency across a large portfolio. Each program can be treated as an atomic migration task with its own scope, inputs, outputs, validation criteria, and completion definition.
That matters because large modernization efforts do not fail only on correctness. They also fail on inconsistency.
If every migration unit is handled with the same structure, the same standards, and the same validation pattern, quality becomes more repeatable. That makes the overall approach far more viable at scale.
What a real migration workflow looks like
In practice, the most effective pattern looks less like wholesale auto-conversion and more like structured, validation-first modernization.
A practical workflow looks something like this:
1. Map the codebase
Start by inventorying the estate: programs, copybooks, JCL jobs, control files, and the input and output datasets that connect them.
Before writing Python, understand what exists.
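An inventory pass can start as small as a script that counts artifacts by kind. A minimal sketch (the file extensions here are assumptions; mainframe estates rarely use them consistently, so adjust to your shop's conventions):

```python
from pathlib import Path
from collections import Counter

# Assumed extension-to-kind mapping; adjust to your estate's naming rules.
KINDS = {".cbl": "program", ".cob": "program", ".cpy": "copybook", ".jcl": "job"}

def inventory(root: str) -> Counter:
    """Count COBOL artifacts under a directory tree by kind."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        kind = KINDS.get(path.suffix.lower())
        if kind:
            counts[kind] += 1
    return counts
```

Even this crude count answers the first planning question: how many atomic units are we actually dealing with?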
2. Break work into atomic units
Treat each program or business function as its own migration task, and define its scope, inputs, outputs, validation criteria, and completion definition.
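A migration unit can be captured in a simple structure like the following (a hypothetical shape for illustration only, not GSD's actual spec format; the program and dataset names are made up):

```python
from dataclasses import dataclass, field

@dataclass
class MigrationTask:
    """One atomic migration unit: scope, contracts, and a completion definition."""
    program: str                 # e.g. "PAYROLL01"
    scope: str                   # what this unit covers, and what it excludes
    inputs: list[str] = field(default_factory=list)    # datasets consumed
    outputs: list[str] = field(default_factory=list)   # datasets produced
    validation: str = "outputs match COBOL byte-for-byte on the golden dataset"
    done_when: str = "golden-dataset diff is empty and the unit is committed"

task = MigrationTask(
    program="PAYROLL01",
    scope="Monthly payroll calculation only; report formatting excluded",
    inputs=["EMPMAST.DAT"],
    outputs=["PAYREG.DAT"],
)
```

The point is not the data structure; it is that every unit carries the same fields, so no program gets a vaguer contract than the one before it.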
3. Convert with explicit rules
This is where discipline matters. For example: every implied-decimal or packed field maps to Decimal, never float; fixed-width records are sliced exactly as the copybook defines them; and no conversion rule is left implicit in a prompt.
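One such rule, sketched in Python: slice fixed-width records by copybook offsets and apply implied-decimal scaling with Decimal (the layout below is an illustrative assumption, not a real copybook):

```python
from decimal import Decimal

# Offsets and scales come from the copybook; this layout is illustrative.
# name: (start, end, implied decimal places; None means alphanumeric)
LAYOUT = {
    "emp_id": (0, 6, 0),       # PIC 9(6)
    "gross":  (6, 13, 2),      # PIC 9(5)V99 -- the decimal point is implied
    "dept":   (13, 16, None),  # PIC X(3)
}

def parse_record(line: str) -> dict:
    """Slice a fixed-width record and apply implied-decimal scaling."""
    out = {}
    for name, (start, end, scale) in LAYOUT.items():
        raw = line[start:end]
        out[name] = raw if scale is None else Decimal(raw).scaleb(-scale)
    return out

rec = parse_record("0001230123456ACC")
print(rec["gross"])  # 1234.56
```

One overlooked offset here is exactly the kind of bug that surfaces weeks later as a reconciliation break, which is why these rules belong in the spec, not in anyone's head.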
4. Validate against a golden dataset
This is the most important step in the entire process.
Not “the AI says the code looks correct.”
Not “the Python version seems cleaner.”
Not “the logic appears equivalent.”
Just this:
Does the Python version produce the same result as the COBOL version on the same inputs?
That is the standard that matters.
Run both versions against identical data. Compare outputs. Diff files. Reconcile mismatches. Repeat until behavior matches.
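The reconciliation step can start as a line-by-line diff of the two output files, where an empty result is the completion criterion. A minimal sketch:

```python
from itertools import zip_longest

def reconcile(cobol_out: str, python_out: str) -> list[tuple[int, str, str]]:
    """Diff two batch output files line by line.

    Returns (line number, cobol line, python line) for every mismatch;
    an empty list means behavior matches on this dataset.
    """
    mismatches = []
    with open(cobol_out) as a, open(python_out) as b:
        pairs = zip_longest(a, b, fillvalue="<missing line>")
        for i, (la, lb) in enumerate(pairs, start=1):
            if la != lb:
                mismatches.append((i, la.rstrip("\n"), lb.rstrip("\n")))
    return mismatches
```

Real estates usually need more (record-count checks, field-level tolerance rules for report headers with timestamps), but the principle stays binary: the diff is either empty or the unit is not done.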
5. Commit independently and traceably
Each migrated unit should be tracked independently. That makes regression analysis easier, supports rollback if needed, and allows teams to scale the effort without losing control.
Where this works especially well
AI-assisted COBOL modernization works best in environments where behavior is bounded, inputs and outputs are clear, and validation can be made objective.
The strongest candidates are self-contained batch jobs with well-defined file inputs and outputs, where results can be compared record by record against the legacy run.
These are good targets because the logic is usually self-contained and validation is binary. The migration can be treated as a behavior-preservation exercise rather than a broad architectural rewrite.
Where you still need real engineering discipline
There are also areas where teams need to be especially careful.
Financial precision
COBOL numeric behavior is unforgiving. Implied decimals, signed fields, and packed formats do not translate safely into casual Python code. Any migration that uses floating-point arithmetic where decimal precision is required introduces risk.
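The risk is easy to demonstrate: binary floats cannot represent most decimal fractions exactly, and the error compounds across millions of records:

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2)                        # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  # 0.3

# The error compounds: a batch of one million 10-cent entries.
total_float = sum([0.1] * 1_000_000)
total_dec = sum([Decimal("0.1")] * 1_000_000, Decimal(0))
print(total_float == 100000.0)           # False
print(total_dec == Decimal("100000.0"))  # True
```

This is why "map every PIC numeric field to Decimal" belongs in the explicit conversion rules rather than being left to whatever the model emits by default.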
Hidden assumptions
Legacy systems often contain magic values, fallback paths, and data-handling exceptions that no one remembers. If AI flags them, treat those flags as investigation points — not as noise.
JCL and operational context
Migrating the application logic is not the same as migrating the system. Scheduling, restarts, dataset dependencies, control flows, and operational procedures often live outside the COBOL source. JCL can be analyzed and documented, but replacing that operational layer is usually a separate workstream.
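That analysis can begin with something as simple as extracting dataset names from DD statements. A deliberately naive sketch (the JCL below is invented; real JCL with continuations, PROCs, and symbolic parameters needs a proper parser):

```python
import re

# Matches "//DDNAME  DD ...DSN=DATASET.NAME" on a single line.
DD_RE = re.compile(r"^//(\S+)\s+DD\s+.*DSN=([A-Z0-9.$#@]+)", re.IGNORECASE)

def dataset_dependencies(jcl_text: str) -> dict[str, str]:
    """Map DD names to the dataset names they reference."""
    deps = {}
    for line in jcl_text.splitlines():
        m = DD_RE.match(line)
        if m:
            deps[m.group(1)] = m.group(2)
    return deps

jcl = """\
//PAYROLL  JOB (ACCT),'MONTHLY'
//STEP01   EXEC PGM=PAYROLL1
//EMPIN    DD DSN=PROD.EMP.MASTER,DISP=SHR
//PAYOUT   DD DSN=PROD.PAY.REGISTER,DISP=(NEW,CATLG)
"""
print(dataset_dependencies(jcl))
```

Even a rough dependency map like this clarifies which datasets a program touches, which is essential input for scoping the separate operational workstream.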
Performance at production volume
A Python implementation may be logically correct and still fail operationally if it cannot handle production-scale volume. High-throughput batch environments may require chunking, multiprocessing, or distributed execution to meet runtime expectations.
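A common pattern is to split the input into chunks and fan them out across worker processes. A sketch (the nine-character amount field and the chunk size are illustrative assumptions):

```python
from multiprocessing import Pool
from decimal import Decimal

def process_chunk(lines):
    """Stand-in for per-record business logic: sum an implied-decimal amount."""
    return sum((Decimal(line[:9]).scaleb(-2) for line in lines), Decimal(0))

def chunked(seq, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def run(lines, workers=4, chunk_size=50_000):
    """Fan chunks out across worker processes and combine the results."""
    with Pool(workers) as pool:
        return sum(pool.map(process_chunk, chunked(lines, chunk_size)), Decimal(0))
```

Whether chunked multiprocessing is enough, or the job needs a distributed framework, is a measurement question: benchmark against production-scale volume before declaring the migration done.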
What this actually gives engineering teams
Claude Code with GSD does not eliminate migration risk.
What it does is remove one of the most expensive phases of modernization: the long period where the team is still trying to understand what the system really does.
It shortens the gap between:
That is a meaningful improvement.
And when done well, it does not just help with the first program. It helps maintain quality and consistency across the entire portfolio.
Final take
So, does it make sense to use Claude Code and GSD to migrate COBOL to Python wholesale?
Yes — but only if “wholesale” is defined correctly.
Not as one-shot automation.
Not as blind code translation.
Not as a promise that AI will modernize a mainframe estate by itself.
It makes sense as a system-wide, AI-assisted, validation-first modernization strategy.
That is the real shift.
AI is not replacing migration strategy. It is compressing the time required to understand legacy systems well enough to migrate them safely, consistently, and with far better momentum than most teams have had before.
In most COBOL modernization efforts, that understanding phase is where the real cost lives.
Reducing that cost — while keeping validation non-negotiable — is where this approach starts to become practical.
Bottom line: AI does not eliminate migration risk. But it can dramatically reduce discovery time, improve consistency, and accelerate the path from legacy uncertainty to validated Python equivalents.
If you are working on modernization, I would be interested in what you are seeing in practice — especially around packed decimals, JCL dependencies, context management, and validation at scale.
My connection from another world, Gourav J. Shah, has been telling me about GSD. Adding to my TODO!