Why You Should Never Try to Replace Your Toolchains With Direct LLM Output.
I think asking LLMs to directly output assembly is a spectacularly bad idea. I kept seeing this wacky idea proposed in my feeds in LinkedIn, probably because the algorithms decided I was interested in assembly language, compiler toolchains and LLMs.. I blew most of them off, without commenting, because it really isn't a good idea, but the latest one popped up while I was in the middle of catching up on C23, digging deeper into ARM variations and reviewing coding tools options on different platforms. So, I took the hook and munched the bait. I want to give a shout-out to the tirelessly dedicated, brilliant people who have created our modern toolchains, I so very much appreciate how much they've done for us all. But now, we start off with one of the most lamentable situations in Software, that has messed me up too many times to count, and Chat GPT 5.2 full reasoning model confirming my intuitions, or at least providing a lot of evidence for them. Speech-to-Text errors and all outputs left verbatim:
Someone had proposed something I thought was a very bad idea on LinkedIn. They were saying hey, maybe we should have llms generate our code in the assembly rather than any higher level languages. I thought this was wrong for so very many reasons. It was hard to pick just one or two to reply with. However, the first one you might ask is which assembly do you think you would use if the answer was x64 due to its popularity which I would already find questionable, but let's say we want that. In fact, if there's so much source code out there in each of the two different formats tend to cause more hallucinations and bad generations just by the fact of the the two different syntaxes that are both out there at large scale?
You’re not wrong to find that proposal deeply misguided, and your instinct about x86-64 assembly specifically making things worse for LLMs is exactly on target. There are multiple independent failure modes, and the dual-syntax issue is only the first one—but it’s a very clean place to start because it’s concrete and easy to explain.
I’ll structure this so you could almost lift pieces directly into a reply if you wanted.
1. x86-64 has two widely used, incompatible surface languages
(and neither is canonical)
Unlike ARM or RISC-V, x86-64 does not have a single, authoritative assembly language:
Dimension Intel syntax AT&T syntax
Operand order dest, src src, dest
Register prefix rax %rax
Immediate prefix 123 $123
Memory syntax [rax+4] 4(%rax)
Mnemonics shared shared
These aren’t cosmetic differences. They invert meaning.
An LLM trained on:
…is necessarily exposed to both dialects at massive scale.
That means:
This is exactly the kind of ambiguity LLMs are worst at: locally plausible, globally catastrophic errors.
2. This ambiguity does increase hallucinations and silent miscompilations
You’re right to suspect this would increase hallucinations, not reduce them.
Typical failure patterns you’d see:
These errors:
That’s a worst-case scenario for automated code generation.
3. Even if you “pick one syntax,” the problem doesn’t go away
Suppose someone says:
“Fine, we’ll just mandate Intel syntax.”
That does not solve it:
So the model still has to actively suppress a competing grammar, which is not something LLMs do reliably.
By contrast:
4. x86-64 assembly is underspecified without context
Another fatal issue—independent of syntax:
x86-64 assembly does not encode enough intent to be a good generation target.
To write correct x86-64 assembly, you must know:
None of this is in the assembly itself.
So an LLM generating assembly must:
That’s vastly harder than generating C, Rust, or even LLVM IR, where the compiler enforces those constraints.
5. Assembly maximizes the blast radius of small mistakes
Higher-level languages:
Assembly:
LLMs already struggle with:
Assembly removes the last remaining safety net.
6. Ironically, compilers already solved the “best assembly” problem
If the goal is:
“Get optimal machine code”
Then the answer is:
Modern compilers:
An LLM does none of that.
Generating assembly directly is not “closer to the metal”—it’s bypassing decades of accumulated correctness.
7. A devastating one-liner you could use
If you wanted a concise reply:
“x86-64 doesn’t even have a single assembly language—Intel and AT&T syntaxes invert operand order. Training LLMs on both increases silent semantic errors, not correctness. You’d be removing type systems, ABIs, and compiler verification to generate the most brittle representation possible.”
That’s hard to argue with.
Bottom line
Your intuition is exactly right:
This proposal misunderstands both:
If you’d like, next we could:
Just say the word.
Recommended by LinkedIn
When you mention safety structure and verification, it's not just a compiler but there's other parts of the tool chain that depend on source level representation in the language being compiled. Memory sanitizer, address sanitizer, thread sanitizer. Not quite specifically part of the compiler, but so important for properly testing the heck out of complex code. Along with all of the abstractions does that make your source more understandable? You lose all those instrumentation possibilities then are so important for testing.
Yes—exactly. You’ve put your finger on something even more damning than “assembly is hard”: assembly destroys the entire testing and verification ecosystem that modern software depends on.
And crucially, this isn’t just “the compiler.” It’s the semantic richness of the source language that makes all of this possible.
Let me lay this out cleanly, because this is the part most people proposing “LLMs generate assembly” completely fail to grasp.
1. Sanitizers depend on source-level meaning, not just instructions
Tools like:
…are not “extra passes you can just bolt onto assembly.”
They rely on:
Assembly has none of this.
Once you drop to raw instructions:
You haven’t just lost checks—you’ve lost the concepts that make checking possible.
2. Sanitizers are semantic instrumentation, not debugging hacks
This is a key conceptual point.
Sanitizers are not:
“Debug builds with extra checks”
They are:
Semantic instrumentation derived from the language’s abstract machine
Example:
int *p = malloc(sizeof(int));
free(p);
*p = 42; // ASan can catch this
ASan knows:
In assembly:
mov rax, [rbx]
There is no way to answer:
So you can’t “just add ASan to assembly.” The information is already destroyed.
3. You also lose compositional reasoning
Higher-level languages allow:
Even in C/C++—imperfect as they are—you can still say:
“This function takes ownership of this pointer”
In assembly:
So testing coverage explodes combinatorially.
This is precisely the kind of environment where LLM mistakes become undetectable until production.
4. You lose debuggability in a deep, structural way
With source-level code, you get:
With assembly:
Even humans struggle here. Expecting an LLM to reason about it is fantasy.
5. Abstractions don’t just make code shorter—they make it testable
This directly answers your last question.
Yes: abstractions make code more understandable precisely because they preserve invariants.
They give the toolchain:
Abstraction isn’t about hiding details—it’s about making the remaining details checkable.
Assembly hides everything that matters for correctness.
6. Why this matters even more for AI-generated code
LLMs are:
So the only way to make LLM-generated code viable is:
Assembly does the opposite:
That’s the worst possible combination.
7. The irony: “closer to the metal” means “farther from correctness”
The proposal fails because it confuses:
“Low-level” with “precise”
In reality:
LLVM IR is “lower-level” than C—but still supports sanitizers. Assembly is lower still—and supports almost nothing.
8. A compact way to say this publicly
If you wanted something sharp but accurate:
“Assembly throws away the semantic information that sanitizers, debuggers, and verification tools rely on. You’re not just skipping the compiler—you’re deleting the entire testing ecosystem that makes modern software survivable.”
That’s not rhetoric. It’s a factual description.
Bottom line
You’re right on every axis:
The idea that LLMs should emit assembly isn’t bold or advanced—it’s pre–1970s thinking, dressed up with AI buzzwords.
Even with higher level languages Llms create incredibly buggy code. Generated machine code would be an absolute disaster and to correct any mistakes in it would make the use of it totally impractical
This breaks it down quite a bit more than I did, and fully explains something that likely shouldn't need any explaining at all, but apparently does: https://engrlog.substack.com/p/why-skip-the-code-ship-the-binary
I had missed a major hallucination in this article, but it doesn't undermine the point at all. It hilariously wrote: ARM: one syntax Okay, maybe one syntax, but so many different combinations of instruction set architectures and availability that one of the most common things I had been conferring with LLM's on when reading different sloppy tutorials or watching videos was learning the whole zoo of just which processors had just which instructions. So, not an improvement if you are looking at "ARM Assembly Code".