Can your AI rewrite your code in assembly?

Suppose you have several strings and you want to count the occurrences of the character '!' across them. In C++, you might solve the problem as follows if you are an old-school programmer.

size_t c = 0;
for (const auto &str : strings) {
    c += std::count(str.begin(), str.end(), '!');
}

You can also get fancier with ranges.

for (const auto &str : strings) {
    c += std::ranges::count(str, '!');
}

And so forth.

But what if you want to go faster? Maybe you’d want to rewrite this function in assembly. I decided to do so, and to have fun using both Grok and Claude as my AIs, setting up a friendly competition.

I started with my function and then I asked AIs to optimize it in assembly. Importantly, they knew which machine I was on, so they started to write ARM assembly.

By repeated prompting, I got the following functions.

  • count_classic: Uses C++ standard library std::count for reference.
  • count_assembly: A basic ARM64 assembly loop (byte-by-byte comparison). Written by Grok.
  • count_assembly_claude: Claude’s SIMD-optimized version using NEON instructions (16-byte chunks).
  • count_assembly_grok: Grok’s optimized version (32-byte chunks).
  • count_assembly_claude_2: Claude’s further optimized version (64-byte chunks with multiple accumulators).
  • count_assembly_grok_2: Grok’s latest version (64-byte chunks with improved accumulator handling).
  • count_assembly_claude_3: Claude’s most advanced version with additional optimizations.
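To give a rough idea of what "multiple accumulators" means in the list above, here is a minimal scalar sketch in C++. It is not the AI-generated assembly: the name `count_four_accumulators` and the pointer-plus-length signature are assumptions made for illustration. The assembly versions apply the same idea to 16-byte NEON registers instead of single bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration, not taken from the benchmark.
// Four independent counters let the processor overlap the additions
// instead of waiting on a single running total.
size_t count_four_accumulators(const char *data, size_t len, char target) {
    assert(data != nullptr || len == 0);
    size_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    size_t i = 0;
    // Main loop: four bytes per iteration, one accumulator per byte.
    for (; i + 4 <= len; i += 4) {
        c0 += (data[i] == target);
        c1 += (data[i + 1] == target);
        c2 += (data[i + 2] == target);
        c3 += (data[i + 3] == target);
    }
    size_t total = c0 + c1 + c2 + c3;
    // Tail: the remaining zero to three bytes.
    for (; i < len; ++i) {
        total += (data[i] == target);
    }
    return total;
}
```

A compiler may well vectorize or unroll the plain loop on its own, so this sketch shows the structure of the technique and makes no claim about a speedup.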

You get the idea.

So, how is the performance? I use random strings of up to 1 kilobyte. In all cases, I test that the functions provide the correct count. I did not closely examine the code, so mistakes could still be hiding in it.
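The correctness check described above can be sketched as follows. This is not the benchmark's own harness: the helper names `make_random_strings` and `agrees_with_std_count`, the `counter_fn` signature, and the choice of printable ASCII characters are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical signature for a candidate counting routine.
using counter_fn = std::function<size_t(const char *, size_t)>;

// Builds strings of random length in [0, max_len] made of printable ASCII,
// a range that includes '!' (code 33).
std::vector<std::string> make_random_strings(size_t how_many, size_t max_len,
                                             unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<size_t> len_dist(0, max_len);
    std::uniform_int_distribution<int> char_dist(32, 126);
    std::vector<std::string> out;
    out.reserve(how_many);
    for (size_t i = 0; i < how_many; ++i) {
        std::string s(len_dist(gen), ' ');
        for (char &ch : s) {
            ch = static_cast<char>(char_dist(gen));
        }
        out.push_back(std::move(s));
    }
    return out;
}

// Returns true if the candidate matches std::count on every string.
bool agrees_with_std_count(const counter_fn &candidate,
                           const std::vector<std::string> &strings) {
    assert(candidate && "candidate must be callable");
    for (const auto &s : strings) {
        const auto expected =
            static_cast<size_t>(std::count(s.begin(), s.end(), '!'));
        if (candidate(s.data(), s.size()) != expected) {
            return false;
        }
    }
    return true;
}
```

A check like this only shows agreement on the inputs it happens to generate, so it cannot rule out bugs on lengths or alignments that the random strings never hit.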

I record the average number of instructions per string.

[Figure: average number of instructions per string for each function]

Through repeated optimization, I reduced the number of instructions by a factor of eight. The running time decreased similarly.

Can we get the AIs to rewrite the best option in C? Yes, although you need SIMD intrinsics. So there is no benefit to leaving the code in assembly in this instance.
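As a hedged illustration of what a version with SIMD intrinsics could look like, here is a minimal sketch using NEON intrinsics from `arm_neon.h`, with a scalar fallback so that it also compiles on other machines. It is not the code the AIs produced: the name `count_char_intrinsics` is hypothetical, and it processes 16 bytes at a time with a single accumulator, which is simpler than the 64-byte variants listed earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical name; not one of the functions in the original benchmark.
size_t count_char_intrinsics(const char *data, size_t len, char target) {
    assert(data != nullptr || len == 0);
    const uint8_t *p = reinterpret_cast<const uint8_t *>(data);
    const uint8_t t = static_cast<uint8_t>(target);
    size_t total = 0;
    size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
    const uint8x16_t needle = vdupq_n_u8(t);
    while (len - i >= 16) {
        // Each byte lane can count up to 255 matches before overflowing,
        // so flush the accumulator after at most 255 blocks.
        size_t blocks = (len - i) / 16;
        if (blocks > 255) {
            blocks = 255;
        }
        uint8x16_t acc = vdupq_n_u8(0);
        for (size_t b = 0; b < blocks; ++b, i += 16) {
            // vceqq_u8 yields 0xFF in matching lanes, 0x00 elsewhere.
            const uint8x16_t eq = vceqq_u8(vld1q_u8(p + i), needle);
            // Subtracting 0xFF (that is, -1) adds one to the lane.
            acc = vsubq_u8(acc, eq);
        }
        // Widening horizontal sum of the 16 lanes.
        total += vaddlvq_u8(acc);
    }
#endif
    // Scalar tail, and the whole input when NEON is unavailable.
    for (; i < len; ++i) {
        total += (p[i] == t);
    }
    return total;
}
```

Calling `count_char_intrinsics(str.data(), str.size(), '!')` for each string and summing the results mirrors the loop over `strings` at the top of the article.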

An open question is whether the AIs could find optimizations that are not possible if we use a higher-level language like C or C++. It is an intriguing question that I will seek to answer later. For the time being, the AIs can beat my C++ compiler!

Source code: https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2026/04/02/benchmark/benchmarks/benchmark.cpp

I wouldn’t trust AI to write assembly sections, but I trust the compiler, simply because of its deterministic, rules-based logic. Having said that, I have used AI to repeatedly optimise functions with the help of a profiler. I had a plethora of tests, hand-checked every step, and verified the claimed improvements.

Sure, it is called a Compiler.

I use AI from time to time to write short Assembly or intrinsics snippets with SIMD instructions from prompts (because a normal average human can’t remember everything in this area), and it works quite well up to a certain limit. Last year I programmed sorting networks and got some impressive hallucinations for large networks, then finally corrected everything by hand without AI.
