Stack Smackdown: Battle of the backends (A Kiro project)
Stylized representation of different web technologies in a cage match -- AI generated

After learning about Kiro from AWS re:Invent, I was excited to give it a try by building a practical sample project. Having built a career in the corporate world by learning from open source projects, I also wanted to contribute something meaningful back to the developer community while I have some time on my hands between jobs. Right off the bat I must say, I'm not sure I'm ever going back to any other IDE unless I have to. I believe Spec Driven Development (SDD), the development paradigm Kiro forces you into, is truly the next big shift in any developer's journey. It doesn't replace tried-and-true patterns such as MVC or Domain-Driven Design (DDD); rather, it is complementary to them: a style of development built for the age of artificial intelligence. While other IDEs support a similar style via plugins for Gemini, Amazon Q, etc., to me it feels nicer to work in a fully fledged, AI-first, next-generation IDE dedicated to this style of programming.

How Kiro Stands Out

For one, Kiro has an adorable mascot. It is an animated little ghost figure, reminiscent of some early video games from my childhood.

[Image: Animation while Kiro is working or thinking]

However, here is what makes Kiro revolutionary:

  • SDD is handled out of the box, and context is preserved across conversations with spec files and steering files.
  • You can give Kiro powers by integrating MCP servers in a more sophisticated, lazily loaded, context-aware way, so that you don't burn through tokens.
  • You can add dedicated agent hooks to trigger custom side-effects when certain events occur in your development workflow, such as automatically adding basic unit tests or updating documentation whenever a new component is added.
  • It's very easy to retrofit this new style of development into existing legacy projects.

With the right setup you get a comprehensive end-to-end workflow that really makes a dev's life easier. From the little I've worked with Kiro so far, it's a remarkable and delightful experience that is almost downright addictive. And it's so optimized that I was able to build my Stack Smackdown project using about 1,000 credits (the first 500 were free).

What is Stack Smackdown?

For as long as I can remember in my journey as a software engineer and architect, I've always had a burning question on my mind,

"What is the best language or framework for a backend web service?"

This is the question I try to answer with my Stack Smackdown project.

As many developers know, the answer to this question is different depending on who you ask. It is very subjective and more or less a matter of taste. I've come to realize that although personally I am a huge fan of Kotlin, much of my opinion about what makes Kotlin great rests on the fact that my career sort of took me there by happenstance:

  • At university, I started with C/C++ and disliked it very much. It felt clunky, was hard to read, and the idea of playing with pointers felt unnecessary when all I really needed was for my program to simply do something.
  • Next I learned Java and loved it much more. No pointers, automatic garbage collection, and code that looks more human and reads like a book, making it easier to reason about what exactly any snippet is doing.
  • Because I liked Java, I gravitated toward Android when I entered the corporate world as it was Java-based development back then. So naturally when Kotlin became the status quo for Android development I became impressed by concepts like null safety, getting rid of POJOs, more functional programming, and lots of other things Kotlin offers out of the box.

However, it was only much later that I realized much of what I loved about Java and Kotlin can be found in a plethora of other languages. For example, Scala also offers null safety and is interoperable with Java; that is, Scala code can use existing Java libraries. So I imagine other developers' opinions on the best language are a bit like mine, in the sense that they're rooted in the way their journey simply took them there after one or two other languages just didn't "feel" right. So with this AI-assisted SDD project using Kiro, I sought out a more concrete and empirical way to answer this question. Most importantly, I wanted an answer that actually matters in this day and age.

Syntax is Irrelevant

Only conciseness, readability, and performance matter anymore.

When looking at today's landscape, everything runs on the cloud by renting from major providers and platforms like Amazon/AWS, Google/GCP, and Microsoft/Azure. I now understand that the most important characteristic of any backend server framework is its overhead, in terms of performance and cost effectiveness. Every language and every framework today provides essentially the same things, only with different syntax. And with the advent of AI, memorizing syntax is becoming less important each day (you still have to learn enough to read the code AI outputs, debug it, and give it some direction). Any web service can now be written in any language, containerized with Docker, and deployed to any cloud environment. So it has dawned on me: what matters most now is the footprint of each language while performing identical work. Container image size impacts what you pay for container registries. Lower RAM and CPU utilization means you can select smaller instances and run identical workloads for cheaper. Code that is more verbose burns through tokens faster in your AI-assisted coding.

By teaming up with Kiro, I was able to easily venture off into other languages and frameworks, and begin to understand more deeply which language/framework/runtime gives me more bang for my buck.

The Technical Stack: 11 Frameworks

Stack Smackdown puts 11 languages / web frameworks through a comprehensive performance comparison. This isn't just about picking favorites. It's about understanding the real-world trade-offs that affect your cloud bill.

Native Compilation (Direct to Machine Code):

  • Go (Gin) - Google's minimalist framework with excellent concurrency
  • Rust (Actix-web) - Zero-cost abstractions with actor-based architecture
  • Dart (Shelf AOT) - Google's framework compiled ahead-of-time to native code

JVM Ecosystem (Bytecode → JIT Compilation):

  • Kotlin (Ktor) - Asynchronous framework leveraging coroutines
  • Java (SpringBoot) - The enterprise standard with comprehensive ecosystem
  • Scala (Akka HTTP) - Functional programming meets the actor model

Other Runtimes:

  • C# (ASP.NET Core) - Microsoft's high-performance enterprise framework
  • Node.js (Express.js) - JavaScript's V8 engine with event-driven architecture
  • Python (Django) - The "batteries included" framework
  • Ruby (Sinatra) - Lightweight DSL for web applications
  • PHP (Laravel) - Full-featured framework with elegant syntax

Current State: Production-Ready Benchmarking Platform

The platform is fully operational with comprehensive monitoring infrastructure:

  • All 11 backend services running in Docker containers
  • Prometheus + Grafana monitoring with real-time metrics collection
  • Automated deployment with one-command setup
  • Build metrics collection capturing performance data across all frameworks
  • Container metrics via cAdvisor for resource utilization analysis

Each service implements an identical MVC architecture. For simplicity, the services currently host a single /health endpoint that only Docker Compose hits for health checks, ensuring a fair baseline comparison across all frameworks while they are mostly idle, returning a simple JSON response to a single client every few seconds. In the future I'd like to add more endpoints for various computational tasks, to compare things like concurrency, CPU-bound vs. I/O-bound work, and behavior under load.
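Each framework implements this endpoint in its own idiom, but the contract is tiny. As a rough illustration only (sketched here with Python's stdlib; the actual services use the 11 frameworks listed above), every service boils down to something like:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Serves the single /health endpoint each benchmarked service exposes."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        # Keep benchmark runs quiet; default handler logs every request.
        pass

def serve(port: int = 8080) -> HTTPServer:
    """Start the health-check server on a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the work per request is identical and trivial across all 11 services, the measured differences come from the runtime and framework, not from application logic.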

The beauty of this setup is its simplicity - you only need Docker Desktop to run the entire platform locally and start comparing performance immediately.
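For context, the per-service wiring in Docker Compose looks roughly like this (the service name, build path, and port below are illustrative, not the project's actual file):

```yaml
services:
  go-gin:                          # one entry like this per framework
    build: ./services/go-gin       # hypothetical directory layout
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 30s                # Compose polls /health every few seconds
      timeout: 3s
      retries: 3
```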

Performance Dimensions: What Really Matters

1. Docker Image Size

Why it matters: Larger images mean higher storage costs in container registries, slower deployment times, and increased network transfer costs.

What I measured: Final container image size after optimization, ranging from ~167 MB for the leanest native binary to over 700 MB for the heaviest full-featured framework.

[Chart: Docker image sizes across the 11 frameworks]

2. Memory Usage (RAM)

Why it matters: Memory is often the most expensive cloud resource. Lower RAM usage means you can run more services per instance or choose smaller, cheaper instance types.

What I measured: cAdvisor's container_memory_usage_bytes metric, which represents the total memory usage from the container's cgroup. This includes:

  • RSS (Resident Set Size) - Physical memory allocated by the application (heap, stack, etc.)
  • Cache memory - File system cache used by the container
  • Swap memory - Memory that has been swapped to disk
  • Kernel memory - Memory used by the kernel on behalf of the container (on modern kernels)

This gives us a comprehensive view of the container's total memory footprint, not just the application's heap usage. It's the same metric that Kubernetes uses for memory-based pod eviction decisions, making it highly relevant for real-world deployment scenarios.
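In Grafana, this translates into a PromQL query along these lines (the container name pattern is illustrative):

```promql
# Average total memory per container over the last 5 minutes
avg_over_time(container_memory_usage_bytes{name=~"smackdown-.*"}[5m])
```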

[Chart: Memory usage across the 11 frameworks]

3. CPU Utilization

Why it matters: CPU efficiency directly impacts your ability to handle concurrent requests and determines how many services you can co-locate on a single instance.

What I measured: cAdvisor's container_cpu_usage_seconds_total metric, processed through a Prometheus rate function. This represents:

  • Cumulative CPU time consumed by the container across all CPU cores
  • Rate calculation over a 5-minute window: rate(container_cpu_usage_seconds_total[5m]) * 100
  • Multi-core aware - A container using 2.5 CPU cores will show 250% utilization

This metric comes directly from Linux cgroups (cpuacct.usage on cgroup v1, or the usage_usec field of cpu.stat on cgroup v2) and measures actual CPU seconds consumed by all processes within the container. It's not measuring different concurrency models (threads vs coroutines vs event loops); it simply measures total CPU time used, regardless of how the application achieves that work.

Important note: This is raw CPU utilization at the container level. A single-threaded application maxing out one core will show 100% utilization, while a multi-threaded application using 2.5 cores will show 250% utilization. The metric doesn't distinguish between different programming paradigms - it just measures total CPU consumption.

[Chart: CPU utilization across the 11 frameworks]

4. Build Time

Why it matters: Faster builds mean shorter CI/CD pipelines, quicker developer feedback loops, and reduced infrastructure costs for build systems.

What I measured: Complete build time from source to deployable artifact, including dependency resolution and compilation.

[Chart: Build times across the 11 frameworks]

5. Cyclomatic Complexity

Why it matters: This measures code maintainability and testing requirements. Lower complexity means easier debugging, fewer bugs, and reduced long-term maintenance costs.

What I measured: Decision points in the codebase (if statements, loops, switch cases) to assess how much framework boilerplate is required for identical functionality.
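To make the metric concrete, here is a minimal sketch of how decision points can be counted, using Python's ast module. This is an illustration of the idea, not the project's actual analyzer, and it only approximates McCabe complexity:

```python
import ast

# Node types that add a decision point (a branch) to the control flow.
DECISION_NODES = (
    ast.If, ast.For, ast.While, ast.Try,
    ast.BoolOp, ast.IfExp, ast.comprehension,
)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
```

Straight-line code scores 1; each `if`, loop, `try`, boolean operator, or comprehension adds a path that tests must cover.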

[Chart: Cyclomatic complexity across the 11 frameworks]

6. Application Size (Character Count)

Why it matters: Source code size directly impacts development velocity, maintainability, and AI-assisted development costs. Smaller codebases are easier to understand, debug, and modify. In the era of AI coding assistants, character count also correlates with token usage and API costs for code analysis tools.

What I measured: Total character count across all source code files for each service implementation, measured in bytes. This includes:

  • Language-specific files: .kt, .java, .go, .rs, .js/.ts, .py, .rb, .cs, .php, .dart, .scala
  • Application code only: Excludes dependencies, build files, and generated code
  • Consistent scope: Only the MVC implementation (controllers, models, main files) for fair comparison

This metric reveals framework overhead and language verbosity. A framework requiring 2,000 characters to implement the same functionality as another framework's 500 characters indicates higher development and maintenance costs. It also helps estimate AI development costs, as most AI coding tools charge based on token usage (roughly 4 characters per token).
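A measurement like this is easy to reproduce. Here is a sketch in Python (the extension set mirrors the list above; `application_size` is a hypothetical helper, not the project's actual script):

```python
from pathlib import Path

# Source extensions counted for a single service's implementation.
SOURCE_EXTENSIONS = {".kt", ".java", ".go", ".rs", ".js", ".ts",
                     ".py", ".rb", ".cs", ".php", ".dart", ".scala"}

def application_size(service_dir: str) -> dict:
    """Sum source-file sizes in bytes and estimate tokens at ~4 chars/token."""
    total = sum(
        p.stat().st_size
        for p in Path(service_dir).rglob("*")
        if p.is_file() and p.suffix in SOURCE_EXTENSIONS
    )
    return {"characters": total, "estimated_tokens": total // 4}
```

Dependencies, build files, and generated code are excluded simply by pointing the helper at the hand-written MVC directory only.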

Practical implications:

  • Development speed: Less code to write and review
  • Maintenance burden: Fewer lines to debug and update
  • AI tooling costs: Lower token usage for code analysis and generation
  • Framework efficiency: How much boilerplate is required for basic functionality

This dimension complements cyclomatic complexity by measuring not just code complexity, but total code volume required to achieve identical functionality across all 11 frameworks.

[Chart: Character counts across the 11 frameworks]

The Results: Data-Driven Insights

The comprehensive benchmarking reveals clear performance patterns that challenge conventional wisdom about backend framework selection. Here's what the data tells us:

Native Compilation Dominates Efficiency Metrics

Rust (Actix-web) emerges as the efficiency champion across multiple dimensions:

  • Lowest memory usage: 3.19 MB (67x less than Scala)
  • Smallest Docker image: 167 MB
  • Ultra-fast builds: 1.04 seconds
  • Minimal CPU overhead: 0.445% utilization

Go (Gin) follows closely with excellent resource efficiency:

  • Low memory footprint: 7.07 MB
  • Compact images: 195 MB
  • Build time: 12.2 seconds (surprisingly, the slowest of all 11 frameworks)
  • Efficient CPU usage: 0.203%

Dart (Shelf AOT) proves Google's server-side vision with solid performance:

  • Very low memory: 5.95 MB
  • Compact deployment: 177 MB
  • Quick builds: 1.80 seconds
  • Low CPU overhead: 0.252%

JVM Ecosystem: The Memory-Hungry Powerhouses

The JVM frameworks show a consistent pattern - excellent tooling and ecosystem, but at a significant resource cost:

Scala (Akka HTTP) represents the extreme end:

  • Highest memory usage: 214 MB (67x more than Rust)
  • Highest CPU utilization: 0.826%
  • Largest codebase: 3,545 characters (2x more than PHP)
  • Moderate complexity: 7 cyclomatic complexity

Java (SpringBoot) shows enterprise framework overhead:

  • High memory usage: 202 MB
  • Large Docker images: 422 MB
  • Highest complexity: 14 cyclomatic complexity (tied with Go)
  • Reasonable build time: 2 seconds

Kotlin (Ktor) offers a middle ground in the JVM space:

  • Moderate memory: 157 MB
  • Balanced complexity: 5 cyclomatic complexity
  • Longer builds: 5 seconds

.NET Runtime: Enterprise Efficiency

C# (ASP.NET Core) delivers Microsoft's enterprise-grade performance:

  • Excellent memory efficiency: 25.0 MB (8x less than JVM frameworks)
  • Low complexity: 4 cyclomatic complexity
  • Quick builds: 3.37 seconds
  • Minimal CPU usage: 0.165%
  • Moderate Docker images: 351 MB

Interpreted Languages: Surprising Efficiency

Contrary to expectations, interpreted languages show competitive resource usage.

Node.js (Express) delivers impressive efficiency:

  • Lowest CPU usage: 0.150%
  • Minimal complexity: 1 cyclomatic complexity
  • Compact code: 1,815 characters
  • Moderate memory: 43.1 MB

Python (Django) balances features with efficiency:

  • Reasonable memory: 104 MB
  • Low complexity: 4 cyclomatic complexity
  • Quick builds: 1.56 seconds

PHP (Laravel) shows surprising optimization:

  • Smallest codebase: 1,716 characters
  • Low complexity: 4 cyclomatic complexity
  • Moderate memory: 19.9 MB
  • Largest Docker images: 726 MB (framework overhead)

Ruby (Sinatra) demonstrates minimalist framework benefits:

  • Fastest builds: 1 second (roughly tied with Rust)
  • Large codebase: 2,995 characters (second only to Scala)
  • Moderate complexity: 9 cyclomatic complexity
  • Higher memory usage: 60.4 MB
  • Compact Docker images: 288 MB

The Development Experience Trade-offs

Code Verbosity vs Performance: There's an inverse relationship between code size and runtime efficiency. Rust requires more characters (2,178) but delivers exceptional performance, while PHP achieves the same functionality with minimal code (1,716 characters) but higher deployment overhead.

Build Time Surprises: Go's 12.2-second build time stands out as unexpectedly slow for a compiled language, while Rust compiles to native code in just 1.04 seconds. This suggests Go's build process includes more comprehensive optimization or dependency resolution.

Complexity Patterns: Node.js achieves the lowest complexity (1) through JavaScript's event-driven model, while enterprise frameworks like SpringBoot and Go require higher complexity (14) for the same functionality.

Cloud Cost Implications

Based on typical cloud pricing models:

Most Cost-Effective: Rust, Dart, and Go offer the best resource efficiency, potentially reducing cloud costs by 60-80% compared to JVM frameworks.

Enterprise Sweet Spot: .NET Core (ASP.NET) provides a balanced approach with 25 MB memory usage and enterprise features, making it cost-effective for Microsoft-centric environments.

Development Velocity Leaders: Node.js and PHP minimize code complexity and build times, reducing development costs even if runtime costs are higher.

The data reveals that framework choice significantly impacts both development velocity and operational costs, with native compilation languages offering the best resource efficiency for high-scale deployments.

Testing Environment

Note: Results are based on local testing on a MacBook Pro M4 (12-core) with Docker containers. Performance characteristics may vary across different hardware and cloud environments.

Experience the Data Yourself

Thank you for joining me on this performance journey! I hope these empirical insights help you make more informed technology decisions - whether you're optimizing cloud costs or choosing your next project's stack.

Ready to explore? The complete Stack Smackdown platform is open source on GitHub. You can:

  • Deploy all 11 services locally in under 5 minutes
  • Access live Grafana dashboards with real performance metrics
  • Use the methodology for your own framework comparisons

Found value in this analysis? A GitHub star ⭐ helps other developers discover these performance insights.

Built with Kiro: This entire project showcases Spec Driven Development in action - demonstrating how AI-first development tools can accelerate complex technical projects while maintaining rigorous engineering standards.







