I made LiteLLM 3x faster. With one line of code.

I was running LiteLLM (YC W23) in production, simulating and handling thousands of requests per second. The Python overhead was killing us. Connection pooling was the bottleneck. Rate limiting was eating CPU cycles.

So we did something unconventional: we rewrote the hot paths in Rust.

The results?

- 3.2x faster connection pooling
- 1.6x faster rate limiting
- 42x more memory efficient for high-cardinality workloads
- Zero code changes required

Here's how you use it:

import fast_litellm  # That's it. One line.
import litellm       # Everything just works, but faster

No configuration. No migration. No breaking changes. Just add fast-litellm to your requirements.txt and you're done.

The secret? PyO3 + DashMap for lock-free concurrency (a rough sketch of the pattern follows this post). We kept the Python API you love but replaced the internals with Rust where it matters.

What we learned:

1. Not everything needs to be rewritten in Rust
2. FFI overhead is real - small operations don't benefit
3. The biggest wins are in concurrent data structures
4. Production safety matters - we built in automatic fallback

I am open-sourcing everything. MIT licensed. Works on Linux, macOS, Windows. Python 3.8-3.13. Link in comments.

---

Building something that needs LLM performance at scale? Let's connect.

#OpenSource #Rust #Python #LLM #Performance #AI #MachineLearning #SoftwareEngineering
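The post doesn't show the Rust side, but a minimal sketch of the PyO3 + DashMap pattern it describes could look like the following. This is an illustration only, not the actual fast-litellm source: the `RateLimiter` class, the `fast_limiter` module name, and the fixed one-second window are all assumptions made for the example.

```rust
// Hypothetical illustration (not the actual fast-litellm code): a per-key,
// per-second request counter exposed to Python via PyO3, stored in a DashMap
// so lookups from many threads shard across internal locks instead of
// contending on one global mutex.
//
// Assumed crates: pyo3 (with the "extension-module" feature) and dashmap.
use dashmap::DashMap;
use pyo3::prelude::*;
use std::time::{SystemTime, UNIX_EPOCH};

#[pyclass]
struct RateLimiter {
    // key -> (start of the current one-second window, requests seen in it)
    counters: DashMap<String, (u64, u32)>,
    limit: u32,
}

#[pymethods]
impl RateLimiter {
    #[new]
    fn new(limit: u32) -> Self {
        RateLimiter { counters: DashMap::new(), limit }
    }

    /// Returns true if the request for `key` is still within the per-second limit.
    fn check(&self, key: &str) -> bool {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before 1970")
            .as_secs();
        let mut entry = self.counters.entry(key.to_string()).or_insert((now, 0));
        if entry.0 != now {
            *entry = (now, 0); // a new one-second window has started
        }
        entry.1 += 1;
        entry.1 <= self.limit
    }
}

#[pymodule]
fn fast_limiter(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<RateLimiter>()?;
    Ok(())
}
```

From Python this would be used as `limiter = RateLimiter(100)` followed by `limiter.check("some-api-key")` per request. The one-second window is a deliberate simplification; the point of the sketch is only that DashMap shards its locks internally, so concurrent checks on different keys rarely block each other.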
Trying this today on one of my projects 🙏🙏
I have wondered why someone hasn't already rewritten LiteLLM completely in Rust. I mean, for simple queries it starts to consume 10-20% of CPU, and it is just routing requests. You could probably achieve the same thing with NGINX and the right configuration.
Really impressive results. Optimizing connection pooling and rate limiting at this level can make a huge difference at scale.
Looks interesting. Will try to understand if the Python overhead can be better mitigated by design improvements.
most LLM infra bottlenecks end up being networking and concurrency, not the model itself
so amazing Dipankar Sarkar
> one line of code
> rewrote the hot paths in rust

Well, you got me! 😄
The "42x memory efficient" number is wild. I think the underrated insight here is your point #1 - not everything needs Rust. Most teams reach for a full rewrite when targeted FFI would've done it. Would love to see a breakdown of where the FFI overhead actually started hurting you.
Here is the repo! https://github.com/neul-labs/fast-litellm