I made LiteLLM 3x faster. With one line of code.

I was running LiteLLM (YC W23) in production, simulating and handling thousands of requests per second. The Python overhead was killing us. Connection pooling was the bottleneck. Rate limiting was eating CPU cycles.

So we did something unconventional: we rewrote the hot paths in Rust.

The results?

- 3.2x faster connection pooling
- 1.6x faster rate limiting
- 42x more memory efficient for high-cardinality workloads
- Zero code changes required

Here's how you use it:

import fast_litellm  # That's it. One line.
import litellm       # Everything just works, but faster

No configuration. No migration. No breaking changes. Just add fast-litellm to your requirements.txt and you're done.

The secret? PyO3 + DashMap for lock-free concurrency (a rough sketch of the pattern follows this post). We kept the Python API you love but replaced the internals with Rust where it matters.

What we learned:

1. Not everything needs to be rewritten in Rust
2. FFI overhead is real - small operations don't benefit
3. The biggest wins are in concurrent data structures
4. Production safety matters - we built in automatic fallback

I am open-sourcing everything. MIT licensed. Works on Linux, macOS, Windows. Python 3.8-3.13. Link in comments.

---

Building something that needs LLM performance at scale? Let's connect.

#OpenSource #Rust #Python #LLM #Performance #AI #MachineLearning #SoftwareEngineering
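The post doesn't show the Rust side, but a minimal sketch of the PyO3 + DashMap pattern it describes could look like the following. This is an illustration only, not the actual fast-litellm source: the `RateLimiter` class, the `fast_limiter` module name, and the fixed one-second window are all assumptions made for the example.

```rust
// Hypothetical illustration (not the actual fast-litellm code): a per-key,
// per-second request counter exposed to Python via PyO3, stored in a DashMap
// so lookups from many threads shard across internal locks instead of
// contending on one global mutex.
//
// Assumed crates: pyo3 (with the "extension-module" feature) and dashmap.
use dashmap::DashMap;
use pyo3::prelude::*;
use std::time::{SystemTime, UNIX_EPOCH};

#[pyclass]
struct RateLimiter {
    // key -> (start of the current one-second window, requests seen in it)
    counters: DashMap<String, (u64, u32)>,
    limit: u32,
}

#[pymethods]
impl RateLimiter {
    #[new]
    fn new(limit: u32) -> Self {
        RateLimiter { counters: DashMap::new(), limit }
    }

    /// Returns true if the request for `key` is still within the per-second limit.
    fn check(&self, key: &str) -> bool {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before 1970")
            .as_secs();
        let mut entry = self.counters.entry(key.to_string()).or_insert((now, 0));
        if entry.0 != now {
            *entry = (now, 0); // a new one-second window has started
        }
        entry.1 += 1;
        entry.1 <= self.limit
    }
}

#[pymodule]
fn fast_limiter(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<RateLimiter>()?;
    Ok(())
}
```

From Python this would be used as `limiter = RateLimiter(100)` followed by `limiter.check("some-api-key")` per request. The one-second window is a deliberate simplification; the point of the sketch is only that DashMap shards its locks internally, so concurrent checks on different keys rarely block each other.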
Trying this today on one of my projects 🙏🙏
I have wondered why someone hasn't already rewritten LiteLLM completely in Rust. I mean, for simple queries it starts to consume 10-20% of CPU, and it is just routing requests. You could probably achieve the same thing with NGINX and the right configuration.
Really impressive results. Optimizing connection pooling and rate limiting at this level can make a huge difference at scale.
Looks interesting. Will try to understand if the Python overhead can be better mitigated by design improvements.
most LLM infra bottlenecks end up being networking and concurrency, not the model itself
so amazing Dipankar Sarkar
> one line of code
> rewrote the hot paths in rust

Well, you got me! 😄
The "42x memory efficient" number is wild. I think the underrated insight here is your point #1 - not everything needs Rust. Most teams reach for a full rewrite when targeted FFI would've done it. Would love to see a breakdown of where the FFI overhead actually started hurting you.
Here is the repo! https://github.com/neul-labs/fast-litellm