A supply chain attack hit a Python package with ~3 million daily downloads. Malicious code executed automatically on every Python process startup for roughly 40 minutes, enough time to harvest credentials and install a persistent backdoor.

That package was LiteLLM, one of the most widely used AI gateway libraries in production environments. And the attack didn't even come through LiteLLM's own code; it came through a compromised GitHub Action in their CI/CD pipeline.

The deeper lesson here isn't specific to LiteLLM. It's about how engineering teams think (or don't think) about AI gateways as infrastructure. A proxy that sees your LLM API keys and your prompts, and sits in the request path between your applications and your model providers, isn't a dev tool. It's critical infrastructure.

We wrote a breakdown of what happened, what the migration path looks like, and what questions to ask of any AI gateway you're evaluating. Link in comments.
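Because the entry point was a mutable CI dependency, one widely recommended hardening step is pinning third-party GitHub Actions to full commit SHAs instead of tags. The fragment below is purely illustrative (it is not LiteLLM's actual workflow, and the SHA placeholder is hypothetical):

```yaml
# Illustrative GitHub Actions fragment, not LiteLLM's actual CI.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Pin to an immutable 40-char commit SHA you have audited,
      # not a mutable tag like @v4 that an attacker can repoint.
      - uses: actions/checkout@<full-commit-sha>  # e.g. the SHA behind the v4 tag
      # Hash-pinned Python dependencies close the same gap on the package side.
      - run: pip install --require-hashes -r requirements.txt
```

A tag like `@v4` can be moved to point at attacker-controlled code; a commit SHA cannot, which is why it is the standard mitigation for exactly this class of CI compromise.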
Supply Chain Attack Hits Python Package LiteLLM
More Relevant Posts
In the spirit of World Quantum Day, I wanted to move beyond theory and actually explore how post-quantum cryptographic algorithms behave in practice. So I built a benchmarking system for PQC KEMs using Python and the Open Quantum Safe (liboqs) library, running everything in a reproducible Docker environment.

I analyzed algorithms like:
- ML-KEM (Kyber)
- NTRU

across key metrics:
- Key generation
- Encapsulation & decapsulation
- Key and ciphertext sizes

It was fascinating to see how the trade-offs between security, performance, and size show up clearly when you measure these systems. I also used OpenAI's Codex as an AI assistant throughout the process — helping with debugging, structuring the pipeline, and speeding up development.

🔗 GitHub: https://lnkd.in/gk7HuwAa

#WorldQuantumDay #PostQuantumCryptography #Cryptography #PQC #Python #Docker #AI
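The timing harness at the core of such a benchmark needs nothing beyond the standard library. Below is a minimal sketch, not the project's actual code: `keygen` and `encap` are stand-ins for the real liboqs calls (e.g. via `oqs.KeyEncapsulation`), and the byte sizes are chosen merely to resemble ML-KEM-768, not measured values.

```python
import secrets
import statistics
import time

def bench(fn, runs=200):
    """Return the median wall-clock time of `fn` in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Stand-ins for liboqs primitives; a real run would call e.g.
# oqs.KeyEncapsulation("ML-KEM-768") instead. Sizes only mimic ML-KEM-768.
def keygen():
    return secrets.token_bytes(1184), secrets.token_bytes(2400)   # (pk, sk)

def encap():
    return secrets.token_bytes(1088), secrets.token_bytes(32)     # (ct, shared secret)

results = {"keygen_ms": bench(keygen), "encap_ms": bench(encap)}
print(results)
```

Swapping the stand-ins for real liboqs calls turns this into the actual KEM benchmark, and measuring `len(pk)`, `len(ct)`, etc. covers the size metrics mentioned above.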
Eliminate tool schema bloat! Give an AI agent 30+ MCP tools, and thousands of tokens of JSON schemas eat the context window every turn.

codemode-lite takes a different approach. Instead of flooding the agent with tool schemas, it exposes one tool: run_python. The agent writes Python, calls whatever tools it needs from inside a secure sandbox, and only the final result comes back. No schema bloat. No context growth.

Two sandbox options: Podman containers for persistent state with enterprise isolation, or Pyodide WASM via Node.js for lightweight stateless execution. Add new MCP servers by dropping in a JSON config. No code changes needed.

Blog: https://lnkd.in/eTiBesX9

#AI #LLM #MCP #OpenSource #RedHat
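The single-tool pattern can be sketched in a few lines, assuming nothing about codemode-lite's actual implementation: the agent's generated Python runs in a restricted namespace with tools injected as plain functions, and only the value bound to `result` flows back. The `lookup_price` tool here is hypothetical.

```python
def run_python(code: str, tools: dict) -> object:
    """Execute agent-written Python with tools exposed as plain functions.
    Toy sketch only: a real sandbox (Podman, Pyodide) isolates the process;
    exec() with trimmed builtins is NOT a security boundary.
    """
    namespace = {
        "__builtins__": {"len": len, "range": range, "min": min, "max": max},
        **tools,
    }
    exec(code, namespace)
    # Convention: the agent binds its answer to `result`.
    return namespace.get("result")

# Hypothetical "tool" the agent can call from inside the sandbox
def lookup_price(item: str) -> float:
    return {"widget": 9.5, "gadget": 19.0}[item]

out = run_python(
    "result = lookup_price('widget') + lookup_price('gadget')",
    tools={"lookup_price": lookup_price},
)
print(out)  # 28.5
```

Note that no tool schema ever enters the model's context here; the agent only needs to know the tool names and signatures, which is the core of the schema-bloat argument.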
I just wrapped up Anthropic's course on the Model Context Protocol (MCP). If you’ve ever built integrations between AI models and external services, you know it usually means writing a lot of custom boilerplate code and manually handling JSON schemas.

The most valuable takeaway from this course was seeing how MCP standardizes that entire process. Instead of building one-off connections, MCP shifts the integration burden to a consistent architecture based on three clean primitives: Tools (controlled by the model), Resources (controlled by the app), and Prompts (controlled by the user).

Getting hands-on with the Python SDK to build an MCP server—and replacing manual schema writing with simple decorators—showed exactly how much this protocol reduces the friction of connecting AI to real-world data and APIs. A really practical look at how AI infrastructure is maturing and becoming easier to scale.

#ArtificialIntelligence #ModelContextProtocol #Anthropic #Python #SoftwareEngineering
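The decorator idea can be illustrated with the standard library alone. This is not the MCP SDK itself, just a sketch of what its decorators automate: deriving a tool's JSON schema from the function signature instead of writing it by hand.

```python
import inspect
import typing

# Map Python annotations to JSON Schema types (subset, for illustration)
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool(fn):
    """Attach a JSON-schema description derived from fn's signature,
    roughly what the MCP SDK's decorators do for you."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    fn.schema = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "inputSchema": {
            "type": "object",
            "properties": {
                name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
                for name in sig.parameters
            },
            "required": list(sig.parameters),
        },
    }
    return fn

@tool
def get_weather(city: str, days: int) -> str:
    """Fetch a forecast for a city."""
    return f"{city}: sunny for {days} days"

print(get_weather.schema["inputSchema"]["properties"])
# {'city': {'type': 'string'}, 'days': {'type': 'integer'}}
```

The real SDK goes much further (transport, capability negotiation, Resources, Prompts), but this is the boilerplate-reduction step the course highlights.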
🚨 Token limits aren’t the real problem — context selection is.

While working on LLM pipelines, I kept running into the same trade-off:
• Truncate old messages → lose useful context
• Send everything → waste tokens and increase cost

Neither felt right. So I started experimenting with a different approach:
👉 Treat memory as compression + retrieval

What worked surprisingly well:
• Older messages → compressed into a short rolling summary (TextRank)
• Recent messages → filtered using TF-IDF to keep only what’s relevant
• Final prompt → summary + relevant context (not full history)

Result:
✔ stays within token limits
✔ preserves important context
✔ reduces unnecessary token usage

And the interesting part — this works without heavy infra or embeddings.

So instead of asking: “how do I fit everything into the context window?”
A better question is:
👉 what actually deserves to be in the context?

I packaged this into a small Python library while experimenting. If you're building with LLMs, curious how you're handling memory — truncation, embeddings, or something else?

#LLM #AIEngineering #Python #MLOps #RAG #LLMOps
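The TF-IDF filtering step described above can be sketched with the standard library; this is an illustrative stand-in, not the packaged library's actual code.

```python
import math
from collections import Counter

def tfidf_rank(messages, query, top_k=2):
    """Keep the top_k messages most relevant to `query`, scored with a
    tiny TF-IDF model built over the message history itself."""
    docs = [m.lower().split() for m in messages]
    n = len(docs)
    # Inverse document frequency: rare terms weigh more
    idf = {t: math.log(n / sum(t in d for d in docs)) + 1.0
           for d in docs for t in d}

    def score(m):
        tf = Counter(m.lower().split())
        return sum(tf[t] * idf.get(t, 0.0) for t in query.lower().split())

    return sorted(messages, key=score, reverse=True)[:top_k]

history = [
    "we deployed the billing service on friday",
    "lunch was great today",
    "the billing service throws a 500 on refunds",
]
print(tfidf_rank(history, "billing refunds"))
# ['the billing service throws a 500 on refunds',
#  'we deployed the billing service on friday']
```

Combined with a rolling summary of everything older, only these top-ranked messages plus the summary need to enter the prompt.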
I'm currently evaluating LiteLLM vs Bifrost. Both are LLM gateways. Bifrost claims to be the fastest, but feature-wise, LiteLLM has more. LiteLLM's reputation was recently dented when an attacker published two malicious versions on PyPI. It also uses more memory than Bifrost, since LiteLLM is written in Python and Bifrost in Golang. Bifrost's open-source version looks stripped down compared to the Enterprise version. With LiteLLM, I should be able to create a "virtual provider" and "virtual model" like @myownprovider/myownmodel, which will transparently round-robin across different providers and models. The documentation shows it, but it's unclear whether I can create it via the web UI only. Will test this. Let me know in the comments if you're exploring other LLM gateways as well. Open to discussion if you're exploring other parts of AI platform engineering.
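For readers unfamiliar with the "virtual model" concept, here is a toy sketch of the routing behavior: one alias that transparently round-robins requests across several provider/model pairs. This is illustrative only, not LiteLLM's or Bifrost's actual API.

```python
from itertools import cycle

class VirtualModel:
    """One alias (e.g. @myownprovider/myownmodel) that rotates through
    real provider/model targets on each request. Toy sketch only."""
    def __init__(self, alias, targets):
        self.alias = alias
        self._targets = cycle(targets)

    def route(self, prompt):
        provider, model = next(self._targets)
        # A real gateway would dispatch the request here; we just report routing.
        return f"{provider}/{model}"

vm = VirtualModel("myownprovider/myownmodel", [
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-haiku"),
])
print([vm.route("hi") for _ in range(3)])
# ['openai/gpt-4o-mini', 'anthropic/claude-haiku', 'openai/gpt-4o-mini']
```

Production gateways layer retries, health checks, and weighted strategies on top of this, which is much of what the feature comparison comes down to.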
How to Build a Secure Local-First Agent Runtime with OpenClaw Gateway, Skills, and Controlled Tool Execution In this tutorial, we build and operate a fully local, schema-valid OpenClaw runtime. We configure the OpenClaw gateway with strict loopback binding, set up authenticated model access through environment variables, and define a secure execution environment using the built-in exec tool. We then create a structured custom skill that the OpenClaw agent can discover and invoke deterministically. Instead of manually running Python scripts, we allow OpenClaw to orchestrate model reasoning, skill selection, and controlled tool execution through its agent runtime....
Most LLM agents struggle with limited context windows and can’t handle large documents effectively. I built an agentic RAG assistant for large-PDF Q&A that overcomes this by retrieving only the most relevant context from large PDFs before generating answers.

⚙️ Tech: Python, LangChain, OpenAI Embeddings, Qdrant

🔹 Features:
- Handles large PDFs via chunking + vector search
- Semantic retrieval for precise context
- Hallucination-resistant responses

🔗 GitHub: https://lnkd.in/gZd3wHgP

#AI #RAG #LangChain #OpenAI
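The chunking + retrieval step can be illustrated with a stdlib stand-in. The real project uses OpenAI embeddings and Qdrant; the bag-of-words "embedding" below is a deliberate simplification that only mimics the shape of the pipeline.

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, question, top_k=1):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:top_k]

doc = ("our refund policy lets customers return items within 30 days " * 4
       + "shipping orders usually arrive within five business days " * 4)
ctx = retrieve(chunk(doc, size=10), "what is the refund policy")
print(ctx)
```

Only `ctx`, not the whole document, would then be placed in the LLM prompt, which is what keeps large PDFs inside the context window.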
Handling complexity in long-running Python services often feels like juggling fragile glue code: retry loops, watchdogs, and scattered flags. Di Lu’s article, “A supervisor tree library for building predictable and resilient programs,” offers a compelling approach with Runsmith, a Python library inspired by Erlang/OTP supervisor trees that models each unit as a typed worker with an explicit lifecycle. You can read the full breakdown here: https://lnkd.in/dgxjFnpx.

What stands out is the shift from brittle process-level restarts to fine-grained fault isolation and health monitoring that catches stalls and constraint violations, not just crashes. This aligns with challenges I’ve faced building multi-component platforms where uptime matters and failure domains must be confined.

One caveat is that adopting such a framework requires upfront discipline in designing worker lifecycles and state machines, which can add complexity early on. However, this investment pays dividends when shipping real products that demand maintainability and predictable fault recovery.

How have others balanced this upfront design effort against the operational resilience gains in production?

#python #softwarearchitecture #systemdesign #reliabilityengineering #productdevelopment #founders #engineering #faulttolerance #opensource #devtools #resilience #longrunningservices
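For readers new to the supervisor idea, here is a minimal stdlib sketch of the one-for-one restart pattern (a parent runs a worker and restarts it on failure, up to a budget). This is illustrative only, not Runsmith's actual API.

```python
class Supervisor:
    """Minimal one-for-one supervisor: run a worker and restart it on
    failure, up to max_restarts times. Sketch of the Erlang/OTP-style
    pattern the article describes, not Runsmith's API."""
    def __init__(self, worker, max_restarts=3):
        self.worker = worker
        self.max_restarts = max_restarts
        self.restarts = 0

    def run(self):
        while True:
            try:
                return self.worker()
            except Exception:
                self.restarts += 1
                if self.restarts > self.max_restarts:
                    raise  # escalate after exhausting the restart budget

attempts = {"n": 0}

def flaky_worker():
    """Fails twice, then succeeds -- simulates a transient fault."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

sup = Supervisor(flaky_worker)
print(sup.run(), "restarts:", sup.restarts)
# ok restarts: 2
```

Libraries like Runsmith extend this with typed lifecycles, stall detection, and supervision trees, which is where the real fault-isolation value lies.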
With low-quality tests, you're paying tokens to fix them while getting little of the benefit tests are supposed to provide.

Let's be real. Most Python tests out there are a waste of time. They exist to make the manager happy, to pass the compliance review, or to exercise dominance. I'm talking about tests that:
- break due to unrelated changes,
- make you restart the CI/CD pipeline and hope they pass on the next run,
- take forever to run,
- pass while production is broken.

Back in the day, you complained about having to work with such tests. Nowadays, we're paying LLM tokens while Claude Code fixes them over and over. Pure waste of time and money.

In my latest article, I describe 7 qualities of highly valuable tests that every developer should know: qualities that help you ship faster with AI without losing confidence or turning your status page into a traffic light 🚦

Don't forget to subscribe so you don't miss the next tip 🔔
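A hypothetical illustration of the first smell (tests that break on unrelated changes): the brittle version couples itself to presentation details, while the valuable one asserts only the behavior callers depend on. Both functions here are invented for the example.

```python
def apply_discount(price, percent):
    """Hypothetical production code under test."""
    return round(price * (1 - percent / 100), 2)

def build_receipt(price, percent):
    """Hypothetical formatting helper."""
    return f"total: {apply_discount(price, percent):.2f} USD"

# Brittle: pinned to presentation details, so renaming the currency
# label or tweaking whitespace breaks it for unrelated reasons.
def test_receipt_brittle():
    assert build_receipt(100, 10) == "total: 90.00 USD"

# Valuable: asserts the contract that matters to the caller.
def test_discount_behavior():
    assert apply_discount(100, 10) == 90.0
    assert apply_discount(200, 25) == 150.0

test_receipt_brittle()
test_discount_behavior()
print("all tests passed")
```

Both pass today, but only the second survives an innocent change to the receipt wording; that difference is what the token bill from endless LLM test-fixing loops is made of.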
https://wso2.com/library/blogs/litellm-alternatives/