A supply chain attack hit a Python package with ~3 million daily downloads. Malicious code executed automatically on every Python process startup for roughly 40 minutes, enough time to harvest credentials and install a persistent backdoor.

That package was LiteLLM, one of the most widely used AI gateway libraries in production environments. And the attack didn't even come through LiteLLM's own code; it came through a compromised GitHub Action in their CI/CD pipeline.

The deeper lesson here isn't specific to LiteLLM. It's about how engineering teams think (or don't think) about AI gateways as infrastructure. A proxy that sees your LLM API keys and your prompts, and sits in the request path between your applications and your model providers, isn't a dev tool. It's critical infrastructure.

We wrote a breakdown of what happened, what the migration path looks like, and what questions to ask of any AI gateway you're evaluating. Link in comments.
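Because the entry point was a mutable CI dependency, one widely recommended hardening step is pinning third-party GitHub Actions to full commit SHAs instead of tags. The fragment below is purely illustrative (it is not LiteLLM's actual workflow, and the SHA placeholder is hypothetical):

```yaml
# Illustrative GitHub Actions fragment, not LiteLLM's actual CI.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Pin to an immutable 40-char commit SHA you have audited,
      # not a mutable tag like @v4 that an attacker can repoint.
      - uses: actions/checkout@<full-commit-sha>  # e.g. the SHA behind the v4 tag
      # Hash-pinned Python dependencies close the same gap on the package side.
      - run: pip install --require-hashes -r requirements.txt
```

A tag like `@v4` can be moved to point at attacker-controlled code; a commit SHA cannot, which is why it is the standard mitigation for exactly this class of CI compromise.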
Supply Chain Attack Hits Python Package LiteLLM
More Relevant Posts
In the spirit of World Quantum Day, I wanted to move beyond theory and actually explore how post-quantum cryptographic algorithms behave in practice. So I built a benchmarking system for PQC KEMs using Python and the Open Quantum Safe (liboqs) library, running everything in a reproducible Docker environment.

I analyzed algorithms like:
- ML-KEM (Kyber)
- NTRU

across key metrics:
- Key generation
- Encapsulation & decapsulation
- Key and ciphertext sizes

It was fascinating to see how the trade-offs between security, performance, and size show up clearly when you measure these systems. I also used OpenAI's Codex as an AI assistant throughout the process — helping with debugging, structuring the pipeline, and speeding up development.

🔗 GitHub: https://lnkd.in/gk7HuwAa

#WorldQuantumDay #PostQuantumCryptography #Cryptography #PQC #Python #Docker #AI
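The timing harness at the core of such a benchmark needs nothing beyond the standard library. Below is a minimal sketch, not the project's actual code: `keygen` and `encap` are stand-ins for the real liboqs calls (e.g. via `oqs.KeyEncapsulation`), and the byte sizes are chosen merely to resemble ML-KEM-768, not measured values.

```python
import secrets
import statistics
import time

def bench(fn, runs=200):
    """Return the median wall-clock time of `fn` in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Stand-ins for liboqs primitives; a real run would call e.g.
# oqs.KeyEncapsulation("ML-KEM-768") instead. Sizes only mimic ML-KEM-768.
def keygen():
    return secrets.token_bytes(1184), secrets.token_bytes(2400)   # (pk, sk)

def encap():
    return secrets.token_bytes(1088), secrets.token_bytes(32)     # (ct, shared secret)

results = {"keygen_ms": bench(keygen), "encap_ms": bench(encap)}
print(results)
```

Swapping the stand-ins for real liboqs calls turns this into the actual KEM benchmark, and measuring `len(pk)`, `len(ct)`, etc. covers the size metrics mentioned above.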
Eliminate tool schema bloat! Give an AI agent 30+ MCP tools, and thousands of tokens of JSON schemas eat the context window every turn.

codemode-lite takes a different approach. Instead of flooding the agent with tool schemas, it exposes one tool: run_python. The agent writes Python, calls whatever tools it needs from inside a secure sandbox, and only the final result comes back. No schema bloat. No context growth.

Two sandbox options: Podman containers for persistent state with enterprise isolation, or Pyodide WASM via Node.js for lightweight stateless execution. Add new MCP servers by dropping in a JSON config. No code changes needed.

Blog: https://lnkd.in/eTiBesX9

#AI #LLM #MCP #OpenSource #RedHat
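The single-tool pattern can be sketched in a few lines, assuming nothing about codemode-lite's actual implementation: the agent's generated Python runs in a restricted namespace with tools injected as plain functions, and only the value bound to `result` flows back. The `lookup_price` tool here is hypothetical.

```python
def run_python(code: str, tools: dict) -> object:
    """Execute agent-written Python with tools exposed as plain functions.
    Toy sketch only: a real sandbox (Podman, Pyodide) isolates the process;
    exec() with trimmed builtins is NOT a security boundary.
    """
    namespace = {
        "__builtins__": {"len": len, "range": range, "min": min, "max": max},
        **tools,
    }
    exec(code, namespace)
    # Convention: the agent binds its answer to `result`.
    return namespace.get("result")

# Hypothetical "tool" the agent can call from inside the sandbox
def lookup_price(item: str) -> float:
    return {"widget": 9.5, "gadget": 19.0}[item]

out = run_python(
    "result = lookup_price('widget') + lookup_price('gadget')",
    tools={"lookup_price": lookup_price},
)
print(out)  # 28.5
```

Note that no tool schema ever enters the model's context here; the agent only needs to know the tool names and signatures, which is the core of the schema-bloat argument.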
I just wrapped up Anthropic's course on the Model Context Protocol (MCP). If you’ve ever built integrations between AI models and external services, you know it usually means writing a lot of custom boilerplate code and manually handling JSON schemas.

The most valuable takeaway from this course was seeing how MCP standardizes that entire process. Instead of building one-off connections, MCP shifts the integration burden to a consistent architecture based on three clean primitives: Tools (controlled by the model), Resources (controlled by the app), and Prompts (controlled by the user).

Getting hands-on with the Python SDK to build an MCP server—and replacing manual schema writing with simple decorators—showed exactly how much this protocol reduces the friction of connecting AI to real-world data and APIs. A really practical look at how AI infrastructure is maturing and becoming easier to scale.

#ArtificialIntelligence #ModelContextProtocol #Anthropic #Python #SoftwareEngineering
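The decorator idea can be illustrated with the standard library alone. This is not the MCP SDK itself, just a sketch of what its decorators automate: deriving a tool's JSON schema from the function signature instead of writing it by hand.

```python
import inspect
import typing

# Map Python annotations to JSON Schema types (subset, for illustration)
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool(fn):
    """Attach a JSON-schema description derived from fn's signature,
    roughly what the MCP SDK's decorators do for you."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    fn.schema = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "inputSchema": {
            "type": "object",
            "properties": {
                name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
                for name in sig.parameters
            },
            "required": list(sig.parameters),
        },
    }
    return fn

@tool
def get_weather(city: str, days: int) -> str:
    """Fetch a forecast for a city."""
    return f"{city}: sunny for {days} days"

print(get_weather.schema["inputSchema"]["properties"])
# {'city': {'type': 'string'}, 'days': {'type': 'integer'}}
```

The real SDK goes much further (transport, capability negotiation, Resources, Prompts), but this is the boilerplate-reduction step the course highlights.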
🚨 Token limits aren’t the real problem — context selection is.

While working on LLM pipelines, I kept running into the same trade-off:
• Truncate old messages → lose useful context
• Send everything → waste tokens and increase cost

Neither felt right. So I started experimenting with a different approach:
👉 Treat memory as compression + retrieval

What worked surprisingly well:
• Older messages → compressed into a short rolling summary (TextRank)
• Recent messages → filtered using TF-IDF to keep only what’s relevant
• Final prompt → summary + relevant context (not full history)

Result:
✔ stays within token limits
✔ preserves important context
✔ reduces unnecessary token usage

And the interesting part — this works without heavy infra or embeddings.

So instead of asking: “how do I fit everything into the context window?”
A better question is:
👉 what actually deserves to be in the context?

I packaged this into a small Python library while experimenting. If you're building with LLMs, curious how you're handling memory — truncation, embeddings, or something else?

#LLM #AIEngineering #Python #MLOps #RAG #LLMOps
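The TF-IDF filtering step described above can be sketched with the standard library; this is an illustrative stand-in, not the packaged library's actual code.

```python
import math
from collections import Counter

def tfidf_rank(messages, query, top_k=2):
    """Keep the top_k messages most relevant to `query`, scored with a
    tiny TF-IDF model built over the message history itself."""
    docs = [m.lower().split() for m in messages]
    n = len(docs)
    # Inverse document frequency: rare terms weigh more
    idf = {t: math.log(n / sum(t in d for d in docs)) + 1.0
           for d in docs for t in d}

    def score(m):
        tf = Counter(m.lower().split())
        return sum(tf[t] * idf.get(t, 0.0) for t in query.lower().split())

    return sorted(messages, key=score, reverse=True)[:top_k]

history = [
    "we deployed the billing service on friday",
    "lunch was great today",
    "the billing service throws a 500 on refunds",
]
print(tfidf_rank(history, "billing refunds"))
# ['the billing service throws a 500 on refunds',
#  'we deployed the billing service on friday']
```

Combined with a rolling summary of everything older, only these top-ranked messages plus the summary need to enter the prompt.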
I'm currently evaluating LiteLLM vs Bifrost. Both are LLM gateways. Bifrost claims to be the fastest, but feature-wise, LiteLLM has more. LiteLLM's reputation was recently dented when an attacker published two malicious versions on PyPI. It also uses more memory than Bifrost, since LiteLLM is written in Python and Bifrost in Golang. Bifrost's open-source version looks stripped down compared to the Enterprise version. With LiteLLM, I should be able to create a "virtual provider" and "virtual model" like @myownprovider/myownmodel, which will transparently round-robin across different providers and models. The documentation shows it, but it's unclear whether I can create it via the web UI only. Will test this. Let me know in the comments if you're exploring other LLM gateways as well. Open to discussion if you're exploring other parts of AI platform engineering.
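For readers unfamiliar with the "virtual model" concept, here is a toy sketch of the routing behavior: one alias that transparently round-robins requests across several provider/model pairs. This is illustrative only, not LiteLLM's or Bifrost's actual API.

```python
from itertools import cycle

class VirtualModel:
    """One alias (e.g. @myownprovider/myownmodel) that rotates through
    real provider/model targets on each request. Toy sketch only."""
    def __init__(self, alias, targets):
        self.alias = alias
        self._targets = cycle(targets)

    def route(self, prompt):
        provider, model = next(self._targets)
        # A real gateway would dispatch the request here; we just report routing.
        return f"{provider}/{model}"

vm = VirtualModel("myownprovider/myownmodel", [
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-haiku"),
])
print([vm.route("hi") for _ in range(3)])
# ['openai/gpt-4o-mini', 'anthropic/claude-haiku', 'openai/gpt-4o-mini']
```

Production gateways layer retries, health checks, and weighted strategies on top of this, which is much of what the feature comparison comes down to.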
How to Build a Secure Local-First Agent Runtime with OpenClaw Gateway, Skills, and Controlled Tool Execution In this tutorial, we build and operate a fully local, schema-valid OpenClaw runtime. We configure the OpenClaw gateway with strict loopback binding, set up authenticated model access through environment variables, and define a secure execution environment using the built-in exec tool. We then create a structured custom skill that the OpenClaw agent can discover and invoke deterministically. Instead of manually running Python scripts, we allow OpenClaw to orchestrate model reasoning, skill selection, and controlled tool execution through its agent runtime....
Most LLM agents struggle with limited context windows and can’t handle large documents effectively. I built an agentic RAG assistant for large-PDF Q&A that overcomes this by retrieving only the most relevant context from large PDFs before generating answers.

⚙️ Tech: Python, LangChain, OpenAI Embeddings, Qdrant

🔹 Features:
- Handles large PDFs via chunking + vector search
- Semantic retrieval for precise context
- Hallucination-resistant responses

🔗 GitHub: https://lnkd.in/gZd3wHgP

#AI #RAG #LangChain #OpenAI
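The chunking + retrieval step can be illustrated with a stdlib stand-in. The real project uses OpenAI embeddings and Qdrant; the bag-of-words "embedding" below is a deliberate simplification that only mimics the shape of the pipeline.

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, question, top_k=1):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:top_k]

doc = ("our refund policy lets customers return items within 30 days " * 4
       + "shipping orders usually arrive within five business days " * 4)
ctx = retrieve(chunk(doc, size=10), "what is the refund policy")
print(ctx)
```

Only `ctx`, not the whole document, would then be placed in the LLM prompt, which is what keeps large PDFs inside the context window.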
Handling complexity in long-running Python services often feels like juggling fragile glue code: retry loops, watchdogs, and scattered flags. Di Lu’s article, “A supervisor tree library for building predictable and resilient programs,” offers a compelling approach with Runsmith, a Python library inspired by Erlang/OTP supervisor trees that models each unit as a typed worker with an explicit lifecycle. You can read the full breakdown here: https://lnkd.in/dgxjFnpx.

What stands out is the shift from brittle process-level restarts to fine-grained fault isolation and health monitoring that catches stalls and constraint violations, not just crashes. This aligns with challenges I’ve faced building multi-component platforms where uptime matters and failure domains must be confined.

One caveat is that adopting such a framework requires upfront discipline in designing worker lifecycles and state machines, which can add complexity early on. However, this investment pays dividends when shipping real products that demand maintainability and predictable fault recovery.

How have others balanced this upfront design effort against the operational resilience gains in production?

#python #softwarearchitecture #systemdesign #reliabilityengineering #productdevelopment #founders #engineering #faulttolerance #opensource #devtools #resilience #longrunningservices
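For readers new to the supervisor idea, here is a minimal stdlib sketch of the one-for-one restart pattern (a parent runs a worker and restarts it on failure, up to a budget). This is illustrative only, not Runsmith's actual API.

```python
class Supervisor:
    """Minimal one-for-one supervisor: run a worker and restart it on
    failure, up to max_restarts times. Sketch of the Erlang/OTP-style
    pattern the article describes, not Runsmith's API."""
    def __init__(self, worker, max_restarts=3):
        self.worker = worker
        self.max_restarts = max_restarts
        self.restarts = 0

    def run(self):
        while True:
            try:
                return self.worker()
            except Exception:
                self.restarts += 1
                if self.restarts > self.max_restarts:
                    raise  # escalate after exhausting the restart budget

attempts = {"n": 0}

def flaky_worker():
    """Fails twice, then succeeds -- simulates a transient fault."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

sup = Supervisor(flaky_worker)
print(sup.run(), "restarts:", sup.restarts)
# ok restarts: 2
```

Libraries like Runsmith extend this with typed lifecycles, stall detection, and supervision trees, which is where the real fault-isolation value lies.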
With low-quality tests, you're paying tokens to fix them while getting little of the benefit tests are supposed to provide.

Let's be real. Most Python tests out there are a waste of time. They exist to make the manager happy, to pass the compliance review, or to exercise dominance. I'm talking about tests that:
- break due to unrelated changes,
- make you restart the CI/CD pipeline and hope they pass on the next run,
- take forever to run,
- pass while production is broken.

Back in the day, you complained about having to work with such tests. Nowadays, we're paying LLM tokens while Claude Code fixes them over and over. Pure waste of time and money.

In my latest article, I describe 7 qualities of highly valuable tests that every developer should know: qualities that help you ship faster with AI without losing confidence or turning your status page into a traffic light 🚦

Don't forget to subscribe so you don't miss the next tip 🔔
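A hypothetical illustration of the first smell (tests that break on unrelated changes): the brittle version couples itself to presentation details, while the valuable one asserts only the behavior callers depend on. Both functions here are invented for the example.

```python
def apply_discount(price, percent):
    """Hypothetical production code under test."""
    return round(price * (1 - percent / 100), 2)

def build_receipt(price, percent):
    """Hypothetical formatting helper."""
    return f"total: {apply_discount(price, percent):.2f} USD"

# Brittle: pinned to presentation details, so renaming the currency
# label or tweaking whitespace breaks it for unrelated reasons.
def test_receipt_brittle():
    assert build_receipt(100, 10) == "total: 90.00 USD"

# Valuable: asserts the contract that matters to the caller.
def test_discount_behavior():
    assert apply_discount(100, 10) == 90.0
    assert apply_discount(200, 25) == 150.0

test_receipt_brittle()
test_discount_behavior()
print("all tests passed")
```

Both pass today, but only the second survives an innocent change to the receipt wording; that difference is what the token bill from endless LLM test-fixing loops is made of.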
https://wso2.com/library/blogs/litellm-alternatives/