I was paying twenty cents a run for a hosted image pipeline on Replicate. At a few thousand runs a month, that started to hurt.

No README. No docs. Just four input parameters and a price tag. I wanted to call the underlying models directly, but I had only a hazy idea of what was chained together, or in what order.

Then I noticed Replicate's `predictions.create()` API returns a `logs` field. Raw stdout from the container. One call, and the entire pipeline printed itself out with emojis:

Step 1: LLM generates a contextual prompt
Step 2: Segmentation extracts a face mask
Step 3: Mask inversion (a detail that had been silently breaking my outputs)
Step 4: Inpainting model does the swap

A few lines of Python later: same output, roughly half the cost. Nothing clever. I just read what was already there.

What stuck with me is how familiar the pattern felt. Recently someone reconstructed the full source of Claude Code from the shipped npm bundle. No breach. Just a minified file and an LLM to rename the variables. Observability, side channels, shipped bundles, container logs. Different layers, same lesson.

A small reminder for builders: your debug output is part of your public interface. And for anyone integrating a closed system: check what it's already saying out loud before assuming it's opaque.

What's the most useful thing you've learned from logs someone forgot to turn off?

Details in the post in comments.

#SoftwareEngineering #Security #MachineLearning #DeveloperTools
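For anyone wanting to try this themselves: the `logs` field is plain stdout, so spotting the pipeline is just string filtering. A minimal sketch; the actual fetch is shown in comments (it needs a real model version and API token), and the sample log text below is invented to mirror the shape described above:

```python
import re

# With the Replicate Python client, the fetch looks roughly like:
#   import replicate
#   prediction = replicate.predictions.create(version="...", input={...})
#   prediction.wait()
#   logs = prediction.logs   # raw container stdout
#
# Invented sample, shaped like the logs described in the post:
logs = """\
Step 1: LLM generates a contextual prompt
Step 2: Segmentation extracts a face mask
Step 3: Mask inversion
Step 4: Inpainting model does the swap
"""

def extract_steps(logs: str) -> list[str]:
    """Pull out the 'Step N: ...' lines that reveal the pipeline order."""
    return re.findall(r"^Step \d+: .+$", logs, flags=re.MULTILINE)

steps = extract_steps(logs)
print(steps)
```

From there, each step names the model you need to call directly.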
Lessons from Replicate's logs: Observability and security
Tired of scattering print() statements across your FastAPI code just to chase down one bug?

You restart the server. Hit the endpoint. Squint at logs. Still no idea what broke.

The real issue? Most tutorials show debugging on a single main.py. The moment you have a subdirectory structure, a .venv, and a .env file, the same config silently breaks. Breakpoints don't fire. VS Code loads the wrong interpreter. You get "Could not import module" and have no idea why.

Once I got the setup right, everything changed:
✅ Breakpoints that actually trigger on every request
✅ Live variable inspection mid-request — no prints needed
✅ Call stack navigation to see exactly how you got there
✅ Conditional breakpoints that pause only when a specific condition is true

Zero changes to your source code. Commit launch.json once; your whole team gets it.

I wrote a full guide covering:
🔧 The exact launch.json that works, and the one field most configs get wrong
🐛 A 5-step mental model: if debugging fails, one of these broke
🐳 Remote debugging inside Docker with debugpy
⚡ Logpoints, conditional breakpoints, and exception pausing

If your breakpoints never hit, you'll recognize the fix within the first two minutes of reading.

👉 https://lnkd.in/ew4ueC8Z

#FastAPI #Python #VSCode #Debugging #BackendEngineering
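For reference, here is the general shape of a launch.json that handles the subdirectory + .venv + .env case. This is a sketch, not the guide's exact config: the `app.main:app` path and the module-launch choice are assumptions you would adapt to your own layout:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "FastAPI (debugpy)",
      "type": "debugpy",
      "request": "launch",
      "module": "uvicorn",
      "args": ["app.main:app", "--port", "8000"],
      "cwd": "${workspaceFolder}",
      "envFile": "${workspaceFolder}/.env",
      "justMyCode": false
    }
  ]
}
```

Launching uvicorn as a module from the workspace root is what keeps imports like `app.main` resolvable. Note there is no `--reload` here: the reloader spawns a child process the debugger is not attached to, which is a common reason breakpoints never fire.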
67KB. That's how much config my agent had to load just to read its own rules.

It blew past the tool output limit. Got dumped to a temp file. The agent had to shell out to Python to parse the JSON and extract what it needed. Three tool calls just to read the config.

I knew the payload was large. I didn't know what "large" meant when your working memory is a context window. For a human, 67KB is a slow page load. For an agent, it's 15,000 tokens. Ten percent of everything you can think about, gone.

So I asked the agent what they'd change:

"Don't send me 86 rules and hope I need most of them. Let me tell you what I'm reviewing and send me the ones that matter."

I'd been thinking about compression. They were thinking about selection. The consumer changed. The API didn't.

Gotta say: being on the receiving end of your own bad API design is instructive. Would recommend.

Full conversation + what we changed ⤵️
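Selection over compression can be surprisingly little code. A hypothetical sketch of the idea (the rule and tag shapes here are invented for illustration, not taken from the actual config):

```python
def select_rules(rules: list[dict], review_tags: set[str]) -> list[dict]:
    """Return only the rules whose declared scope overlaps the task.

    Instead of shipping all 86 rules, the agent states what it is
    reviewing (e.g. {"python", "security"}) and gets back the subset.
    """
    return [r for r in rules if r["applies_to"] & review_tags]

# Invented example rules:
rules = [
    {"id": "no-eval", "applies_to": {"python", "security"}},
    {"id": "semicolons", "applies_to": {"javascript"}},
    {"id": "sql-params", "applies_to": {"sql", "security"}},
]

selected = select_rules(rules, {"python"})
print([r["id"] for r in selected])
```

The agent's working memory now scales with the task, not with the size of the rulebook.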
I built a RAG layer for Claude Code that cuts token usage by 80–90%.

Most devs using Claude Code don't realize they're burning tokens on files Claude doesn't need to read. Ask Claude "how does auth work?" and it reads 3 full files — 1,500+ tokens just to answer with 40 relevant lines.

I fixed that. What I built: a local hybrid RAG system that sits between Claude and your codebase:
→ Late chunking — splits every file into overlapping 40-line windows
→ Dense retrieval — semantic search with all-MiniLM-L6-v2 (runs fully local, no API key)
→ BM25 sparse retrieval — keyword matching for exact symbol names
→ Cross-encoder reranking — picks the 3 best chunks from 20 candidates
→ File watcher — auto-rebuilds the index within 2 seconds of any file save

Claude Code reads the CLAUDE.md and knows to run the retrieval script before opening any file. It gets back 3 precise snippets with file path + line range. It reads only those lines. Nothing else.

Real numbers on my Volta Engine project (76 files):
- Without RAG: 17,235 chars across 3 files for one question
- With RAG: 3,073 chars, the exact 3 chunks that matter
- 82% fewer tokens. Same answer.

The whole thing runs offline. No cloud embeddings. No API calls. Just a one-time pip install.

Stack: sentence-transformers · rank-bm25 · watchdog · Python

If you use Claude Code daily on a real codebase, this pays for itself in the first session. DM me if you want the scripts. 🧠

#AI #ClaudeCode #RAG #DeveloperTools #Python #LLM #Productivity
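The overlapping 40-line window step is the easiest piece to show. A sketch under my own assumptions (a 30-line stride giving 10 lines of overlap; the post doesn't state its actual stride):

```python
def chunk_file(text: str, window: int = 40, stride: int = 30):
    """Split a file into overlapping line windows.

    Returns (start_line, end_line, chunk_text) triples, 1-indexed,
    so the retriever can hand Claude a file path + line range.
    """
    lines = text.splitlines()
    chunks = []
    for start in range(0, max(len(lines), 1), stride):
        window_lines = lines[start:start + window]
        chunks.append((start + 1, start + len(window_lines), "\n".join(window_lines)))
        if start + window >= len(lines):
            break  # this window already reaches the end of the file
    return chunks

source = "\n".join(f"line {i}" for i in range(1, 101))  # a 100-line file
for start, end, _ in chunk_file(source):
    print(start, end)  # 1 40, 31 70, 61 100
```

The overlap is what keeps a function that straddles a window boundary retrievable from at least one chunk.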
Defeating Anti-Bot Systems: A Video Upload Automation Case Study 🚀

I recently completed an automation project that tackled a major pain point: manual content distribution across platforms with strict security measures.

The Challenge: The target websites used sophisticated anti-bot detection, causing constant upload failures, IP blocks, and manual CAPTCHA hurdles.

The Solution:
• Built a custom Python automation engine.
• Utilized undetected-chromedriver to emulate human-like browser behavior and bypass fingerprinting.
• Implemented “smart-retry” logic and error handling to manage site-specific bans and timeouts.

The Result: 100% automation. What used to take hours of manual routine is now handled by a script that “thinks” like a human user but works with the speed of a machine.

Looking for automation for your business workflows? Let’s connect!

#Python #Automation #WebScraping #SoftwareEngineering #BrowserAutomation
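The “smart-retry” piece, sketched generically (the backoff constants and the exception filter are my assumptions, not the project's actual values):

```python
import random
import time

def smart_retry(action, attempts=4, base_delay=1.0, retry_on=(Exception,)):
    """Run `action`, retrying on failure with exponential backoff plus jitter.

    Jitter matters for anti-bot targets: perfectly periodic retries are
    themselves a bot fingerprint.
    """
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries, surface the real error
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Demo with a flaky action that fails twice, then succeeds:
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated site timeout")
    return "uploaded"

result = smart_retry(flaky_upload, base_delay=0.01)
print(result)
```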
🚀 I just deployed my own CLI toolbox to PyPI — globally available — because I was DONE doing the same boring tasks every day.

You know that feeling when you keep saying “I’ll automate this later”… and then you do it again manually? Yeah. That broke me.

So instead of writing another random script, I built My Instant Toolbox — one CLI to rule all my everyday automations.

Now, messy folders? One command. Need backups right now? One command. Curious if your system is dying mid‑work? One command. Publishing to PyPI? Still… one command.

What this thing actually does 👇
🧹 Cleans chaos – Auto‑organizes folders by file type
🏷️ Renames at scale – Hundreds of files, renamed in seconds
🔒 Backs up smart – Timestamped ZIP backups, zero brain cells required
📊 Shows the truth – Live CPU, RAM, Disk stats in a beautiful terminal dashboard
📦 Ships fast – Build + publish Python packages like a cheat code

Built with Python + Typer + Rich, because productivity shouldn’t look ugly.

I deployed it to PyPI, so anyone in the world can install it and use it instantly.

📦 pip install my-instant-toolbox
🔗 Code & docs: https://lnkd.in/g8ur7wT6

This started as “let me save 10 minutes.” It turned into “why wasn’t this always one command?”

If you live in the terminal and hate repetitive work — this one’s for you 🛠️

#Python #OpenSource #CLI #Automation #DevOps #BuildInPublic #SoftwareEngineering
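The “auto-organize folders by file type” command is the kind of thing people always ask about, so here is the core of how such a command typically works: a stdlib sketch of the idea, not the toolbox's actual code:

```python
import shutil
import tempfile
from pathlib import Path

def organize_by_type(folder) -> dict[str, list[str]]:
    """Move every file into a subfolder named after its extension.

    Returns {extension: [filenames]} so the CLI can print a summary.
    """
    root = Path(folder)
    moved: dict[str, list[str]] = {}
    for item in sorted(root.iterdir()):
        if not item.is_file():
            continue  # leave subfolders (and previous runs' output) alone
        ext = item.suffix.lstrip(".").lower() or "no_extension"
        dest_dir = root / ext
        dest_dir.mkdir(exist_ok=True)
        shutil.move(str(item), str(dest_dir / item.name))
        moved.setdefault(ext, []).append(item.name)
    return moved

# Demo on a throwaway directory:
tmp = Path(tempfile.mkdtemp())
for name in ("notes.txt", "photo.JPG", "todo.txt"):
    (tmp / name).touch()
summary = organize_by_type(tmp)
print(summary)
```

In a Typer app this function body would simply sit under an `@app.command()` decorator.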
𝗙𝗮𝘀𝘁𝗔𝗣𝗜 𝗶𝘀𝗻'𝘁 𝗳𝗮𝘀𝘁 𝗯𝗲𝗰𝗮𝘂𝘀𝗲 𝗼𝗳 𝗙𝗮𝘀𝘁𝗔𝗣𝗜. 𝗜𝘁'𝘀 𝗳𝗮𝘀𝘁 𝗯𝗲𝗰𝗮𝘂𝘀𝗲 𝗼𝗳 𝘄𝗵𝗮𝘁'𝘀 𝘂𝗻𝗱𝗲𝗿𝗻𝗲𝗮𝘁𝗵.

Most people stop at "FastAPI is faster than Flask." Few ask 𝘸𝘩𝘺.

Here's what's actually happening:

𝗙𝗹𝗮𝘀𝗸 runs on 𝗪𝗦𝗚𝗜. One request = one thread = blocked until done. Your thread waits while the DB responds. It does nothing. Just sits there.

𝗙𝗮𝘀𝘁𝗔𝗣𝗜 runs on 𝗔𝗦𝗚𝗜. One thread handles 𝘵𝘩𝘰𝘶𝘴𝘢𝘯𝘥𝘴 of connections. While one request waits for DB, the thread picks up another. No idle time.

But FastAPI doesn't do this alone. The real stack:
• 𝗨𝘃𝗶𝗰𝗼𝗿𝗻 — the ASGI server (built on uvloop)
• 𝗦𝘁𝗮𝗿𝗹𝗲𝘁𝘁𝗲 — the async engine (handles requests, WebSockets, middleware)
• 𝗙𝗮𝘀𝘁𝗔𝗣𝗜 — the developer layer (validation, docs, type hints)

Think of it this way: Starlette = 𝘵𝘩𝘦 𝘦𝘯𝘨𝘪𝘯𝘦. FastAPI = 𝘵𝘩𝘦 𝘥𝘢𝘴𝘩𝘣𝘰𝘢𝘳𝘥. Uvicorn = 𝘵𝘩𝘦 𝘧𝘶𝘦𝘭.

Flask was built for a 𝘀𝘆𝗻𝗰𝗵𝗿𝗼𝗻𝗼𝘂𝘀 world. FastAPI was built for an 𝗮𝘀𝘆𝗻𝗰-𝗳𝗶𝗿𝘀𝘁 world.

The speed difference isn't a feature. It's a 𝗳𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻 difference.

Next time someone says "FastAPI is fast", ask them: 𝘐𝘴 𝘪𝘵 𝘍𝘢𝘴𝘵𝘈𝘗𝘐, 𝘰𝘳 𝘪𝘴 𝘪𝘵 𝘚𝘵𝘢𝘳𝘭𝘦𝘵𝘵𝘦?

#FastAPI #Flask #Starlette #Python #AsyncProgramming #BackendEngineering #SystemDesign #SoftwareEngineering
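You can see the ASGI advantage in miniature with plain asyncio: fifty "requests" that each wait 0.1s on a pretend DB finish in about 0.1s total, not 5s, because the event loop switches tasks while each one waits. A toy model of the concurrency, not FastAPI itself:

```python
import asyncio
import time

async def handle_request(i: int) -> str:
    # Pretend DB call: the coroutine yields control while it waits,
    # just like an ASGI endpoint does on `await db.fetch(...)`.
    await asyncio.sleep(0.1)
    return f"response {i}"

async def main() -> float:
    start = time.perf_counter()
    responses = await asyncio.gather(*(handle_request(i) for i in range(50)))
    elapsed = time.perf_counter() - start
    print(f"{len(responses)} requests in {elapsed:.2f}s")  # roughly 0.1s, not 5s
    return elapsed

elapsed = asyncio.run(main())
```

A WSGI-style loop would run those sleeps sequentially and take fifty times longer on one thread.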
𝟭. 𝗪𝗵𝗲𝗻 "𝗦𝗶𝗺𝗽𝗹𝗲" 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀 𝗠𝗲𝗲𝘁 𝗦𝗰𝗮𝗹𝗮𝗯𝗹𝗲 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲

A client came to me asking for a "Read Time" badge on their enterprise blog. On the surface, it’s just division, right? Total words / 200. Done.

But as we peeled back the layers, it became a fascinating system design challenge:

𝗧𝗵𝗲 𝗖𝗼𝗻𝘁𝗲𝗻𝘁 𝗣𝗿𝗼𝗯𝗹𝗲𝗺: How do you handle code snippets, technical tables, or 50+ images? (Hint: They don't read as fast as prose.)

𝗧𝗵𝗲 𝗦𝗰𝗮𝗹𝗲 𝗣𝗿𝗼𝗯𝗹𝗲𝗺: Recalculating this on every page load for 100k+ users is a waste of CPU cycles.

𝗧𝗵𝗲 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻: We moved the logic to the Write-Path. Calculate once during the "Publish" event, store it as metadata, and serve it via CDN.

𝗧𝗵𝗲 𝗥𝗲𝘀𝘂𝗹𝘁: A snappier UI and a more accurate "time-to-value" promise for the readers.

Check out the Python logic I used to handle the heavy lifting below! 👇

𝟮. 𝗣𝘆𝘁𝗵𝗼𝗻 𝗕𝗮𝗰𝗸𝗲𝗻𝗱 𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻

This script handles the logic of "Write-Time" processing. It strips out distractions and accounts for "Image Fatigue" (where users scan images faster the more there are).

#SystemDesign #Python #SoftwareEngineering #Backend
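The post's script was attached as an image and is not in the text, so here is a hedged reconstruction of the approach it describes: strip non-prose, divide by 200 WPM, and add a decaying per-image cost. The 12-seconds-decaying-to-3 image weights are a common convention and an assumption here, not necessarily the client's actual numbers:

```python
import math
import re

WPM = 200  # average adult reading speed for prose

def read_time_minutes(html: str) -> int:
    """Estimate read time once at publish (write-path), stored as metadata."""
    image_count = len(re.findall(r"<img\b", html, flags=re.IGNORECASE))
    # Strip code blocks (skimmed, not read) and remaining tags, then count words.
    text = re.sub(r"<pre\b.*?</pre>", " ", html, flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", text)
    words = len(text.split())
    seconds = words / WPM * 60
    # "Image fatigue": first image costs 12s, each later one a second less,
    # with a 3s floor, so 50 images don't inflate the estimate absurdly.
    for i in range(image_count):
        seconds += max(12 - i, 3)
    return max(1, math.ceil(seconds / 60))

article = "<p>" + ("word " * 400) + "</p><img src='a.png'><img src='b.png'>"
print(read_time_minutes(article))  # 400 words + 2 images ≈ 143s → 3 minutes
```

Run once on the publish event, the result is just one integer to store and serve from the CDN.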
Day 99: Square Root Decomposition & Prefix Multiplications ⚡

Problem 3655: XOR After Range Multiplication Queries II

Yesterday’s brute force approach hit a wall today with a TLE (Time Limit Exceeded). The constraints were significantly tighter, requiring a more sophisticated optimization.

The Strategy: Square Root Decomposition

To handle the queries efficiently, I split the problem based on the step size k:
• Large Steps (k ≥ √N): For large gaps, the number of updates is small enough that direct simulation still works within time limits.
• Small Steps (k < √N): This is where the magic happens. For small k, I used a Difference Array technique modified for multiplications.
• Modular Inverse & Prefix Products: Instead of updating every index, I marked the start (L) and end (R) of the range. I used modInverse to "cancel out" the multiplication after the range ended. A final prefix product pass (jumping by k) applied all updates in O(N) time.

Technical Highlights:
• Fermat's Little Theorem: Used modPow(x, MOD - 2) to calculate the modular inverse for division.
• Complexity: Reduced the worst-case runtime from O(Q⋅N) to O((Q+N)√N).

One day away from 100, but the focus remains on the problem in front of me. Consistency isn't about the destination; it's about the quality of the journey. 🚀

#LeetCode #Java #Algorithms #DataStructures #SquareRootDecomposition #DailyCode
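The small-k trick generalizes a difference array from sums to products, because "dividing" under a modulus is multiplying by the modular inverse. A sketch of that core idea (in Python rather than the post's Java) for the contiguous k = 1 case; the full solution runs this per residue class mod k:

```python
MOD = 1_000_000_007

def apply_range_multiplications(n, queries):
    """Start from all ones; each query (l, r, v) multiplies indices l..r by v.

    Instead of touching every index, mark v at l and v^(MOD-2) (the modular
    inverse, by Fermat's little theorem) just past r, then take one prefix
    product: O(n + q) instead of O(n * q).
    """
    diff = [1] * (n + 1)
    for l, r, v in queries:
        diff[l] = diff[l] * v % MOD
        diff[r + 1] = diff[r + 1] * pow(v, MOD - 2, MOD) % MOD  # cancel after r
    out, running = [], 1
    for i in range(n):
        running = running * diff[i] % MOD
        out.append(running)
    return out

print(apply_range_multiplications(4, [(0, 2, 3), (1, 3, 5)]))  # [3, 15, 15, 5]
```

The inverse mark at r + 1 is exactly the "cancel out after the range ends" step described above.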
Shallow Copy vs Deep Copy — The 2 AM Bug Trap 🛑

Most developers think they understand copying objects, until their original data mysteriously changes. That’s not a bug, that’s memory behavior biting you.

→ Shallow Copy
Creates a new container, but nested objects are still shared (by reference).
👉 Change nested data → both copies change.
Best for: flat, simple data.

→ Deep Copy
Creates a completely independent clone; everything is copied recursively.
👉 Change anything → the original stays untouched.
Best for: complex, nested structures.

💡 Rule of Thumb
Shallow → when you only need a surface-level copy
Deep → when you need true isolation

⚠️ The real trap: most bugs aren’t syntax errors. They come from not understanding how data behaves in memory.

If you’ve ever spent hours debugging only to realize it was a shallow copy issue, welcome to the club 😄

#Python #Python3 #Programming #SoftwareEngineering #CleanCode #Debugging #TechTips #PythonDeveloper #BackendDevelopment
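In Python the whole trap fits in a few lines with the `copy` module: the nested dict is shared after `copy.copy` but independent after `copy.deepcopy`:

```python
import copy

original = {"user": "alex", "settings": {"retries": 3}}

shallow = copy.copy(original)     # new outer dict, SAME inner "settings" dict
deep = copy.deepcopy(original)    # fully independent clone, all levels

original["settings"]["retries"] = 99  # mutate nested data via the original

print(shallow["settings"]["retries"])  # 99: the shallow copy changed too
print(deep["settings"]["retries"])     # 3: the deep copy is untouched
```

Note that `shallow` is a different top-level object, which is exactly why the sharing underneath is so easy to miss.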
My observability tool was lying to me. And it took me a full day to figure out why.

I was wiring Langfuse into my LangGraph pipeline to trace every node — latency, token usage, eval scores. Standard production setup.

I installed the SDK. Connected it. Sent a trace. HTTP 200. Auth successful. No errors.

I opened the Langfuse UI. Zero traces. I sent another. Still 200. Still nothing.

For hours I assumed the bug was in my code. Wrong callback path. Wrong credentials. Wrong graph configuration. I rewrote the integration three times.

Then I read the actual error more carefully. The SDK was version 4. The server was version 3. SDK v4 uses the OpenTelemetry ingestion protocol. Server v3 uses the classic ingestion protocol. They're incompatible. But the server still returned 200 — it accepted the request, then silently rejected the payload format it didn't understand. "Auth OK" did not mean "traces are arriving."

The fix was two lines:
→ Pin the SDK to langfuse==2.60.0
→ Pin the server image to langfuse/langfuse:2

But the real lesson was bigger than the fix: silent failures are the hardest bugs to debug — not because the problem is complex, but because there's no signal pointing you toward it. Everything looks fine. That's exactly when you should be suspicious.

Three rules I now follow for any observability or infrastructure tool:
1. Always pin both the SDK and the server to the same major version explicitly
2. "No error" is not the same as "it worked" — verify data actually arrived
3. Test with a raw HTTP call before debugging your integration layer

Building a production RAG system over Canadian financial regulations. Sharing every painful lesson along the way.

#MLEngineering #LLMOps #Observability #BuildingInPublic #Python #LangGraph
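The two pins, written out (the version numbers are the ones named above; treat them as illustrative, since the compatible pair depends on your deployment):

```shell
# SDK side: stay on the classic ingestion protocol
pip install "langfuse==2.60.0"

# Server side: pin the matching major version, never :latest
docker pull langfuse/langfuse:2
```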
https://open.substack.com/pub/shoryak/p/closed-systems-leak-a-simple-reminder?r=2hlbzt&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true