No one asked for a shared package. I built one anyway.

Multiple teams at a global pharmaceutical company were running the same logic. Fetch data from source. Transform it. Write to ADLS Gen2. Each team had their own version.

Assumption: custom code per team is safer. Easier to change without breaking someone else's pipeline.

Reality: five codebases with five variations of the same bug. Every upstream schema change meant five separate fixes.

I built an OOP-based Python package. Parameterized. Modular. One abstraction for retrieval, one for transformation, one for storage.

Other teams started using it. Then more teams. It became the default pattern not because someone mandated it, but because it was simply better.

Reusability isn't about efficiency. It's about reducing drift between what you intended and what ten teams independently decided to implement.

The hardest part wasn't the code. It was designing the interface so teams could configure it without needing to understand what was underneath. That's the real engineering skill. Not writing a good function. Writing one that other engineers trust enough not to rewrite.

What's a pattern you built that spread further than you expected?

#DataEngineering #Python #AzureDatabricks
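The retrieval/transformation/storage split described above can be sketched with abstract base classes. This is a minimal illustration, not the actual package; all class names (`Source`, `Transform`, `Sink`, `Pipeline`) and the in-memory implementations are hypothetical stand-ins:

```python
from abc import ABC, abstractmethod

class Source(ABC):
    """One abstraction for retrieval."""
    @abstractmethod
    def fetch(self) -> list[dict]: ...

class Transform(ABC):
    """One abstraction for transformation."""
    @abstractmethod
    def apply(self, rows: list[dict]) -> list[dict]: ...

class Sink(ABC):
    """One abstraction for storage (e.g., an ADLS Gen2 writer in practice)."""
    @abstractmethod
    def write(self, rows: list[dict]) -> None: ...

class Pipeline:
    """Teams configure behavior through constructor arguments,
    without needing to understand what is underneath."""
    def __init__(self, source: Source, transform: Transform, sink: Sink):
        self.source, self.transform, self.sink = source, transform, sink

    def run(self) -> int:
        rows = self.transform.apply(self.source.fetch())
        self.sink.write(rows)
        return len(rows)

# Toy implementations, purely for illustration
class ListSource(Source):
    def __init__(self, rows): self.rows = rows
    def fetch(self): return self.rows

class Uppercase(Transform):
    def apply(self, rows):
        return [{k: str(v).upper() for k, v in r.items()} for r in rows]

class Collector(Sink):
    def __init__(self): self.out = []
    def write(self, rows): self.out.extend(rows)
```

The point of the shape: a new team plugs in its own `Source` or `Sink` subclass and never touches `Pipeline`.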
Nripesh Srivastava’s Post
More Relevant Posts
I searched my Claude Code session history and found something I didn't expect.

Here's what I found: https://lnkd.in/dFcHWNGS

185 sessions in 13 days. 1,501 commands. 94+ commits shipped. Only 3 references to an IDE, and two of those were about window management.

I'm an EM who codes daily. Java, Python, Kubernetes, Helm, infrastructure-as-code across multiple repos. The kind of stack that should demand a heavyweight IDE.

Instead, I've been shipping from a terminal. And the work has never moved faster.

I wrote about why the IDE's core value proposition ("we'll help you find and edit text in files") feels increasingly like a card catalog in the age of Google. The shift from file navigation to intent-driven development already happened. Most of us just haven't checked our own session history yet.

#SoftwareEngineering #AI #DeveloperTools #EngineeringManagement
Handling complexity in long-running Python services often feels like juggling fragile glue code: retry loops, watchdogs, and scattered flags.

Di Lu's article, "A supervisor tree library for building predictable and resilient programs," offers a compelling approach with Runsmith, a Python library inspired by Erlang/OTP supervisor trees that models each unit as a typed worker with an explicit lifecycle. You can read the full breakdown here: https://lnkd.in/dgxjFnpx

What stands out is the shift from brittle process-level restarts to fine-grained fault isolation and health monitoring that catches stalls and constraint violations, not just crashes. This aligns with challenges I've faced building multi-component platforms where uptime matters and failure domains must be confined.

One caveat: adopting such a framework requires upfront discipline in designing worker lifecycles and state machines, which can add complexity early on. That investment pays dividends, though, when shipping real products that demand maintainability and predictable fault recovery.

How have others balanced this upfront design effort against the operational resilience gains in production?

#python #softwarearchitecture #systemdesign #reliabilityengineering #productdevelopment #founders #engineering #faulttolerance #opensource #devtools #resilience #longrunningservices
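To make the supervisor idea concrete, here is a minimal, generic restart-budget sketch in plain Python. This is not Runsmith's API (the `Worker` and `Supervisor` names are hypothetical); it only shows the core pattern of restarting a failed unit individually instead of restarting the whole process:

```python
class Worker:
    """A unit of work with a name and a callable body.
    (Hypothetical shape, not Runsmith's actual API.)"""
    def __init__(self, name, target):
        self.name, self.target = name, target

class Supervisor:
    """Restarts individual failed workers, up to a per-worker budget."""
    def __init__(self, max_restarts=3):
        self.max_restarts = max_restarts
        self.restart_counts = {}

    def _run_once(self, worker) -> bool:
        try:
            worker.target()
            return True
        except Exception:
            # Fault is confined to this worker's failure domain
            self.restart_counts[worker.name] = \
                self.restart_counts.get(worker.name, 0) + 1
            return False

    def supervise(self, worker):
        while not self._run_once(worker):
            if self.restart_counts[worker.name] >= self.max_restarts:
                raise RuntimeError(f"{worker.name} exceeded restart budget")
```

A real supervisor tree adds what the article emphasizes: typed lifecycles, health checks that catch stalls (not just crashes), and nested supervisors supervising supervisors.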
I just learned something that no LeetCode problem ever taught me.

How do you sort 200 GB of data when your RAM is only 5 GB? 🤯

I came across this in a real interview question today, and honestly, I had no clue.

The answer? External Merge Sort. Here's how it works in simple terms 👇

📦 Phase 1: Break it down
• Read 5 GB of data into RAM
• Sort it using QuickSort
• Write it back to disk as a sorted "chunk"
• Repeat 40 times → now you have 40 sorted files

🔀 Phase 2: Merge using a Min-Heap
• Open all 40 files at once
• Push the first element of each file into a Min-Heap (size = just 40!)
• Pop the minimum → write to output → push the next element from that file
• Repeat until all 200 GB are merged

The genius part? The heap never holds more than 40 elements at a time. Not 200 GB. Just 40.

All those Heap and Merge Sort problems on LeetCode? This is exactly what they're preparing you for, just at a massive scale.

This is why Big Tech companies ask System Design questions. Real-world data doesn't fit in an array. 🌍

📸 Attached the full Python implementation above: Phase 1 (Run Creation) + Phase 2 (K-Way Merge) with comments explaining every step.

Drop a 🙋 if you had no idea this concept existed before today! And tell me: what's the most surprising DSA concept YOU'VE come across recently? 👇

#DSA #LeetCode #SystemDesign #SoftwareEngineering #Python #CodingInterview #ExternalSorting
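The two phases above can be sketched compactly. This illustration uses in-memory lists to stand in for the sorted files on disk, so the heap-size invariant is easy to see:

```python
import heapq

def create_runs(data, chunk_size):
    """Phase 1: sort fixed-size chunks that each 'fit in RAM'."""
    return [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def k_way_merge(runs):
    """Phase 2: the min-heap holds at most one element per run,
    never the full dataset."""
    # Seed the heap with the first element of each run
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, run_idx, pos = heapq.heappop(heap)
        out.append(value)                  # "write to output"
        if pos + 1 < len(runs[run_idx]):   # refill from the same run
            heapq.heappush(heap, (runs[run_idx][pos + 1], run_idx, pos + 1))
    return out
```

With real files you would replace the lists with file handles and buffered reads; the heap logic stays identical (Python's `heapq.merge` does the k-way merge for you).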
One instruction file doesn't scale.

The moment your codebase has a Python service and a TypeScript frontend and a Go worker, a single CLAUDE.md becomes either too generic to be useful or too bloated to trust.

Scoped context solves this the way filesystems already do: by nesting. Org-level rules wrap user-level rules wrap project-level rules wrap directory-level rules. The agent reads whichever scope it's working inside, the same way a developer picks up conventions walking into a new folder.

Example: the org says "never commit .env files." The project says "use Zod for validation." The ./src/api/ directory says "return JSON, validate schema." The agent sees all three, cleanly composed.

The trade-off is discoverability. When rules live in four places, it's harder to answer "what does the agent actually see right now?" Good tooling here isn't optional; it's the whole pattern.

Treat context as a tree, not a file.

How are you organizing rules across a multi-language codebase?

#AI #AgenticAI #SoftwareArchitecture #DeveloperTools #Clausey
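The composition rule ("outer scopes wrap inner scopes") is the same walk a shell does for `.gitignore` or `.editorconfig`. A minimal sketch of the resolution, assuming one instruction file per directory level (the `collect_rules` helper and demo layout are hypothetical, not any specific tool's implementation):

```python
import tempfile
from pathlib import Path

def collect_rules(start: Path, root: Path, filename: str = "CLAUDE.md") -> list[str]:
    """Gather every instruction file from the working directory up to the
    repo root, returned outermost scope first (so inner rules compose last)."""
    rules = []
    current = start
    while True:
        candidate = current / filename
        if candidate.is_file():
            rules.append(candidate.read_text())
        if current == root:
            break
        current = current.parent
    return rules[::-1]  # org/project first, directory-level last

# Demo: a root-level rule plus a directory-level rule in src/api
repo = Path(tempfile.mkdtemp())
(repo / "src" / "api").mkdir(parents=True)
(repo / "CLAUDE.md").write_text("never commit .env files")
(repo / "src" / "api" / "CLAUDE.md").write_text("return JSON, validate schema")

rules = collect_rules(repo / "src" / "api", repo)
```

A "what does the agent see right now?" debugging command is just this function plus pretty-printing, which is why the tooling gap is so tractable.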
From a simple log parser to simulating real SRE scenarios.

I extended my Log Analyzer project to make it more aligned with real-world production systems and incident handling.

🔧 What's new:
• Regex-based log parsing to extract timestamp, log level, and message
• Top-N error analysis using Python's Counter
• Error spike detection based on a time window (simulating incident conditions)

📊 Example insight: the tool can now detect abnormal error spikes within a short duration, something SREs rely on during production incidents.

💡 What I learned: log analysis isn't just about counting errors; it's about identifying patterns, trends, and anomalies over time.

🔗 Project: https://lnkd.in/dEZyK7qH

Next step: exploring real-time log monitoring and alerting integrations. Would love your feedback!

#SRE #DevOps #Python #Observability #SiteReliabilityEngineering #LearningInPublic #GitHub
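The three features listed above fit in a few dozen lines. This is an independent sketch of the same ideas, not the project's actual code; the log format and function names are assumptions for illustration:

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed format: "2024-01-01 12:00:00 ERROR db timeout"
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)$"
)

def parse(lines):
    """Regex-based parsing: yield (timestamp, level, message) tuples."""
    for line in lines:
        m = LINE.match(line)
        if m:
            yield datetime.fromisoformat(m["ts"]), m["level"], m["msg"]

def top_errors(records, n=3):
    """Top-N error analysis with Counter."""
    return Counter(msg for _, lvl, msg in records if lvl == "ERROR").most_common(n)

def error_spike(records, window=timedelta(minutes=1), threshold=5):
    """True if any sliding time window holds >= threshold errors."""
    times = sorted(ts for ts, lvl, _ in records if lvl == "ERROR")
    lo = 0
    for hi, t in enumerate(times):
        while t - times[lo] > window:
            lo += 1               # shrink window from the left
        if hi - lo + 1 >= threshold:
            return True
    return False
```

The sliding-window check is the piece that turns "counting errors" into "detecting incident conditions": the same 6 errors spread over an hour are noise, but 5 in one minute page someone.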
Sub-100ms APIs Serving 10K+ Requests/Day: Here's What That Actually Takes

Spinning up a FastAPI endpoint takes 10 minutes. Making it production-ready takes a lot more.

At my current role, I build and maintain REST APIs in Python (FastAPI) and Node.js that serve over 10,000 requests per day, with sub-100ms latency requirements. Here's what "production-ready" actually meant for us:

Schema design before code. Every endpoint started with a PostgreSQL schema review. Badly normalized data shows up as latency later.

Multithreading is not optional at scale. Single-threaded Python collapses under concurrent load. I built multithreaded data-processing pipelines that improved throughput by 30% under real-world concurrency.

Observability from day one. Latency SLAs mean nothing if you can't measure them. Instrumentation and logging were part of the PR, not an afterthought.

OOP principles keep it maintainable. Services that grow fast get messy fast. Clean object-oriented design was the only thing that kept the codebase sane as features stacked up.

10K requests/day is not massive by internet scale, but it taught me what production really means.

What's the hardest production lesson you've learned?

#BackendEngineering #FastAPI #PythonDevelopment #SoftwareEngineering #APIDesign
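Why threads help despite the GIL: CPython releases the GIL while a thread waits on I/O, so concurrent database or HTTP calls overlap. A minimal stdlib-only sketch (the `fetch` function simulates an I/O-bound call; the numbers are illustrative, not the 30% figure from the post):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(record_id):
    """Stand-in for an I/O-bound call (DB query, downstream HTTP request)."""
    time.sleep(0.01)  # simulated network latency; GIL is released here
    return record_id * 2

def process_serial(ids):
    return [fetch(i) for i in ids]  # latencies add up one after another

def process_threaded(ids, workers=8):
    # Up to `workers` calls wait on I/O simultaneously
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, ids))
```

For CPU-bound work this buys nothing; there you need processes or native extensions, which is why profiling before choosing a concurrency model matters.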
I spent too much time reconciling logs and traces until I understood how OpenTelemetry logging actually works.

🔑 The key insight: OTel doesn't try to be your logging library. It's a bridge.

Your existing logger (Log4j, Python logging, winston) keeps working exactly as it does today. But behind the scenes, an appender automatically enriches every log record with trace context: the TraceId and SpanId from the active span.

✨ That's it. That's the whole idea. And it changes everything.

⚡ Suddenly, debugging is faster. You see logs in the context of their span. You see which logs caused a trace anomaly. Your backend (Jaeger, Tempo, Elastic, whatever) can now correlate logs to traces without you writing SQL joins or doing manual detective work.

📖 Just published a 16-minute technical guide walking through log formats, the unified LogRecord schema, the Logs API and SDK, processors, and exporters. Available on LearnObservability; link in comments.

#OpenTelemetry #Observability #DevOps #DistributedTracing #SRE #Logging
When code runs millions of times a day, even minor enhancements lead to significant compute savings. So I built xmltodict-fast. 🦀🐍

xmltodict is a Python library many of us use without a second thought. With ~5K GitHub stars, it's a quiet workhorse powering ETL pipelines, SOAP clients, and invoice processors.

xmltodict-fast is a drop-in replacement that maintains the same public API but rewrites the performance-critical sections in Rust using PyO3 and quick-xml. Importantly, if the Rust extension isn't available on a platform, it seamlessly reverts to the original Python implementation, so it's completely safe for incremental adoption.

Local benchmarks:
🚀 parse(): 2.1× faster on typical XML
🚀 unparse(): 5.9× faster (massive for serialization-heavy workflows)

On pathologically deep XML (500+ nesting levels), the Rust version is actually slower. :(

(Side note: thanks to my kind and patient AI coding assistant for helping me build this!)

If you work with XML in Python, I welcome your feedback, testing, and pull requests!

🔗 Repo & Benchmarks: https://lnkd.in/exhfBuD7

#Python #RustLang #PyO3 #OpenSource #DataEngineering #PerformanceOptimization
🚀 Efficient Duplicate Detection with Hash Sets | LeetCode

Today, I tackled the Contains Duplicate problem. While the brute-force approach is often the first instinct, optimizing for time complexity is where the real fun begins!

💡 The Problem: given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

⚡ My Approach: I utilized a Hash Set to track elements as I traversed the array. This allows for near-instantaneous lookups compared to nested loops.

👉 The Logic:
1. Initialize an empty set seen.
2. Iterate through the array once.
3. For each number, check: "Have I seen this before?" (Is it in the set?)
4. If yes → return True immediately.
5. If no → add the number to the set and keep moving.

🔥 Complexity Analysis:
⏱ Time Complexity: O(n). We only pass through the list once.
📦 Space Complexity: O(n). In the worst case (all unique elements), we store all n elements in the set.

🏆 The Result:
✔️ Accepted: all 77 test cases passed.
✔️ Performance: 9 ms runtime, beating 73.44% of Python3 submissions!

📌 Key Takeaway: using a Set turns a potential O(n²) search into a sleek O(n) operation. Choosing the right data structure isn't just about passing tests; it's about writing scalable, "production-ready" code.

💻 Tech Stack: #Python | #DataStructures | #Algorithms

#leetcode #dsa #coding #programming #softwareengineering #100DaysOfCode #pythonprogramming #tech #growthmindset
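The five-step logic above maps line for line onto a few lines of Python (my own rendering of the described approach, not the submitted solution):

```python
def contains_duplicate(nums: list[int]) -> bool:
    seen = set()                # step 1: empty set
    for n in nums:              # step 2: single pass
        if n in seen:           # step 3: O(1) average-case lookup
            return True         # step 4: duplicate found, stop early
        seen.add(n)             # step 5: remember and keep moving
    return False                # every element was distinct
```

The early `return True` matters: on adversarial inputs with a duplicate near the front, the loop exits in a handful of iterations regardless of array length.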
Model Serialization Deployment using modelkit

#machinelearning #datascience #modelserializationdeployment #modelkit

modelkit is a minimalist yet powerful MLOps library for Python, built for people who want to deploy ML models to production. It packs several features that make your go-to-production journey a breeze and ensures that the exact same code will run in production, on your machine, or on data-processing pipelines.

Features

Wrapping your prediction code in modelkit instantly gives access to all features:

- fast: model predictions can be batched for speed (you define the batching logic) with minimal overhead
- composable: models can depend on other models and evaluate them however you need
- extensible: models can rely on arbitrary supporting configuration files called assets, hosted on local or cloud object stores
- type-safe: models' inputs and outputs can be validated by pydantic; you get type annotations for your predictions and can catch errors with static type analysis tools during development
- async: models support async and sync prediction functions. modelkit supports calling async code from sync code, so you don't have to suffer from partially async code
- testable: models carry their own unit test cases, and unit-testing fixtures are available for pytest
- fast to deploy: models can be served in a single CLI call using fastapi

In addition, you will find that modelkit is:

- simple: use pip to install modelkit; it is just a Python library
- robust: follow software development best practices, versioning and testing all your configurations and artifacts
- customizable: go beyond off-the-shelf models with custom processing, heuristics, business logic, different frameworks, etc.
- framework agnostic: bring your own framework to the table, and use whatever code or library you want. modelkit is not opinionated about how you build or train your models
- organized: version and share your ML library and artifacts with others, as a Python package or as a service
- fast to code: just write the prediction logic and that's it. No cumbersome pre- or postprocessing logic, branching options, etc. The boilerplate code is minimal and sensible

https://lnkd.in/genAAUCg