𝐋𝐚𝐭𝐞𝐧𝐜𝐲 𝐢𝐬 𝐮𝐩, 𝐛𝐮𝐭 𝐧𝐨𝐭𝐡𝐢𝐧𝐠 𝐢𝐬 𝐟𝐚𝐢𝐥𝐢𝐧𝐠. 𝐖𝐡𝐞𝐫𝐞 𝐝𝐨 𝐲𝐨𝐮 𝐥𝐨𝐨𝐤 𝐟𝐢𝐫𝐬𝐭? 🔍

One of the worst production situations:

Latency is growing 📈
Users feel it 😐
Logs are clean 🧼
Nothing is obviously broken ❌

Most teams waste time here. They:

search for errors 🔎
restart pods 🔄
jump between dashboards 📊

But when nothing is failing, the problem is rarely an exception. Here is where to look instead:

1. 𝗦𝗰𝗼𝗽𝗲 𝗳𝗶𝗿𝘀𝘁 🎯
One endpoint or all? One instance or all? Reads, writes, or async? If you skip this, you debug the whole system instead of a slice.

2. 𝗧𝗵𝗿𝗲𝗮𝗱 𝗽𝗼𝗼𝗹𝘀 🧵
Active threads, queue size, blocked threads. If all workers are busy, requests are not failing - they are waiting to run. (First sketch below.)

3. 𝗧𝗵𝗿𝗲𝗮𝗱 𝗱𝘂𝗺𝗽 📸
Look for:
* repeated stack traces
* WAITING / BLOCKED threads
* DB connection waits
* socket reads
* lock contention
This shows where execution is actually stuck. (Second sketch below.)

4. 𝗚𝗖 𝗯𝗲𝗵𝗮𝘃𝗶𝗼𝗿 ♻️
Pause time, frequency, heap pressure. If latency spikes in waves, GC is often involved. (Third sketch below.)

5. 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻 𝗽𝗼𝗼𝗹𝘀 🧩
DB, HTTP clients, Redis, broker. Exhausted pool = requests wait instead of fail. Classic “slow but no errors”. (Fourth sketch below.)

6. 𝗤𝘂𝗲𝘂𝗲𝘀 & 𝗹𝗮𝗴 📊
Queue depth, consumer lag, retries. The system may look fine while work silently accumulates.

7. 𝗗𝗼𝘄𝗻𝘀𝘁𝗿𝗲𝗮𝗺𝘀 🌐
DB, internal services, external APIs. Your service might be slow because it is efficiently waiting on something else.

The key shift: no errors does not mean no problem. ❗ It usually means the bottleneck is waiting, saturation, contention, or backlog.

Stop hunting for exceptions first. Start finding where time is spent.

How do you usually localize the bottleneck in this situation? 🤔

#backend #java #springboot #observability #performance #distributedsystems #productionengineering
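A minimal sketch of the point-2 check, assuming a plain `ThreadPoolExecutor` (the pool size, task count, and sleep times are illustrative, not from the post): if active threads sit at the maximum while the queue grows, latency is queueing time, not work time.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSaturationCheck {

    // Prints the saturation signals from point 2: active threads vs. pool size,
    // and how much work is queued behind them.
    static void report(ThreadPoolExecutor pool) {
        int active = pool.getActiveCount();   // threads currently running tasks
        int size = pool.getPoolSize();        // threads currently in the pool
        int max = pool.getMaximumPoolSize();  // hard ceiling
        int queued = pool.getQueue().size();  // tasks waiting for a free thread

        System.out.printf("active=%d/%d (max=%d), queued=%d%n", active, size, max, queued);
        if (active == max && queued > 0) {
            // Requests are not failing - they are waiting to run.
            System.out.println("Pool saturated: latency is queueing time, not work time.");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        // Simulate slow tasks piling up behind a small pool.
        for (int i = 0; i < 20; i++) {
            pool.submit(() -> {
                try { TimeUnit.SECONDS.sleep(2); } catch (InterruptedException ignored) {}
            });
        }
        Thread.sleep(100); // let the workers pick up tasks before sampling
        report(pool);
        pool.shutdownNow();
    }
}
```

In a real service you would export these same four numbers as metrics rather than print them; the point is that the signal is a ratio and a queue depth, not an error count.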
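For point 3, `jstack <pid>` or `jcmd <pid> Thread.print` is the usual route. As a programmatic sketch, the standard `ThreadMXBean` can filter a dump down to exactly the WAITING/BLOCKED threads the post calls out:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StuckThreadScan {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // (true, true) -> include monitor and synchronizer info, like a full jstack dump.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            Thread.State state = info.getThreadState();
            if (state == Thread.State.BLOCKED || state == Thread.State.WAITING) {
                // The top frame often names the culprit: a socket read,
                // a DB driver call, or a lock in your own code.
                StackTraceElement[] stack = info.getStackTrace();
                String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
                System.out.printf("%s [%s] lock=%s at %s%n",
                        info.getThreadName(), state,
                        info.getLockName(), // may be null if not waiting on a monitor
                        top);
            }
        }
    }
}
```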
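For point 4, GC logs (`-Xlog:gc*`) are the primary tool. As a rough in-process check, the standard MXBeans expose cumulative collection counts, collection time, and heap pressure; a sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class GcPressureCheck {
    public static void main(String[] args) {
        // Counts and times are cumulative since JVM start: sample this
        // periodically and diff the values to get GC time per interval.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        // Heap pressure: used staying close to max between collections
        // means the JVM is reclaiming constantly - latency arrives in waves.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}
```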
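For point 5, a sketch assuming HikariCP on the classpath (the `report` helper and `dataSource` parameter are illustrative stand-ins for your app's pool): `getThreadsAwaitingConnection()` is the "requests wait instead of fail" signal in a single number.

```java
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

public class DbPoolCheck {

    // "Exhausted pool = requests wait instead of fail":
    // waiting > 0 while active == total is exactly that state.
    static void report(HikariDataSource dataSource) {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        int active = pool.getActiveConnections();
        int total = pool.getTotalConnections();
        int waiting = pool.getThreadsAwaitingConnection();

        System.out.printf("db pool: active=%d/%d, threads waiting=%d%n",
                active, total, waiting);
        if (waiting > 0 && active == total) {
            System.out.println("Pool exhausted: classic 'slow but no errors'.");
        }
    }
}
```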
This is spot on — “no errors” scenarios are usually the hardest to debug.

One pattern I’ve seen repeatedly in production:
👉 It’s often downstream slowness disguised as application latency

We had a case where:
- APIs were slow
- CPU looked fine
- No exceptions

Turned out:
➡️ DB connection pool exhaustion + slow queries
➡️ Threads waiting, not failing

A couple of things that helped us:
- End-to-end tracing, to see where time is actually spent (span sketch below)
- Thread dump + pool metrics correlation
- Looking at saturation signals (queue depth, connection usage) instead of errors

+1 on “stop hunting exceptions first” — that mindset shift is huge.

Curious — do you rely more on tracing (Jaeger/Zipkin) or metrics-first when debugging these?
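On the tracing side of that question, a minimal OpenTelemetry sketch (assuming `opentelemetry-api` on the classpath and an SDK/exporter configured elsewhere; the service name, `loadOrder`, and `fetchFromDb` are made up for illustration): wrapping the suspect downstream call in a span is what makes "where time is actually spent" visible as a waterfall in Jaeger/Zipkin.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class TracedDbCall {

    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("checkout-service"); // hypothetical service name

    // The span's duration shows whether time goes to the query itself
    // or to waiting for a pooled connection before the query even starts.
    static String loadOrder(String orderId) {
        Span span = tracer.spanBuilder("db.load-order").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);
            return fetchFromDb(orderId); // hypothetical DAO call
        } finally {
            span.end();
        }
    }

    private static String fetchFromDb(String orderId) {
        return "order-" + orderId; // stand-in for the real query
    }
}
```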
Really resonates — I’ve seen similar cases where everything looked healthy, but latency kept increasing.
In one case, it was connection pool exhaustion + a slow downstream call — nothing failed, just more waiting.
Breaking down request time helped us spot saturation quickly instead of chasing logs.
Agree — these are usually waiting problems, not failures.
This LinkedIn post is a classic case of "The Junior's Guide to Senior Debugging." It sounds smart on paper, but following it in a real production outage is a one-way ticket to a 4-hour downtime and an angry CTO.

Writing this because you mention "teams" and then completely ignore them :)

Step 0 is missing: Communication. If "users feel it," you must sync with Support and SREs first. Never troubleshoot in a silo while the ship sinks.

The Change Rule: 80% of latency incidents trace back to recent changes. Check your deployment logs and feature flags before touching a single thread dump.

Inverted priorities: checking downstreams (step 7) should be step 1. In modern systems, the bottleneck is usually the DB or an external API, not your GC behavior.

A better flow for production:
1. Sync & declare: make sure stakeholders are aware.
2. Correlate with changes: if it matches a deploy → roll back first, debug later.
3. Check the "waterfall": use APM/tracing to see where time is actually spent (DB, network, or I/O).
4. App internals: only dive into thread pools and GC once infra and downstreams are cleared.

The Golden Rule: "no errors" means your timeouts are likely too high. Don't look for what's broken; look for what's waiting. (Timeout sketch below.)

#SRE #Engineering #DevOps #Observability
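A minimal sketch of that Golden Rule using the JDK's built-in `java.net.http.HttpClient` (the endpoint URL and `callInventory` are hypothetical; the same idea applies to DB pools, e.g. a bounded HikariCP `connectionTimeout`): tight timeouts turn silent waiting into errors you can actually alert on.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class BoundedDownstreamCall {

    private static final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2)) // fail fast if connect stalls
            .build();

    static String callInventory() throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://inventory.internal/api/stock")) // hypothetical URL
                .timeout(Duration.ofSeconds(3)) // cap the full round trip
                .build();
        // Throws HttpTimeoutException instead of hanging - "waiting"
        // becomes a visible error on a dashboard instead of pure latency.
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```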