Preventing Poison Messages in Message Queues with Dead Letter Queues

EVERYONE LOVES ASYNC QUEUES UNTIL THEY CLOG.

Implementing message queues (a broker like RabbitMQ, often driven by a task framework like Celery) is a massive milestone in backend architecture. It feels like magic: you offload heavy tasks, and your API response times drop to milliseconds. But I quickly learned that distributed systems have a dark side: the Poison Message.

Here is the scenario: your API accepts a user's file and drops a "Process File" task into the queue. Your background worker picks it up. But the file is corrupted. The worker throws an exception and crashes. Because queues are designed to be reliable, the system assumes it was just a temporary network glitch, so it puts the message back into the queue. Another worker picks it up. It crashes again. Suddenly, your queue is stuck in an infinite loop of death. This one "poison" message eats up all your CPU cycles, and the thousands of healthy messages behind it are completely blocked. Your system is effectively down.

The Solution: The Dead Letter Queue (DLQ).

A DLQ is an architectural safety net. You configure your main queue with a strict rule: "If a message fails 3 times, stop trying." Instead of putting it back in the main line, the system routes the failing message to a dedicated "graveyard" queue (the DLQ).

1. The main pipe stays clean: healthy messages continue to process at full speed.
2. Zero data loss: the failed task isn't deleted. It sits safely in the DLQ.
3. Easy debugging: as an engineer, I can open the DLQ later, inspect the exact payload that caused the crash, fix the bug in my code, and "replay" the dead messages.

It is the difference between an application that breaks catastrophically and one that degrades gracefully.

For the backend engineers handling high throughput: do you set up automated alerts for your DLQs, or do you manually inspect them during weekly maintenance?

#SystemDesign #BackendArchitecture #MessageQueue #RabbitMQ #Microservices #Reliability #SoftwareEngineering #Python
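In RabbitMQ terms, the whole pattern comes down to a few queue arguments. A minimal sketch, assuming quorum queues (which track delivery counts natively) and the pika client; queue names and `handle_file` are illustrative, not from any specific codebase:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# The "graveyard": a dead-letter exchange plus a queue bound to it.
channel.exchange_declare(exchange="dlx", exchange_type="direct")
channel.queue_declare(queue="process_file.dlq", durable=True)
channel.queue_bind(queue="process_file.dlq", exchange="dlx",
                   routing_key="process_file")

# Main queue: a quorum queue with a delivery limit. After 3 failed
# deliveries, RabbitMQ stops retrying and routes the message to the DLX.
channel.queue_declare(
    queue="process_file",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-delivery-limit": 3,
        "x-dead-letter-exchange": "dlx",
        "x-dead-letter-routing-key": "process_file",
    },
)

def handle_file(body: bytes) -> None:
    ...  # hypothetical: parse and process the uploaded file

def on_message(ch, method, properties, body):
    try:
        handle_file(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # requeue=True lets the broker retry; the delivery limit guarantees
        # the "loop of death" ends in the DLQ instead of spinning forever.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_consume(queue="process_file", on_message_callback=on_message)
channel.start_consuming()
```

With `x-delivery-limit` set, the broker itself enforces the "fail 3 times, stop trying" rule: the delivery after the limit goes to the graveyard queue, not back to the front of the line.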
More Relevant Posts
We avoided the distributed system trap by keeping our CQRS implementation boring.

I insisted on using a single RDBMS with separate schemas for commands and queries. The team wanted a shiny event-sourced architecture with Kafka and a document store. They called my approach "monolithic" and "outdated" for a high-scale system.

Two years later, a massive marketing campaign tripled our traffic overnight. While our competitors struggled with eventual consistency bugs, we just added two read replicas. The write side stayed fast because it wasn't burdened by complex joins or lock contention. We didn't need a complex event bus to handle the load 📉. The system just needed to stop locking the write tables during heavy reporting.

Choosing a boring path meant we spent that weekend sleeping instead of debugging race conditions ☕.

What I learned: architectural separation is about logical boundaries, not necessarily adding more moving parts.

♻️ Repost to save someone from learning this the hard way.
➕ Follow Pankaj Kumar for backend engineering lessons earned in production.

#backend #engineering #software #java
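A minimal sketch of that boring separation, in Python/SQLAlchemy for consistency with the rest of this page; connection URLs and schema names are illustrative:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# One boring RDBMS, two roles. URLs are illustrative placeholders.
write_engine = create_engine("postgresql://app@db-primary/shop")    # command side
read_engine = create_engine("postgresql://app@db-replica-1/shop")   # query side

WriteSession = sessionmaker(bind=write_engine)
ReadSession = sessionmaker(bind=read_engine)

# Commands hit the primary; heavy reporting reads hit a replica, so
# reporting never takes locks that block the write tables.
with ReadSession() as session:
    rows = session.execute(text("SELECT sku, sold FROM reporting.daily_sales"))
```

Scaling the read side then means adding replicas and pointing `read_engine` at them, with no event bus in between.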
#HLD #SystemDesign #Scaling

We didn't have a scaling plan… until the system started breaking.

Most architectures look clean in diagrams. In production, they evolve under pressure. Over the next 8 days, I'm breaking down how systems actually scale from 1 user to 1 million users. No fluff. Only real bottlenecks and production fixes.

Day 1: Monolith (1 to 100 users)
Everything runs on one machine. Simple, fast, fragile.

Day 2: Database Separation (100 to 1K)
App and DB fight for resources. The first real bottleneck appears.

Day 3: Load Balancing (1K to 10K)
One server becomes a risk. Horizontal scaling begins.

Day 4: Caching (10K to 100K)
The database starts collapsing under reads. Caching changes everything.

Day 5: Async Systems (100K to 500K)
Sync calls cause timeouts. Queues bring stability.

Day 6: Database Scaling (500K to 1M)
Writes become the bottleneck. Replication and sharding enter.

Day 7: Microservices at Scale
Monolith growth slows teams down. Services unlock speed.

Day 8: Observability
Failures become invisible. Monitoring becomes survival.

This series is different:
• No over-engineering from day one
• No theoretical diagrams
• Only real production problems and fixes
• Built from backend engineering experience

Follow along for the next 8 days.

#SystemDesign #BackendEngineering #Scalability #Microservices #Java #SpringBoot #DistributedSystems #BuildInPublic #SoftwareEngineering
A reality about modern backend systems: your system is only as fast as your slowest dependency.

You can optimize your code. Tune your database. Scale your services. But if one dependency is slow… everything feels slow.

In distributed systems, a single request often goes through:

API Gateway → Service A → Service B → Database → External API

That's multiple hops. And latency adds up.

That's why experienced engineers focus on:
🔹 Reducing unnecessary service calls
🔹 Using caching strategically
🔹 Adding timeouts to every external dependency (see the sketch below)
🔹 Avoiding deep service chains
🔹 Monitoring latency across each layer

Because performance is not just about speed. It's about consistency. Users don't notice when your system is fast. They notice when it's unpredictably slow.

The goal is not just low latency. It's predictable latency. That's what makes systems feel reliable.

Where do you usually see latency bottlenecks in your architecture?

#softwareengineering #java #backend #microservices #systemdesign #performance #devops #engineering #tech
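On the timeout point: the cheapest latency insurance is refusing to wait forever. A minimal sketch with the `requests` library; the URL and the limits are illustrative:

```python
import requests

def fetch_prices() -> dict:
    # (connect timeout, read timeout) in seconds: fail fast instead of
    # letting one slow dependency hold a worker thread hostage.
    resp = requests.get("https://api.example.com/prices", timeout=(3.05, 10))
    resp.raise_for_status()
    return resp.json()
```

Without an explicit `timeout`, `requests` will happily wait indefinitely, which is exactly how one slow downstream makes a whole service chain feel slow.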
Designing asynchronous systems that don't break is not about adding Kafka or RabbitMQ and calling it a day. It's about accepting one uncomfortable truth: things will fail, messages will be late, systems will be inconsistent. And designing anyway.

Early in my career, I thought async meant "faster and scalable." In reality, it meant harder to reason about.

You send an event → you don't control who consumes it
You retry a message → you might duplicate data
You scale consumers → you introduce race conditions

That's when it clicked: asynchronous systems are less about speed… and more about discipline.

Here's what actually makes them not break:

Idempotency is non-negotiable. If processing the same event twice breaks your system, it's already broken. (See the sketch below.)

Retries need boundaries. Blind retries can create more damage than failures. Dead-letter queues exist for a reason.

Observability is your lifeline. If you can't trace an event across services, you're debugging in the dark.

Event design matters more than code. A poorly designed event schema will haunt you longer than any bug.

Eventual consistency is a mindset shift. Not everything needs to be correct right now, but it must become correct eventually.

The biggest shift? Moving from request-response thinking to event-driven thinking. You stop asking "Did this API succeed?" and start asking "Will the system converge to the right state?" That's a very different game.

After working on distributed systems across healthcare, banking, and retail, one thing is clear: resilience isn't a feature you add later. It's a decision you make at design time.

What's one async failure that taught you a hard lesson?

#SystemDesign #DistributedSystems #EventDrivenArchitecture #Microservices #Kafka #AsyncProgramming #BackendEngineering #SoftwareArchitecture #CloudEngineering #Scalability #Resilience #Java #SpringBoot #TechLeadership #EngineeringLessons
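On the idempotency point, a minimal sketch, assuming Redis as the dedupe store and a unique ID on every event; `handle` is a hypothetical processor:

```python
import redis

r = redis.Redis()

def handle(payload: bytes) -> None:
    ...  # hypothetical business logic

def consume(event_id: str, payload: bytes) -> None:
    # SET NX: only the first consumer to claim this event ID wins, so a
    # redelivered or duplicated event becomes a harmless no-op.
    if not r.set(f"processed:{event_id}", 1, nx=True, ex=86_400):
        return  # already handled (or in flight on another consumer)
    try:
        handle(payload)
    except Exception:
        r.delete(f"processed:{event_id}")  # release the claim so a retry can run
        raise
```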
I just deep-dived into System Design. Here are the 10 concepts every DevOps/backend engineer MUST know in 2026. (Saving this will save you hours of research.)

1. Distributed Systems
A single server will eventually max out its CPU, run out of memory, or fail. Distributed systems solve this, but introduce network latency, partial failures, and consistency challenges.

2. CAP Theorem
You can only guarantee 2 of 3: Consistency, Availability, Partition tolerance. In reality, you ALWAYS need P, so the real choice is CP vs AP. → Banks choose CP. Social feeds choose AP.

3. Monolith → Microservices
One bug in the payment module shouldn't crash your entire app. Microservices fix this, but bring network overhead and distributed complexity as trade-offs.

4. gRPC vs REST
REST (JSON) = human-readable, universal. gRPC (Protobuf + HTTP/2) = 3–10x faster, supports streaming. → Use REST for public APIs. Use gRPC for internal service-to-service calls.

5. Kafka vs RabbitMQ
Kafka = event streaming, message replay, millions/sec → for analytics pipelines. RabbitMQ = task queues, complex routing → for job processing. Don't mix them up.

6. Service Mesh (Istio)
Load balancers route traffic. Service meshes (Istio + Envoy sidecar) go further: mTLS between every service, circuit breaking, distributed tracing, all without touching your application code.

7. KEDA: autoscaling done right
HPA scales on CPU/memory. KEDA scales on EVENTS: Kafka lag, SQS depth, Prometheus metrics. It can even scale to ZERO pods when idle. → Massive cost savings on batch workloads.

8. The JWT security trap
JWT payloads are Base64-encoded, NOT encrypted. Anyone can decode them (see the sketch below). Never store sensitive data in a JWT payload.

9. GitOps with ArgoCD
ArgoCD continuously watches your Git repo. Any drift in the cluster? It auto-reverts to match Git. → Full audit trail. Easy rollbacks. The cluster is never directly exposed to CI pipelines.

10. Canary deployments (how Netflix does it)
Don't release to 100% of users at once. 5% → monitor → 10% → monitor → 25% → 100%. Argo Rollouts can automate this WITH automatic rollback if the error rate spikes.

A huge shoutout and thank you to Vishakha Sadhwani for putting together such an incredible YouTube video on System Design! Her content makes complex DevOps and System Design concepts genuinely easy to understand, and the best part? She has also kicked off a hands-on project alongside it, so you can learn AND build at the same time.

Video reference: https://lnkd.in/gWV5G_xW

#DevOps #SystemDesign #Kubernetes #Microservices #CloudEngineering #AWS #GitOps #BackendEngineering #SoftwareArchitecture #KEDA #ArgoCD #LearningInPublic #VishakhaSadhwani
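Point 8 is easy to verify for yourself: a JWT payload is just base64url-encoded JSON, readable with the standard library alone. A minimal sketch:

```python
import base64
import json

def peek_jwt_payload(token: str) -> dict:
    # A JWT is header.payload.signature; the payload is base64url JSON.
    # No secret or key is needed to READ it, only to verify the signature.
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Run this on any JWT from your browser's dev tools and you will see every claim in plaintext, which is exactly why secrets don't belong there.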
Latency is up, but nothing is failing. Where do you look first? 🔍

One of the worst production situations:
Latency is growing 📈
Users feel it 😐
Logs are clean 🧼
Nothing is obviously broken ❌

Most teams waste time here. They search for errors 🔎, restart pods 🔄, jump between dashboards 📊. But when nothing is failing, the problem is rarely an exception. It is usually one of these:

1. 𝗦𝗰𝗼𝗽𝗲 𝗳𝗶𝗿𝘀𝘁 🎯
One endpoint or all? One instance or all? Reads, writes, or async? If you skip this, you debug the whole system instead of a slice.

2. 𝗧𝗵𝗿𝗲𝗮𝗱 𝗽𝗼𝗼𝗹𝘀 🧵
Active threads, queue size, blocked threads. If all workers are busy, requests are not failing; they are waiting to run.

3. 𝗧𝗵𝗿𝗲𝗮𝗱 𝗱𝘂𝗺𝗽 📸
Look for: repeated stack traces, WAITING/BLOCKED threads, DB connection waits, socket reads, lock contention. This shows where execution is actually stuck. (A Python analogue is sketched below.)

4. 𝗚𝗖 𝗯𝗲𝗵𝗮𝘃𝗶𝗼𝗿 ♻️
Pause time, frequency, heap pressure. If latency spikes in waves, GC is often involved.

5. 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻 𝗽𝗼𝗼𝗹𝘀 🧩
DB, HTTP clients, Redis, broker. Exhausted pool = requests wait instead of fail. Classic "slow but no errors".

6. 𝗤𝘂𝗲𝘂𝗲𝘀 & 𝗹𝗮𝗴 📊
Queue depth, consumer lag, retries. The system may look fine while work silently accumulates.

7. 𝗗𝗼𝘄𝗻𝘀𝘁𝗿𝗲𝗮𝗺𝘀 🌐
DB, internal services, external APIs. Your service might be slow because it is efficiently waiting on something else.

The key shift: no errors does not mean no problem. ❗ It usually means the bottleneck is in waiting, saturation, contention, or backlog. Stop hunting for exceptions first. Start finding where time is spent.

How do you usually localize the bottleneck first in this situation? 🤔

#backend #java #springboot #observability #performance #distributedsystems #productionengineering
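The checklist is JVM-flavored (thread dumps, GC logs), but step 3 has an analogue in most runtimes. A minimal Python sketch using only the standard library, assuming a Unix host:

```python
import faulthandler
import signal
import sys

# On SIGUSR1 (`kill -USR1 <pid>`), print the current stack of every thread
# to stderr: the Python analogue of a JVM thread dump (jstack). Unix-only.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)
```

Sending the signal a few times during a latency spike tells you whether threads are blocked on sockets, locks, or the database, without restarting anything.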
🚀 Distributed systems design is not about splitting services. It is about surviving production.

After working on large-scale microservices in production, one thing becomes clear very quickly: most systems don't fail because of code. They fail because of design decisions under load.

Here's what actually matters in real systems 👇

🔹 𝗟𝗮𝘁𝗲𝗻𝗰𝘆 𝗶𝘀 𝘆𝗼𝘂𝗿 𝗳𝗶𝗿𝘀𝘁 𝗲𝗻𝗲𝗺𝘆
Network calls are expensive. A "simple" service chain can kill performance. Design with fewer hops, not more services.

🔹 𝗙𝗮𝗶𝗹𝘂𝗿𝗲 𝗶𝘀 𝗴𝘂𝗮𝗿𝗮𝗻𝘁𝗲𝗲𝗱, 𝗻𝗼𝘁 𝗼𝗽𝘁𝗶𝗼𝗻𝗮𝗹
Services will go down. Always. If you are not using retries, circuit breakers, and fallbacks, your system is fragile. (A circuit-breaker sketch follows below.)

🔹 𝗗𝗮𝘁𝗮 𝗰𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝗰𝘆 𝗶𝘀 𝗮 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳
Strong consistency sounds good in theory. In distributed systems, eventual consistency is often the only scalable option.

🔹 𝗔𝘀𝘆𝗻𝗰 > 𝗦𝘆𝗻𝗰 𝗶𝗻 𝗵𝗶𝗴𝗵-𝘀𝗰𝗮𝗹𝗲 𝘀𝘆𝘀𝘁𝗲𝗺𝘀
Kafka or event-driven flows reduce tight coupling and improve resilience. Synchronous chains look clean but break faster under pressure.

🔹 𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗶𝘀 𝗻𝗼𝘁 𝗮 𝘁𝗼𝗼𝗹, 𝗶𝘁'𝘀 𝗮 𝗺𝗶𝗻𝗱𝘀𝗲𝘁
Logs, metrics, tracing. Without visibility, debugging production is guesswork.

🔹 𝗖𝗮𝗰𝗵𝗶𝗻𝗴 𝗶𝘀 𝗻𝗼𝘁 𝗮𝗻 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻, 𝗶𝘁'𝘀 𝗮 𝗻𝗲𝗰𝗲𝘀𝘀𝗶𝘁𝘆
Redis or in-memory caching can reduce load drastically when designed right.

💡 The biggest shift in thinking: you stop designing "applications" and start designing systems that can handle failure, scale, and unpredictability.

Most engineers learn frameworks. Very few learn how systems behave in production.

What's one design mistake you've seen that caused a real production issue?

#DistributedSystems #Microservices #SystemDesign #Java #SpringBoot #Kafka #AWS #SoftwareEngineering
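On "failure is guaranteed": a minimal circuit-breaker sketch in Python. The thresholds are illustrative, and a production version would add thread safety and limit concurrent half-open trial calls:

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; after a cooldown, let one
    trial call through (half-open) to probe whether the dependency
    has recovered."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow a trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Failing fast while the dependency is down is what keeps one broken service from dragging the whole chain into timeouts.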
Building scalable systems means ensuring your API remains fast and responsive, even when handling long-running or compute-heavy tasks like OCR, ML inference, or file processing.

In my latest write-up, I've explained how to design an asynchronous processing system using:
FastAPI for handling API requests
RabbitMQ as the message broker
Celery for background task processing

The post covers:
1. End-to-end architecture explanation
2. Initial setup guide (Windows)
3. A complete working demo project with folder structure
4. Sample FastAPI + Celery integration (the core wiring is sketched below)
5. Step-by-step testing using Swagger and worker logs

If you're working on systems where background processing is critical, this setup can significantly improve performance and scalability.

P.S. If you're interested in setting up the same architecture in a production environment, including IIS hosting, port configuration, and running RabbitMQ, Celery, and FastAPI as services, feel free to reach out.

#FastAPI #Celery #RabbitMQ #BackendDevelopment #SystemDesign #AsyncProcessing #Python #ScalableSystems #Microservices #SoftwareEngineering
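The core wiring behind that architecture is small. A minimal sketch, assuming a local RabbitMQ broker; task names, the route, and the broker URL are illustrative:

```python
from celery import Celery
from fastapi import FastAPI

# Celery app with RabbitMQ as the broker (default local credentials).
celery_app = Celery("tasks", broker="amqp://guest:guest@localhost:5672//")

@celery_app.task
def process_document(doc_id: str) -> None:
    ...  # OCR / ML inference / file processing runs here, off the request path

app = FastAPI()

@app.post("/documents/{doc_id}/process", status_code=202)
def enqueue(doc_id: str):
    process_document.delay(doc_id)  # enqueue and return immediately
    return {"status": "queued", "doc_id": doc_id}
```

The API answers in milliseconds with 202 Accepted while a separate Celery worker (`celery -A tasks worker`) drains the queue at its own pace.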
Built a production-grade distributed task queue, focused on what actually breaks in real systems.

Most queue systems work fine until retries, failures, and unpredictable workloads hit. So I designed a system with reliability first:
• At-least-once delivery with processing tracking
• Idempotent submissions using Redis SET NX
• Atomic queue operations via Lua (no race conditions)
• Retry with exponential backoff + jitter to avoid retry storms (sketched below)
• Dead-letter queue for failure isolation

Then added an ML layer:
→ Predicts job duration and improves scheduling
→ Falls back to FIFO if the ML service fails
So the system stays correct even when the "smart" layer breaks.

I also focused heavily on observability:
• Prometheus metrics (latency, retries, queue depth)
• Grafana dashboards for real-time visibility

Benchmarks:
• 100% success rate under load
• P95 submission latency ~11ms
• P95 job completion latency ~2.0s
• ML scheduling improved avg latency by ~5.4%

Tech: Java 17, Spring Boot, Redis, Lua, FastAPI, Docker, Kubernetes, Prometheus, Grafana, Python

GitHub: https://lnkd.in/gGjKZFap

Currently exploring backend / distributed systems roles. Open to connecting with teams building at scale.

#backend #distributedsystems #java #systemdesign #softwareengineering #python
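Of those reliability choices, retry jitter is the one most often skipped. A minimal sketch of one common variant ("full jitter"); the base delay and cap are illustrative:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    # Full jitter: a random delay up to an exponentially growing ceiling,
    # so a burst of simultaneous failures doesn't retry in lockstep and
    # hammer the recovering service again (a "retry storm").
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Plain exponential backoff spaces retries out but keeps them synchronized; the randomization is what actually spreads the load.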
One thing I've learned the hard way: "If an API works fast locally, it means nothing."

I worked on an API that looked perfect in testing:
• <100ms response time
• Clean implementation
• No visible issues

But under real traffic, latency started spiking:
• 100ms → 800ms → 2s+
• Occasional timeouts
• Downstream impact

No errors. No crashes. Just slow degradation. That's where most people get stuck.

Breaking it down: logs looked clean, the JVM and CPU were stable, but the DB started showing increased load.

Digging deeper:
• Found repeated DB calls for the same data (an N+1 pattern)
• No effective caching for high-frequency requests

The fix wasn't scaling infra. It was fixing the design:
• Eliminated redundant DB calls
• Added indexing on frequently queried columns
• Introduced Redis caching with a controlled TTL (sketched below)
• Avoided caching user-specific data to prevent stale responses

Result: latency dropped from ~2s to <200ms under load, DB load reduced significantly, and the system handled higher traffic without scaling aggressively.

Reality: performance problems don't show up in code reviews. They show up when your system is under pressure. If you're not testing for that, you're not building production-ready systems.

#Java #SpringBoot #Performance #Microservices #BackendEngineering #SystemDesign
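The caching fix in that story is a standard read-through pattern. A minimal sketch with redis-py; the key format, the 60s TTL, and `fetch_product_from_db` are illustrative:

```python
import json
import redis

r = redis.Redis()

def fetch_product_from_db(product_id: int) -> dict:
    return {"id": product_id}  # hypothetical: the real query behind the hot endpoint

def get_product(product_id: int) -> dict:
    # Read-through cache: serve hot reads from Redis, fall back to the DB,
    # and bound staleness with a TTL instead of explicit invalidation.
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    product = fetch_product_from_db(product_id)
    r.set(key, json.dumps(product), ex=60)  # TTL caps staleness; tune per endpoint
    return product
```

The TTL is the "controlled" part: shared, slowly changing data tolerates a short staleness window, while user-specific data stays uncached, exactly as the post describes.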