How I Debug Docker Containers in Production

Running containers is easy. Debugging them in production is where real engineering starts.

In the beginning, whenever something broke on my VPS, I used to panic:

• API not responding
• container running but endpoint failing
• frontend showing errors
• database connection issues

At first, I thought: "Maybe my code is wrong." But over time I learned: in production, issues are not always about code. They are about environment, logs, and system behavior.

Here's the exact debugging approach I follow now:

📌 1. Check running containers

```
docker ps
```

Is the container even running? If not, it's not a code issue, it's a startup issue.

📌 2. Check logs (most important step)

```
docker logs container_name
```

This gives real insight. Most of my issues were solved here:

• missing env variables
• database connection errors
• port conflicts
• runtime crashes

📌 3. Go inside the container

```
docker exec -it container_name sh
```

Now I debug like it's a real server:

• check files
• test the API locally
• verify environment variables
• inspect running processes

📌 4. Check docker-compose and env files

Many times the issue was:

• a wrong .env value
• missing config
• a wrong service name

Not code, just a configuration mismatch.

📌 5. Restart and rebuild when needed

```
docker compose down
docker compose up -d --build
```

Sometimes containers need a clean restart.

After facing multiple real issues, I understood something important: logs are your best friend in production. Not guessing. Not assumptions. Just read what the system is telling you.

Lesson: a good developer writes code. A strong engineer knows how to debug systems.

In the next post, I'll share common Docker mistakes that cost me time in production.

#Docker #DevOps #SoftwareEngineering #Debugging #VPS #BuildInPublic
Every line in a Dockerfile is a deliberate decision. Most people write them without knowing why.

A Dockerfile is not a shell script. It is a set of immutable, cached, layered instructions that build a reproducible image. Understanding the difference changes how you write them. Let me walk through the decisions that matter most.

```
FROM node:14
```

This is not just "I need Node." It is your entire foundation. The base image determines what OS, what shell, and what system libraries your container inherits. Choose it deliberately.

```
ENV NODE_ENV=production
```

Bake configuration into the image at build time so the container needs no external setup at runtime. This is the opposite of configuration drift.

```
WORKDIR /usr/src/app
```

Every subsequent instruction resolves paths relative to this. It keeps your container organized and your COPY commands predictable.

Here is the most important ordering insight most developers miss:

```
COPY package*.json ./
RUN npm install --production
COPY . .
```

Why copy package.json first, install, then copy the rest of the code? Because of Docker's layer cache.

🧠 Docker caches each instruction as a layer. If a layer's inputs have not changed, it reuses the cache and skips execution. Dependencies (package.json) change rarely. Code changes constantly. By copying them separately, you ensure that npm install only reruns when your dependencies actually change.

Swap the order and you reinstall node_modules on every single code change. On a large project, that is minutes wasted per build.

```
HEALTHCHECK CMD curl -fs http://localhost:$PORT || exit 1
```

This is not for your benefit. It is for the machinery around your container. Docker marks the container unhealthy when the check fails, and schedulers like Swarm use that status to decide whether to route traffic to it. (Kubernetes ignores the Dockerfile HEALTHCHECK and uses its own liveness and readiness probes, but the principle is the same.) A container that starts but serves errors is worse than one that never starts.

```
USER node
```

Drop root privileges before the process starts. A container running as root with a vulnerability can escape to the host. This line costs nothing. Skipping it costs potentially everything.

The Dockerfile is not boilerplate. Every line is architecture.

What is the most counterintuitive Dockerfile practice you have come across?

#Docker #Dockerfile #DevOps #SoftwareEngineering #Containers #BackendDevelopment #CloudNative #ContinuousDelivery #Security
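Put together, the instructions discussed above form a minimal complete Dockerfile. This is a sketch, not a canonical template: the exposed port and the CMD entrypoint are assumptions added for illustration.

```dockerfile
# Foundation: pins the OS, shell, and system libraries the container inherits
FROM node:14

# Configuration baked in at build time; no external setup needed at runtime
ENV NODE_ENV=production
ENV PORT=3000

# All later relative paths resolve against this directory
WORKDIR /usr/src/app

# Dependencies first: this layer is cached until package*.json changes
COPY package*.json ./
RUN npm install --production

# Application code last, so code changes never bust the install layer
COPY . .

# Lets Docker (and schedulers that read its health status) see
# whether the app is actually serving
HEALTHCHECK CMD curl -fs http://localhost:$PORT || exit 1

# Drop root privileges before the process starts
USER node

# Hypothetical entrypoint; adjust to your application
CMD ["node", "server.js"]
```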
We had a simple problem. Or at least, it looked simple.

The code was working perfectly on my machine. I pushed it. It broke in production.

At first, we thought it was a bug. Then we checked logs. Then configs. Then dependencies. Hours passed.

The issue? Different environments.

On my machine:

• the Node version was slightly different
• some libraries were cached
• environment variables were set locally
• OS behavior was slightly different

In production, everything was "correct." But not the same.

That's when you realize something uncomfortable: the problem is not your code. The problem is your environment.

This is the problem Docker solves. Docker doesn't just run your application. It packages:

• your code
• your runtime
• your dependencies
• your system libraries
• your configurations

into a container. So instead of saying "it works on my machine," you say "it runs exactly the same everywhere."

Now development, testing, and production all use the same environment. No hidden differences. No silent mismatches.

But here's the deeper insight: Docker is not just about containers. It's about removing uncertainty.

Before Docker: environment = unpredictable variable
After Docker: environment = controlled input

That changes how systems are built. You can:

• spin up environments instantly
• scale services consistently
• deploy without surprises
• isolate services cleanly
• reproduce bugs exactly

And most importantly: you stop debugging "why is this different?" and start focusing on actual problems.

Docker didn't just fix deployments. It fixed trust between environments. Because in real systems, consistency is more valuable than speed.

#Docker #DevOps #BackendEngineering #SystemDesign #SoftwareEngineering #AkashGautam
Most developers treat Dockerfiles as packaging scripts. But they're actually architecture decisions.

Every unnecessary megabyte affects deployment speed, CI/CD runtime, Kubernetes scaling behavior, registry bandwidth usage, cold-start latency, and even the security surface of your service. Here's what consistently makes the biggest difference.

1. Choose the right base image
This is usually the fastest win. Switching from full OS images to Alpine, slim, distroless, or newer minimal runtimes like Chainguard/Wolfi can shrink containers dramatically without touching application logic. One rule I now follow consistently: dev image ≠ runtime image. Use full images for debugging. Use minimal images for deployment.

2. Structure Docker layers intentionally
Docker caching becomes extremely effective when the Dockerfile is structured correctly. Dependencies change less frequently than application code. Installing dependencies before copying source code reduces rebuild time significantly during development and CI runs.

3. Use .dockerignore properly
Large build contexts quietly slow pipelines. Exclude things like node_modules, logs, git history, tests, and environment files. This improves build speed and helps prevent accidental secret exposure inside images.

4. Combine commands to avoid hidden image bloat
Each RUN instruction creates a layer. Deleting files later does not remove them from earlier layers; they still exist in image history. Combining install and cleanup steps inside the same layer keeps images smaller and reduces risk.

5. Multi-stage builds make the biggest difference
Separate the build environment from the runtime environment. Compile in one stage. Ship only artifacts in another. Most applications don't need compilers, package managers, or source code inside the final container. This is usually where image size drops from hundreds of MB to tens of MB.

6. Distroless images improve production posture
Distroless containers remove shells, package managers, and unnecessary OS utilities entirely. The result is smaller images, faster startup time, fewer CVEs, and more predictable runtime behavior. Especially useful for services that don't require interactive debugging in production.

7. Use tooling that reveals what Docker hides
Two tools that helped me go further: Dive helps inspect image layers visually. DockerSlim performs runtime-aware image minimization and reduces attack surface automatically.

Container optimization looks like a small improvement at first. Until systems scale. Then it becomes a reliability multiplier. Sometimes the difference between something that just runs and something that runs efficiently in production is hidden inside a Dockerfile.

#Docker #DevOps #Kubernetes #PlatformEngineering #SoftwareEngineering #CloudArchitecture #AIInfrastructure
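The "combine install and cleanup in the same layer" point looks like this in practice. A sketch for a Debian-based image; the packages installed are illustrative:

```dockerfile
# Single RUN: the apt cache deleted here never lands in any image layer
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Anti-pattern for comparison: the first layer keeps the cache forever,
# because a later RUN cannot remove files from an earlier layer
# RUN apt-get update && apt-get install -y curl
# RUN rm -rf /var/lib/apt/lists/*
```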
A feature is not done when it works locally.

Yes, Docker helped reduce a lot of the classic "it works on my machine" issues. But production readiness was never only about matching environments. It is also about how a feature behaves under load, during failures, with messy data, and when real users start depending on it.

Local success usually proves only one thing: the happy path works in a controlled environment. But production is where the real test begins. A feature that works well on a developer machine can behave very differently when it meets:

• real traffic and concurrency
• slow or failing downstream services
• unexpected production data
• edge cases that never appeared in testing
• limited visibility during incidents

That is why, before calling a feature "done," I try to think beyond implementation. I start asking questions like:

• How does this behave under higher load?
• What happens if a dependency times out or returns inconsistent data?
• Can we trace the issue quickly in production?
• Do we have the right logs, metrics, and alerts?
• Can we release this safely and recover quickly if something goes wrong?

For me, production readiness is not just about correct logic. It is also about resilience, observability, performance, and supportability. Because in real systems, "it worked locally" is only the starting point. The real goal is building something that continues to behave well under production reality.

Working locally proves the code. Working in production proves the design.

#SoftwareEngineering #BackendEngineering #SystemDesign #ProductionEngineering #Java #SpringBoot
Understanding Docker Compose: Image Flow Made Simple

Ever wondered what happens behind the scenes when you run docker compose up? Here's a simplified breakdown.

🔹 1. Define services
Everything starts with a docker-compose.yml file where you define services, images, networks, volumes, and environment variables.

🔹 2. Compose reads the configuration
Docker Compose reads the YAML file and understands how your application is structured.

🔹 3. Pull images
If images (from Docker Hub or other registries) are not available locally, they are pulled automatically.

🔹 4. Create resources
Compose sets up networks (for container communication) and volumes (for persistent storage).

🔹 5. Start containers
All defined services (like web, database, cache) are started as containers.

🔹 6. The application is live 🎉
Containers communicate over the network, and your multi-service application runs seamlessly.

💡 Key takeaway: with Docker + Docker Compose, you can manage complex multi-container applications with a single command, making development, testing, and deployment much easier.

#Docker #DevOps #Microservices #SoftwareEngineering #Containerization
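The steps above map directly onto a compose file. A minimal sketch; image tags, the port mapping, and the password are placeholders, not recommendations:

```yaml
services:
  web:
    image: nginx:alpine            # pulled in step 3 if not already local
    ports:
      - "8080:80"
    depends_on:
      - db

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: example   # placeholder; use a secret in practice
    volumes:
      - db-data:/var/lib/postgresql/data   # persistent storage (step 4)

volumes:
  db-data:

# Compose also creates a default network (step 4), so `web` can reach
# `db` by its service name once the containers start (step 5).
```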
Your engineers are manually migrating legacy code. In 2026. Still. There's a better way, and it's already here.

OpenAI just dropped a cookbook showing how to build a sandboxed code migration agent that:

→ Breaks a massive migration into task-sized shards
→ Runs edits and tests in an isolated environment
→ Returns a patch, report, and audit trail for every single change
→ Never touches your credentials or prod environment

The architecture is what makes it elegant: your orchestration logic stays on the HOST. The agent runs shell commands and file edits in a SANDBOX. The two never mix.

No more "let's just merge the 3,000-line migration PR and pray." Each task produces a reviewable patch. Like a tiny, auditable PR, generated automatically.

And here's the kicker: you can swap sandbox providers (Docker → E2B → Cloudflare) without touching a single line of agent code.

This is what responsible AI-assisted engineering looks like. Not vibes-based autocomplete. A structured, testable, auditable workflow. The engineers who figure this out in the next 6 months are going to look like wizards to everyone else.
400MB → 2GB. Six months. Zero code changes. CI: 3min → 20min. One Dockerfile. All of it.

ANSWER: (D) Multi-stage builds

Before (1.5–2GB):

```
FROM node:22          # 1.1GB base
COPY . .
RUN npm install       # ships dev deps too
```

After (250–350MB):

```
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .

FROM node:22-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/src ./src
CMD ["node", "src/app.js"]
```

(Note the copy destinations: `./node_modules` and `./src`, not `.`. Copying to `.` would dump the directory contents into /app and break module resolution.)

75% smaller. One change.

5 FIXES RANKED BY IMPACT:

1️⃣ Multi-stage: 50–75% smaller
2️⃣ Alpine base: saves 900MB
3️⃣ Prod deps only: saves 300MB
4️⃣ .dockerignore: saves GBs
5️⃣ Layer ordering: faster CI

DIAGNOSE:

```
docker history myimage:latest
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive myimage:latest
```

(Dive needs the Docker socket mounted to inspect local images.)

Ship fast. Then fix the Dockerfile. Biggest image you inherited? 👇

#Docker #DevOps #CICD #30DaysOfDevOps
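Fix 4️⃣ is a .dockerignore file at the project root. A sketch with entries typical for a Node project; adjust to what your build actually needs:

```
# .dockerignore: keep these out of the build context entirely
node_modules
.git
*.log
.env*
Dockerfile
docker-compose*.yml
test/
coverage/
```

Excluding node_modules alone can cut gigabytes from the context sent to the Docker daemon, and keeping .env files out prevents secrets from ever landing in an image layer.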
The Claude Code Source Leak → A DevOps Lesson 🔍 RCA Using the 5 Whys Technique

On 31st March 2026, Anthropic accidentally exposed 512,000 lines of their proprietary Claude Code source code via a misconfigured npm package. No hack. No malicious actor. Just a preventable human error.

Let's break it down using the 5 Whys technique 👇

❓ Problem: Source code was publicly exposed on npm

Why 1: A 59.8 MB debug map file was accidentally bundled into the published package
Why 2: The .npmignore file wasn't configured to exclude .map files
Why 3: The Bun bundler generates source maps by default, and no one explicitly disabled it for production
Why 4: There was no automated pre-publish check in the CI/CD pipeline to validate package contents before release
Why 5 (root cause): No release governance policy existed that enforced security validation before every publish

✅ Preventive Actions:

🔹 Always run npm pack --dry-run before publishing; it shows exactly what goes into your package
🔹 Use an explicit files allowlist in package.json instead of a blocklist: only publish what you intend
🔹 Add a CI/CD pipeline gate that rejects any release containing .map files, source directories, or unexpected large files
🔹 Enforce a build policy: source maps are always disabled in production. Make it a non-negotiable pipeline rule
🔹 Apply strict access controls on cloud storage: no proprietary assets should ever be publicly accessible

The irony? Anthropic had built an entire "Undercover Mode" to prevent internal information leaking, and shipped the whole source code through a missing .npmignore entry.

Security is only as strong as your weakest pipeline step. Has the team audited the npm publish configuration lately? 🤔

#DevOps #DevSecOps #CI_CD #Anthropic #ClaudeCode #ReleaseEngineering #LessonsLearned #5Whys #RootCauseAnalysis
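The allowlist approach looks like this in package.json. The `files` field and the `prepack` lifecycle hook are real npm features; the package name, paths, and the check script are illustrative placeholders:

```json
{
  "name": "example-cli",
  "version": "1.0.0",
  "files": [
    "dist/**/*.js",
    "README.md"
  ],
  "scripts": {
    "prepack": "node scripts/check-no-sourcemaps.js"
  }
}
```

With `files` set, npm publishes only what is listed (plus a few always-included files like package.json). Running `npm pack --dry-run` prints the exact tarball contents, and the hypothetical prepack script could fail the publish if any .map file slipped through.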
Saga felt like the final boss of microservices. Until it turned into… chaos.

❗ The Problem

Our order flow looked simple on paper: A → B → C → ✅
Production reality: A → B → C → ❌ (the dreaded rollback loop)

What actually happened:

• If C failed, we had to "undo" A and B manually
• Compensating logic became 80% of our code
• One tiny bug → permanent data mismatch
• Adding one service = multiple new failure paths

🧩 The Root Cause

We stretched Saga beyond its limits. With 5+ services whispering via events, there was no clear view of the system. Business logic got buried under a mountain of error handling.

🛠️ The Fix

We stopped chaining services blindly and moved to orchestration.

Before: A → B → C → D (choreography chaos)

After:
🧠 Orchestrator
├── A
├── B
├── C
└── D

The impact:

• One source of truth for the entire flow
• Built-in retries (no custom retry loops)
• Clear separation of concerns
• Services focus on logic, not failure handling

📌 Key Learning

• Saga works well for simple or well-bounded flows
• If your "undo" code is bigger than your feature code, your architecture is telling you something

⚡ Microservices don't fail because of scale. They fail because of unmanaged complexity.

💬 Are you still coding manual rollbacks… or letting an orchestrator handle it? 👇

#SystemDesign #Backend #Microservices #SoftwareArchitecture #Java #SpringBoot
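The orchestrated saga described above can be sketched in a few lines. This is a toy model (in Python for brevity), not the team's actual code: the step names and the demo failure are illustrative. The key idea is that one component runs steps in order and, on failure, runs the compensations of completed steps in reverse.

```python
class Step:
    """One saga step: a forward action plus its compensating 'undo'."""
    def __init__(self, name, action, compensate):
        self.name, self.action, self.compensate = name, action, compensate

def run_saga(steps):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done = []
    for step in steps:
        try:
            step.action()
            done.append(step)
        except Exception:
            # The orchestrator is the single owner of rollback:
            # no per-service undo chains scattered across the codebase.
            for completed in reversed(done):
                completed.compensate()
            return False
    return True

log = []

def fail():
    raise RuntimeError("C failed")

# Demo: C fails, so B and A are compensated, newest first.
steps = [
    Step("A", lambda: log.append("A done"), lambda: log.append("A undone")),
    Step("B", lambda: log.append("B done"), lambda: log.append("B undone")),
    Step("C", fail, lambda: log.append("C undone")),
]
ok = run_saga(steps)
print(ok, log)  # → False ['A done', 'B done', 'B undone', 'A undone']
```

Built-in retries would wrap the `step.action()` call; the services themselves stay focused on business logic.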
🚨 Configuration drift is one of the most expensive "invisible" failures in modern CI/CD pipelines. A release looks flawless in dev and staging, but production breaks simply because one environment variable, secret, or Kubernetes ConfigMap key is out of sync.

I built EnvSync to solve exactly that. EnvSync is a Python-based CLI tool designed to catch configuration inconsistencies before they reach deployment.

🚀 What EnvSync actually does:

• Compares .env files and Kubernetes manifests across environments.
• Detects missing keys, extra keys, and value mismatches instantly.
• Safely handles ConfigMap and Secret drift (using SHA-256 hashing to compare sensitive values without exposing them).
• Integrates directly into CI/CD pipelines with a strict fail-on-drift gate.
• Auto-discovers environment variables in your codebase to generate .env.template files.

💡 Why this matters for engineering teams:

• Eliminates the need for manual config validation.
• Drastically reduces deployment surprises and rollback cycles.
• Promotes stronger system architecture hygiene and more reliable infrastructure.
• Paves the way for better automation, optimization, and scalability.

Built with Python 3.11+, Typer, PyYAML, and ready for GitHub Actions.

🔗 Check out the repository (and documentation) here: https://lnkd.in/dWen24aW

#DevOps #PlatformEngineering #SRE #Python #CICD #Automation #Scalability #SystemArchitecture #Kubernetes
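The core comparison the post describes can be sketched in a few lines of Python. This is an illustrative reconstruction, not EnvSync's actual code; the function names and the sample values are made up:

```python
import hashlib

def fingerprint(value: str) -> str:
    """Compare secrets by SHA-256 digest so plaintext never appears in reports."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def diff_envs(source: dict, target: dict, secret_keys=frozenset()):
    """Report missing keys, extra keys, and value mismatches between two envs."""
    report = {
        "missing": sorted(source.keys() - target.keys()),
        "extra": sorted(target.keys() - source.keys()),
        "mismatched": [],
    }
    for key in source.keys() & target.keys():
        a, b = source[key], target[key]
        if key in secret_keys:
            a, b = fingerprint(a), fingerprint(b)  # never handle the raw secret
        if a != b:
            report["mismatched"].append(key)
    report["mismatched"].sort()
    return report

staging = {"DB_HOST": "db.staging", "API_KEY": "s3cret", "DEBUG": "true"}
prod = {"DB_HOST": "db.prod", "API_KEY": "s3cret", "EXTRA": "1"}
report = diff_envs(staging, prod, secret_keys={"API_KEY"})
print(report)  # → {'missing': ['DEBUG'], 'extra': ['EXTRA'], 'mismatched': ['DB_HOST']}
```

A fail-on-drift CI gate then reduces to exiting non-zero whenever any of the three lists is non-empty.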