Kubernetes Job Failed Due to a Resource Problem

𝗘𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴 𝘄𝗮𝘀 𝗽𝗲𝗿𝗳𝗲𝗰𝘁. 𝗬𝗔𝗠𝗟, 𝗰𝗼𝗻𝗳𝗶𝗴, 𝗰𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻… 𝗮𝗻𝗱 𝘀𝘁𝗶𝗹𝗹, 𝗶𝘁 𝗳𝗮𝗶𝗹𝗲𝗱.

I was working on a DB migration job in Kubernetes for my EasyShop project. Everything looked clean and production-ready.

𝗖𝗼𝗻𝗳𝗶𝗱𝗲𝗻𝘁 𝘀𝗲𝘁𝘂𝗽:
• ConfigMaps and Secrets
• MongoDB via a Service
• A well-structured Kubernetes Job

But the job kept failing. Retries were exhausted. No obvious issue. Then I checked the pod status.

𝗢𝗢𝗠𝗞𝗶𝗹𝗹𝗲𝗱

That one word changed everything. This was not a code issue. This was a resource problem. The Node.js + TypeScript migration was consuming more memory than expected, while the memory limit was set to just 256Mi. Kubernetes did exactly what it is supposed to do: it killed the container to protect the node.

𝗪𝗵𝗮𝘁 𝗜 𝗳𝗶𝘅𝗲𝗱:
• Increased the memory limit to 𝟭𝗚𝗶
• Tuned resource requests
• Controlled Node.js memory with NODE_OPTIONS="--max-old-space-size=768"

A rough sketch of what this looks like in the Job spec is below.

𝗥𝗲𝘀𝘂𝗹𝘁: The job ran successfully. No retries. No failures.

𝗞𝗲𝘆 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴: In Kubernetes, stability is not just about correct YAML or working code. 𝗥𝗲𝘀𝗼𝘂𝗿𝗰𝗲 𝗽𝗹𝗮𝗻𝗻𝗶𝗻𝗴 𝗶𝘀 𝗲𝗾𝘂𝗮𝗹𝗹𝘆 𝗰𝗿𝗶𝘁𝗶𝗰𝗮𝗹.

𝗪𝗵𝗮𝘁 𝗜 𝗮𝗺 𝗱𝗼𝗶𝗻𝗴 𝗻𝗲𝘅𝘁:
• Moving this migration into an Init Container (second sketch below) to make deployments more reliable and automated.
• Adding proper resource monitoring and alerts to catch memory issues early.
• Exploring Horizontal Pod Autoscaling and better resource profiling to prevent similar bottlenecks in the future.
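
For reference, here is roughly what the fixed Job spec looks like. This is a minimal sketch, not the exact EasyShop manifest: the Job, ConfigMap, Secret, and image names are placeholders, and the CPU and request values are illustrative. The 1Gi memory limit and the NODE_OPTIONS setting are the changes described above.

apiVersion: batch/v1
kind: Job
metadata:
  name: easyshop-db-migration              # placeholder name
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: easyshop/db-migration:latest   # placeholder image
          env:
            # Keep the V8 heap below the container limit so Node.js
            # garbage-collects before Kubernetes OOM-kills the pod.
            - name: NODE_OPTIONS
              value: "--max-old-space-size=768"
          envFrom:
            - configMapRef:
                name: easyshop-config            # placeholder ConfigMap
            - secretRef:
                name: easyshop-secrets           # placeholder Secret
          resources:
            requests:
              memory: "512Mi"                    # illustrative request
              cpu: "250m"
            limits:
              memory: "1Gi"                      # raised from 256Mi
              cpu: "500m"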

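The Init Container direction would look roughly like this. Again, a sketch only, with placeholder names and images: the migration runs as an initContainer, so the application container starts only after the migration has finished, and every rollout migrates the database automatically.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: easyshop-app                       # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: easyshop
  template:
    metadata:
      labels:
        app: easyshop
    spec:
      initContainers:
        # Runs the DB migration to completion before the app container starts.
        - name: db-migrate
          image: easyshop/db-migration:latest   # placeholder image
          env:
            - name: NODE_OPTIONS
              value: "--max-old-space-size=768"
          resources:
            requests:
              memory: "512Mi"
            limits:
              memory: "1Gi"
      containers:
        - name: app
          image: easyshop/app:latest            # placeholder image
          ports:
            - containerPort: 3000               # illustrative port
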
#Kubernetes #DevOps #CloudComputing #NodeJS #TypeScript #Docker #Containers #SRE #PlatformEngineering #BackendDevelopment #Microservices #Debugging #TechLearning #EngineeringLife #OpenToWork

Great learning, Ankit Srivastav. How would you prevent this in the future?

