My pipeline failed before it even began. The code was correct and the YAML was configured properly, but I had overlooked something else entirely.

I built a CI/CD quality gate for LevelUp Bank that automatically blocks any pull request to the main branch if the README.md or .gitignore file is missing. Every merge generates a structured JSON audit log sent directly to AWS CloudWatch, organized into beta and prod log groups. Unit tests run first, so nothing is logged until the tool itself is verified.

But when I triggered the beta workflow for the first time, it failed immediately with a one-line error: the beta environment did not exist in the repository settings. It wasn't broken code or a misconfigured secret; it was a settings page I had never opened. I went to Settings → Environments, created the beta and prod environments, re-ran the workflow, and it passed in seconds.

The lesson: the best automation in the world fails without the environment it depends on. Build the code, then verify everything the code needs in order to run. Those are two separate checklists, and I had only completed one.

The full code and setup guide is on GitHub; the link is in the first comment. What is your best "it wasn't even the code" moment? Share below.

#DevOps #GitHubActions #AWS #CloudWatch #PlatformEngineering #CICD #Python #LearningInPublic #CloudEngineering #SoftwareEngineering #LevelUpInTech #TechCommunity
Automation fails without proper environment setup
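The author's actual implementation is in the GitHub repo linked in the comments. Purely as an illustration, a gate like this could be a small Python script whose exit code blocks the PR. The log group, stream name, and function names below are hypothetical, and the sketch assumes the group and stream already exist and AWS credentials are configured in the runner:

import json
import sys
import time
from pathlib import Path

import boto3

REQUIRED_FILES = ["README.md", ".gitignore"]      # files the gate requires
LOG_GROUP = "/levelup-bank/quality-gate/beta"     # hypothetical log group name

def run_gate():
    """Check required files; return a structured audit record."""
    missing = [f for f in REQUIRED_FILES if not Path(f).is_file()]
    return {"timestamp": int(time.time()), "missing_files": missing, "passed": not missing}

def ship_audit_log(record):
    """Send the audit record to CloudWatch Logs as a single JSON event."""
    boto3.client("logs").put_log_events(
        logGroupName=LOG_GROUP,
        logStreamName="merges",                   # hypothetical stream name
        logEvents=[{"timestamp": record["timestamp"] * 1000,   # CloudWatch expects milliseconds
                    "message": json.dumps(record)}],
    )

if __name__ == "__main__":
    result = run_gate()
    ship_audit_log(result)
    sys.exit(0 if result["passed"] else 1)        # a non-zero exit blocks the PR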
More Relevant Posts
Just built a fully automated CI/CD pipeline from scratch. No clicks, no manual deploys. 🚀

Every push to main now:
✅ Runs pytest automatically
✅ Builds a Docker image
✅ Pushes to Docker Hub (tagged with the commit SHA for traceability)
✅ Deploys to the cloud via webhook

Broken code never reaches production: the deploy job is gated behind the test job, so if tests fail, nothing ships.

Stack: FastAPI · Docker · GitHub Actions · Docker Hub · Render

The part that surprised me most was how much there is to configure across multiple platforms (GitHub secrets, Docker access tokens, Render webhooks, CORS) before it all clicks into place and just works.

Live endpoint: https://lnkd.in/egqPR-it
GitHub: https://lnkd.in/eq-bTeKr

#Python #Docker #DevOps #GitHub #GitHubActions #CI #CD #SoftwareEngineering #100DaysOfCode
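This is not the author's actual workflow, but a minimal GitHub Actions sketch of that test-gated shape (the job names, image name, and secret names are placeholders):

name: ci-cd
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt && pytest
  deploy:
    needs: test        # the gate: this job never runs if tests fail
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          docker build -t myuser/app:${{ github.sha }} .        # tag with commit SHA
          echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u myuser --password-stdin
          docker push myuser/app:${{ github.sha }}
      - run: curl -fsS -X POST "${{ secrets.RENDER_DEPLOY_HOOK }}"   # fire the deploy webhook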
Day 85/100 – Environment Variables 🌱

Today was a small concept… but a big mindset shift.

I used to write things like database URLs, API keys, and configs directly inside my code. It worked… until I realized how risky and messy that is.

Today I learned:
- Why we should never hardcode sensitive values
- How environment variables keep things secure and flexible

Practical:
- Created a .env file
- Moved configs out of code

Also practiced: solved "Valid Parentheses" using a stack.

Now my code feels cleaner… and more production-ready.

Key realization: good developers don't just make things work; they make them safe and maintainable.

#BackendDevelopment #EnvironmentVariables #DevOps #DSA #LearningInPublic #100DaysOfCode
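For anyone on the same day of their journey, a minimal sketch of the pattern, assuming the python-dotenv package; the variable names are just examples:

# .env (keep it out of git -- add it to .gitignore)
#   DATABASE_URL=postgres://user:secret@localhost:5432/app
#   API_KEY=example-key

import os
from dotenv import load_dotenv   # pip install python-dotenv

load_dotenv()                    # loads .env into the process environment

DATABASE_URL = os.environ["DATABASE_URL"]   # required: fail fast if missing
API_KEY = os.getenv("API_KEY", "")          # optional, with a default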
Just completed a hands-on CI/CD project where I built a Python log analyzer and automated the full deployment workflow using GitHub Actions, Docker, Docker Hub, and AWS EC2.

What stood out to me most about CI/CD is how it shifts teams from reactive debugging to proactive quality control. Before this project, I understood CI/CD conceptually. After building it, I saw how valuable it is when:
✅ Linting catches formatting issues early
✅ Unit tests prevent broken logic from being deployed
✅ Docker ensures consistency across environments
✅ Automated deployment removes repetitive manual work

I also learned that real DevOps work is often debugging the small issues that break pipelines:
- a missing requirements.txt
- import errors
- GitHub workflow issues
- Docker deployment problems

CI/CD isn't just automation. It builds confidence that your code can move safely from development to production.

Big thanks to @CoderCo for helping make these concepts practical through hands-on learning.

#DevOps #CICD #GitHubActions #Docker #AWS #Python #CloudComputing #Automation
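The real analyzer lives in the author's project; purely as an illustration of the kind of logic those unit tests protect, a toy version might look like this:

from collections import Counter

def count_levels(lines):
    """Count common log levels across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for level in ("ERROR", "WARNING", "INFO"):
            if level in line:
                counts[level] += 1
                break
    return counts

# The kind of pytest check that stops broken logic from shipping:
def test_count_levels():
    sample = ["2024-01-01 ERROR db down", "2024-01-01 INFO ok"]
    assert count_levels(sample) == {"ERROR": 1, "INFO": 1}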
Reposting this because every DevOps engineer needs to read it. This is exactly what real pipelines look like at 3AM. 🔥 Follow @[your page name] for more real-world DevOps content.
🚨 3AM. Pipeline blocked. Deploy frozen. Client breathing down our neck. Here's exactly what happened — and how we got out.

Everything was green.
✅ Docker build — passed
✅ GitHub Actions — passed
✅ ArgoCD sync — waiting…

Then this:
❌ SonarQube Quality Gate — FAILED

No error message. Just a red gate and a frozen pipeline. Most people panic here. 😰 We didn't. We traced it.

🔍 Step 1 — Opened the SonarQube dashboard. Found it: code coverage had dropped to 41%. The threshold was 80%. The gate auto-blocked the deploy.

🔍 Step 2 — Traced WHY coverage dropped. A dev had pushed 3 new microservice files with zero unit tests written. Not even a placeholder.

🔍 Step 3 — Checked the Trivy scan layered on top. Found a CRITICAL CVE in one base image: python:3.9-slim had a known vulnerability. Had to swap to python:3.11-slim immediately.

🔍 Step 4 — Fixed both. Pushed. Watched the gate.
✅ Coverage → 83%
✅ CVE → resolved
✅ Quality Gate → PASSED
✅ ArgoCD → synced to EKS
✅ Deploy live — 4:17AM

Total time: 1 hour 12 minutes.

This is NOT something you learn from YouTube. 🎥 You learn it by breaking things. By staring at logs at 3AM. By understanding WHY the gate blocked — not just clicking retry.

This is EXACTLY what we simulate in AniCloudLab. 💡 Real pipeline. Real SonarQube. Real Trivy. Real EKS. Not a fake demo. A production-like environment where YOU debug.
✅ CI/CD — GitHub Actions + ArgoCD
✅ Security — SonarQube + Trivy
✅ Monitoring — Prometheus + Grafana
✅ Infrastructure — Terraform + AWS EKS

Because when YOUR interviewer asks, "Tell me about a time a security scan blocked your deploy," you won't go silent. 🫤 You'll walk them through it. Step. By. Step.

🗓️ April 2026 Batch — Limited slots
🌏 IN | US | AU Time Zones
📲 WhatsApp: +91 7993 822600
🌐 https://lnkd.in/g5M4zhcK

#DevOps #SonarQube #Kubernetes #AWSEKS #DevSecOps #CICDPipeline #CloudEngineering #DevOps2026 #AniCloudLab #TechCareers
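The post doesn't include the commands, but a scan-and-swap along the lines of steps 3 and 4 might look like this (the image tags come from the post; the service name is a placeholder):

# Scan the base image for critical CVEs with Trivy
trivy image --severity CRITICAL python:3.9-slim

# If the CVE is in the base image, the fix is a one-line Dockerfile change:
#   FROM python:3.9-slim   ->   FROM python:3.11-slim
docker build -t myservice:patched .    # rebuild on the patched base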
The playbook said "FAILED." That was it. No explanation. No error message. Just "FAILED."

🔧 Happy Ansible Tuesday!

I had a playbook that ran a custom module against a fleet of servers. On most hosts it worked fine. On three hosts it just said "FAILED" with no useful output. The msg field was empty. The stderr field was empty. The task just died and moved on.

I spent way too long staring at the playbook logic, checking inventory variables, and comparing the failing hosts to the working ones. Everything looked identical. Same OS. Same Python version. Same module code. No reason for three hosts to fail and the rest to succeed.

Then I added three letters:

ansible-playbook site.yml -vvv

The raw module output appeared. On the failing hosts, the module was throwing a Python traceback that Ansible's default output was swallowing. A missing Python dependency on those three hosts was causing an ImportError. The module crashed before it could format a proper error response, so Ansible had nothing to display except "FAILED."

Three v's. That's all it took. The default output hid the problem. The verbose output showed me the exact exception, the exact line, and the exact missing package. Five minutes to fix after that.

The Danger Zone (when default output hides the problem):
🔹 Ansible's default verbosity shows you what happened (pass/fail) but not always why. If a module crashes before it can return structured output, you get "FAILED" with no context.
🔹 -v adds task results. -vv adds input parameters. -vvv adds connection details and raw module output. -vvvv adds the full SSH/connection debug. Start with -vvv for most debugging; see the quick reference after this post.
🔹 If three hosts fail and 200 succeed with the same playbook, the problem is almost never the playbook. It's the host environment. -vvv shows you what the host gave back, not just what Ansible tried to do.

❓ Question of the Day: Which flag increases the verbosity of Ansible output to help debug connection issues?
Ⓐ -d
Ⓑ --debug
Ⓒ --trace
Ⓓ -vvv

👇 Answer and breakdown in the comments!

#Ansible #NetworkAutomation #DevOps #DamnitRay #QOTD
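Quick reference, consolidating the verbosity ladder from the post into runnable commands (site.yml is the playbook name used above):

ansible-playbook site.yml -v      # adds task results
ansible-playbook site.yml -vv     # also shows input parameters
ansible-playbook site.yml -vvv    # also shows connection details and raw module output
ansible-playbook site.yml -vvvv   # full SSH/connection debug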
🚀 Excited to share my latest project: Automated AWS ECS Deployment using Python & CI/CD!

I built a Python-based automation script using Boto3 to trigger and manage deployments on AWS ECS, fully integrated within a Jenkins CI/CD pipeline.

🔧 Key highlights:
• Automated ECS service deployment using Python (Boto3)
• Integrated the deployment step within the Jenkins pipeline
• Used Docker & AWS ECR for container management
• Debugged real-world issues like environment setup, path errors, and credential handling
• Improved deployment reliability by removing manual steps

💡 This project helped me understand how real-world CI/CD systems handle deployment automation and infrastructure interaction.

📂 GitHub repo: https://lnkd.in/g5sCeaQk

I'm currently exploring GitLab CI/CD and Terraform to further strengthen my DevOps skills 🚀

#DevOps #AWS #Jenkins #Docker #Python #Boto3 #CICD #CloudComputing #Automation
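The repo has the real script; as a rough illustration, the core of a Boto3-driven ECS deployment can be as small as this (the cluster and service names are placeholders, and it assumes AWS credentials and a region are configured):

import boto3

CLUSTER = "prod-cluster"    # placeholder
SERVICE = "web-service"     # placeholder

def deploy():
    """Force a new ECS deployment and wait for the service to stabilize."""
    ecs = boto3.client("ecs")
    ecs.update_service(cluster=CLUSTER, service=SERVICE, forceNewDeployment=True)
    ecs.get_waiter("services_stable").wait(cluster=CLUSTER, services=[SERVICE])
    print("Deployment complete")

if __name__ == "__main__":
    deploy()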
A pod crashes at 3am. Kubernetes restarts it. It crashes again. Nobody reads the logs. Nobody fixes anything. The pod just keeps crashing until someone wakes up and opens a terminal.

I got tired of being that person. So I built something.

It's a .NET background service that watches your Kubernetes cluster. When a pod fails, it pulls the logs, sends them to Claude, and opens a GitHub PR with a root cause analysis and a suggested fix. Automatically. While you sleep.

I tested it with a pod that intentionally fails to connect to a database. Within 10 seconds of the crash, Claude had written this in a PR:

"The application failed to establish a connection to PostgreSQL at postgres://db:5432. Verify the service named 'db' exists and is running. Check network connectivity between pods. Verify the connection string in your environment variables."

And then it generated the Kubernetes Service YAML to fix it.

The whole thing is open source in .NET 10. If your team still gets woken up by Slack alerts at 3am, this might be worth a look.

Full write-up and source code in the comments.

#claude #dotnet #kubernetes #devops #ai #csharp
Day 12 - "terraform apply" — two words, and your entire infrastructure exists. No clicking. No manual setup. Just code and intent.

🚀 TechFromZero Series - TerraformFromZero

This isn't a Hello World. It's real infrastructure provisioned entirely from code:

📐 .tf files → terraform plan → Docker provider → network + Nginx + PostgreSQL (all running, all from code)

🔗 The full code (with step-by-step commits you can follow): https://lnkd.in/d63bkBJv

🧱 What I built (step by step):
1️⃣ Project scaffold — terraform init with the Docker provider, HCL basics
2️⃣ First resource — pull the Nginx image and run a container on localhost:8080 (see the sketch after this post)
3️⃣ Variables — make ports, names, and passwords configurable with type validation
4️⃣ Outputs — display container URLs and IDs after every apply
5️⃣ Docker network — an isolated network so containers can talk to each other
6️⃣ PostgreSQL — a database container with a persistent volume for real data
7️⃣ Data sources — query existing Docker infrastructure at plan time
8️⃣ Provisioners — health check script + database seed via local-exec
9️⃣ Modules — extract PostgreSQL into a reusable, parameterized module
🔟 Workspaces — run dev and staging environments side by side, same code
1️⃣1️⃣ Lifecycle rules — create_before_destroy, prevent_destroy, ignore_changes

💡 Every .tf file has detailed comments explaining WHY, not just what. Written for any beginner who wants to learn Terraform by reading real infrastructure code, with full clarity on each step.

👉 If you're a beginner learning Terraform, clone it and read the commits one by one. Each commit = one concept. Each file = one lesson. Built from scratch, so nothing is hidden.

🔥 This is Day 12 of a 50-day series. A new technology every day. Follow along!

🌐 See all days: https://lnkd.in/dhDN6Z3F

#TechFromZero #Day12 #Terraform #IaC #InfrastructureAsCode #Docker #HCL #DevOps #LearnByDoing #OpenSource #BeginnerGuide #100DaysOfCode #CodingFromScratch
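As referenced in step 2️⃣ above, a minimal sketch of that first resource, assuming the kreuzwerker/docker provider; the container name and ports are examples, and the fully commented version is in the linked repo:

terraform {
  required_providers {
    docker = {
      source  = "kreuzwerker/docker"
      version = "~> 3.0"
    }
  }
}

provider "docker" {}

# Pull the Nginx image and run it on localhost:8080
resource "docker_image" "nginx" {
  name = "nginx:latest"
}

resource "docker_container" "nginx" {
  name  = "hello-terraform"
  image = docker_image.nginx.image_id
  ports {
    internal = 80
    external = 8080
  }
}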
Your Container is Broken. How Do You Look Inside? [Docker Deep Dive — Day 5/5]

It's a very simple command for looking inside a running container, and it's an interviewer's favourite question. Read this once, do it once, and you will never forget it.

Your container is misbehaving in production. Logs tell you nothing. You need to board the ship and inspect it yourself. This is exactly what exec is for.

A running container is a ship at sea: isolated, self-contained, fully operational. From the shore you can only watch it. But with docker exec -it, you drop a ladder through the hatch and step inside — a live shell, inside the running environment, while it keeps sailing. -i keeps the connection open. -t gives you a proper terminal. Together they hand you the wheel room of a live vessel.

# Docker
docker exec -it <container_id> /bin/bash

# Kubernetes
kubectl exec -it <pod_name> -- /bin/bash

# Check logs inside the container
cat /app/logs/error.log

# Check running processes
ps aux

FAQ:

Q: What is the difference between docker exec and kubectl exec?
Same idea, different fleet. docker exec boards a standalone container; kubectl exec boards a pod inside a Kubernetes cluster. Both drop you into a live shell on a moving ship.

Q: Does exec change the container permanently?
No. Any changes you make inside vanish when the container stops. You are boarding the ship, not rebuilding it. For permanent changes, update the Dockerfile and rebuild the image.

Q: When would you NOT use exec in production?
When your containers are ephemeral and immutable by design. Best practice is to fix the Dockerfile, redeploy, and read external logs — not board a live ship mid-voyage. Exec is a debugging tool, not a deployment strategy.

Q: What if bash is not available inside the container?
Minimal images like Alpine do not ship bash. Use /bin/sh instead — a lighter shell that is almost always present.

Next series: Kubernetes Architecture — the control plane, the data plane, and why the API server is the heartbeat of your entire cluster.

#DevOps #Docker #Containers #kubectl #DevOpsInterview #CloudEngineering #DockerDeepDive #Kubernetes
Day 5 of learning Cloud ML in public.

Today I set up a CI/CD pipeline. Now when I type git push, my API automatically updates on a cloud server in Mumbai. No SSH. No manual steps. No "oops, I forgot to restart the container." Just push. Done.

Here's exactly what the pipeline does (in 8 seconds):
→ GitHub detects a push to main
→ GitHub Actions spins up a runner
→ The runner SSHes into my EC2 instance
→ Force-pulls the latest code (git reset --hard)
→ Stops the old Docker container
→ Builds a new image
→ Runs the new container

API updated. Automatically. Every. Single. Time.

But nothing works perfectly on the first try. The pipeline showed all green checkmarks. I tested the API — old response. Still version 1.

Spent 30 minutes debugging:
docker ps → container running ✓
docker logs ml-container → no errors ✓
cat main.py → wait...

The code on the server hadn't changed. git pull was silently failing because EC2 had no git repo initialized. The pipeline "succeeded" but skipped the actual work.

The fix? One line:

git reset --hard origin/main

Force overwrite. No mercy for merge conflicts. This is the production-grade way. (A sketch of the full deploy sequence follows this post.)

Lesson: green checkmarks ≠ correct deployment. Always test the actual output.

Week 1 is complete. Here's what 5 days built:
Day 1 → AWS S3 + boto3 — Python talking to the cloud
Day 2 → EC2 + IAM roles — my own cloud server
Day 3 → FastAPI — a live ML API on the internet
Day 4 → Docker — containerized everything
Day 5 → CI/CD — an automated deployment pipeline

The most important thing I learned this week: it's not S3. It's not Docker. It's not even CI/CD. It's that every error is a teacher. Every broken deployment is a rep. The only way to get good at this is to actually do it: break things, fix them, and document what you learned.

#AWS #CICD #GitHub #Docker #MachineLearning #CloudEngineering #LearningInPublic #Python #MLEngineering #DevOps
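As referenced above, a hypothetical reconstruction of what the Actions runner executes over SSH. The container name ml-container comes from the post; the app path, image name, and port mapping are guesses:

cd /home/ubuntu/app                     # hypothetical app path on EC2
git fetch origin
git reset --hard origin/main            # force-overwrite local state with the fetched main
docker stop ml-container || true        # tolerate a missing old container
docker rm ml-container || true
docker build -t ml-api .
docker run -d --name ml-container -p 80:8000 ml-api   # port mapping is a guess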