🚀 **Kubernetes (K8s) – Complete Concepts Cheat Sheet (Quick Revision)**

If you're preparing for DevOps / Kubernetes interviews or want a quick refresher, here's a **one-line guide to all major K8s concepts** 👇

---

🧱 **Core Concepts**
• Cluster – Group of nodes running containerized apps
• Node – Machine where pods run
• Control Plane – Manages cluster state
• Pod – Smallest deployable unit
• Container – Application runtime

---

🚀 **Workloads**
• ReplicaSet – Maintains a set number of pods
• Deployment – Handles updates & scaling
• StatefulSet – For stateful applications
• DaemonSet – One pod per node
• Job – Runs a task to completion
• CronJob – Runs scheduled jobs

---

🌐 **Networking**
• Service – Exposes pods
• ClusterIP – Internal access
• NodePort – Access via node IP
• LoadBalancer – External access
• Ingress – HTTP/HTTPS routing
• CoreDNS – Service discovery

---

💾 **Storage**
• Volume – Pod storage
• PV – Persistent storage
• PVC – Storage request
• StorageClass – Dynamic provisioning

---

🔐 **Security & Config**
• ConfigMap – Non-sensitive data
• Secret – Sensitive data
• Namespace – Logical isolation
• RBAC – Access control
• ServiceAccount – Pod identity
• NetworkPolicy – Traffic control

---

⚙️ **Scheduling**
• Scheduler – Assigns pods to nodes
• NodeSelector – Basic node selection
• Node Affinity – Advanced rules
• Pod Affinity – Pod placement rules
• Taints & Tolerations – Restrict scheduling
• Resource Limits/Requests – CPU & memory control

---

🔍 **Monitoring & Debugging**
• kubectl – CLI tool
• Logs – Container output
• Events – Cluster activity
• Liveness Probe – Health check
• Readiness Probe – Traffic readiness
• Startup Probe – App startup check

---

🔄 **Scaling & Updates**
• HPA – Auto-scale pod count
• VPA – Adjust pod resources
• Rolling Update – Zero-downtime updates
• Rollback – Revert changes

---

🧰 **Advanced**
• Helm – Package manager
• CRD – Extend the Kubernetes API
• Operator – Automation for apps
• etcd – Cluster state database
• API Server – Cluster entry point
• Kubelet – Node agent
• Kube-Proxy – Service networking

---

💡 **Quick Summary:** Kubernetes helps manage, scale, and automate containerized applications efficiently.

---

#Kubernetes #DevOps #CloudComputing #Docker #K8s #Learning #Tech #SRE
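To make several of these concepts concrete, here is a minimal, illustrative Deployment manifest tying a few of them together (the name, image, and probe paths are placeholders, not from any real workload):

```yaml
# Hypothetical example combining Deployment, ReplicaSet (via replicas),
# resource requests/limits, and readiness/liveness probes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app            # illustrative name
spec:
  replicas: 3               # the Deployment manages a ReplicaSet of 3 pods
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: nginx:1.27          # any container image
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits:   { cpu: 250m, memory: 256Mi }
          readinessProbe:            # traffic readiness
            httpGet: { path: /, port: 80 }
          livenessProbe:             # health check
            httpGet: { path: /, port: 80 }
```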
More Relevant Posts
𝗠𝗼𝘃𝗶𝗻𝗴 𝗳𝗿𝗼𝗺 𝗔𝗜 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝘁𝗼 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 🚀

I've always believed that the best AI Product Managers are "Strategic Builders" - leaders who don't just point at a roadmap but understand the "plumbing" that makes autonomous systems possible. I just wrapped up an end-to-end deployment of the Mini Finance web application, moving from a blank terminal to a live public cloud environment using a full DevOps stack.

𝗧𝗵𝗲 𝗧𝗲𝗰𝗵 𝗦𝘁𝗮𝗰𝗸:
🔹 𝗧𝗲𝗿𝗿𝗮𝗳𝗼𝗿𝗺: Provisioning Azure Virtual Machines, VNets, and Network Security Groups (ports 22/80).
🔹 𝗔𝗻𝘀𝗶𝗯𝗹𝗲: Automating the "Install → Deploy → Verify" flow via a multi-playbook strategy.
🔹 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆: Implementing passwordless SSH and secure credential handling for GitHub.

𝗧𝗵𝗲 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲: During the Ansible run, my Git clone task hung indefinitely. After some -vvv debugging, I realized a "silent" authentication prompt from GitHub was stalling on the VM.

𝙏𝙝𝙚 𝙁𝙞𝙭: I pivoted to a GitHub Personal Access Token (PAT) injected into the clone URL. The result? A 100% automated, "hands-off" deployment.

This level of reliability is only possible by leveraging Ansible's core architecture:
• 𝗔𝗴𝗲𝗻𝘁𝗹𝗲𝘀𝘀: Operates via standard SSH with no software required on target nodes.
• 𝗬𝗔𝗠𝗟 𝗣𝗹𝗮𝘆𝗯𝗼𝗼𝗸𝘀: Uses human-readable, English-like syntax to define automation.
• 𝗜𝗱𝗲𝗺𝗽𝗼𝘁𝗲𝗻𝘁: Only makes changes if the system isn't already in the desired state.
• 𝗖𝗿𝗼𝘀𝘀-𝗣𝗹𝗮𝘁𝗳𝗼𝗿𝗺: Manages Linux, Windows, cloud APIs, and network gear seamlessly.
• 𝗦𝗰𝗮𝗹𝗮𝗯𝗹𝗲: Effortlessly orchestrates one or thousands of servers simultaneously.

This architecture is the 'secret sauce' for any Strategic Builder. It allows us to move from manual tinkering to industrial-scale automation in any domain.

𝗥𝗲𝗮𝗹-𝗪𝗼𝗿𝗹𝗱 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗘𝘅𝗮𝗺𝗽𝗹𝗲: 𝗥𝗲𝗮𝗹 𝗘𝘀𝘁𝗮𝘁𝗲 𝗗𝗮𝘁𝗮 𝗔𝗴𝗴𝗿𝗲𝗴𝗮𝘁𝗶𝗼𝗻 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲𝘀
𝗧𝗵𝗲 𝗣𝗿𝗼𝗯𝗹𝗲𝗺: Data sources change frequently. We need "Worker VMs" that spin up, scrape data, and shut down.
𝗧𝗵𝗲 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻: Use Terraform to provision high-memory Azure VMs, and Ansible to automatically install Python, BeautifulSoup or Playwright, and the specific API keys needed to fetch data from databases.
𝗧𝗵𝗲 𝗕𝗲𝗻𝗲𝗳𝗶𝘁: Data collection is consistent every time, preventing "dirty data" from entering our AI model.

Thanks to the community and our mentor Pravin Mishra, lead co-mentor Praveen Pandey, and co-mentors Abhishek Makwana, Manish Kumar, Olajide Salami, and Nkechi Anna Ahanonye for their support and technical guidance.

P.S. This post is part of the DevOps Micro Internship (DMI) Cohort-2 by Pravin Mishra. You can start your DevOps journey by joining this Discord community - https://lnkd.in/gsQiNMTX

#DevOps #Terraform #Ansible #Azure #ProductManagement #AIPM
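A minimal sketch of the PAT-based clone fix described above: an Ansible task that builds the authenticated URL from an environment variable on the control node, so the clone never waits on an interactive prompt. Repo URL, destination path, and the GITHUB_PAT variable name are hypothetical.

```yaml
# Hypothetical Ansible task: clone over HTTPS with a PAT so no
# interactive credential prompt can stall the play.
- name: Clone app repo with a Personal Access Token
  ansible.builtin.git:
    # GITHUB_PAT is read from the control node's environment;
    # org/mini-finance is a placeholder repository.
    repo: "https://{{ lookup('env', 'GITHUB_PAT') }}@github.com/org/mini-finance.git"
    dest: /opt/mini-finance
    version: main
```

In a hardened setup the token would live in Ansible Vault rather than a raw environment variable, which is exactly the "Zero-Plaintext" direction described elsewhere in this feed.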
Hi Linkies 👋

Day 20 – Debugging Containers: exec / logs / inspect / stats (Real-World Guide)

Containers fail. Services crash. Applications behave unexpectedly. In those moments, dashboards are helpful —
👉 but real debugging starts with the Docker CLI tools.

Let's break down the 4 essential commands every DevOps engineer must know 👇

1️⃣ docker logs – First Step, Always
Check what the application is saying.
📌 Example: docker logs <container-id>
Follow logs in real time: docker logs -f <container-id>
🔍 What to look for: errors / stack traces, startup failures, connection issues
🧠 Interview insight: Logs should be your first checkpoint, not a restart.

2️⃣ docker exec – Get Inside the Container
Access the running container.
📌 Example: docker exec -it <container-id> /bin/sh
🔍 What you can do: check running processes, verify configs, test connectivity, inspect files
🧠 Interview insight: If the app fails, validate the environment from inside the container.

3️⃣ docker inspect – Deep Metadata
Provides detailed JSON output about a container.
📌 Example: docker inspect <container-id>
🔍 Useful info: IP address, mounted volumes, environment variables, network configuration, restart policy
🧠 Interview insight: Use inspect when logs don't give enough context.

4️⃣ docker stats – Resource Usage
Monitor container performance in real time.
📌 Example: docker stats
🔍 Shows: CPU usage, memory usage, network I/O, block I/O
🧠 Interview insight: High CPU or memory usage → performance bottleneck.

🔥 Real Debugging Workflow
When a container is failing:
1️⃣ Check logs – docker logs
2️⃣ Enter the container – docker exec
3️⃣ Inspect configuration – docker inspect
4️⃣ Check resource usage – docker stats

🔍 Real Example Scenario
App not responding:
Logs → show a DB connection error
Exec → check environment variables
Inspect → confirm the DB host
Stats → verify the container isn't resource-constrained
👉 Root cause identified quickly.

💬 Common Interview Questions
How do you debug a failing container?
What is the difference between logs and exec?
What does docker inspect provide?
How do you check container resource usage?
How do you troubleshoot high CPU in containers?

📘 Helpful Resources
https://lnkd.in/gf5aVmEJ
https://lnkd.in/gb4eYpPH
https://lnkd.in/gaDT3wEa
https://lnkd.in/gQpKxa76

✅ Always check logs before restarting
✅ Avoid docker exec in production unless necessary
✅ Use centralized logging (ELK, CloudWatch)
✅ Monitor containers continuously
✅ Capture logs before a container crashes

Debugging is not about guessing. It's about observing the system step by step.

#Docker #DevOps #SRE #ContainerDebugging #CloudEngineering #Kubernetes #Troubleshooting #Infrastructure #PlatformEngineering #LearnDevOps #TechLearning
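The non-interactive steps of the workflow above can be bundled into a small wrapper. This is an illustrative sketch (the function name, the DRY_RUN switch, and the flags like --tail 100 are my own choices, not from the post); it is shown in dry-run mode so it only prints the commands it would execute:

```shell
# Hypothetical helper: walk the logs -> inspect -> stats workflow for one
# container. DRY_RUN=1 prints the commands instead of running them, which
# also makes the sketch usable without a Docker daemon.
debug_container() {
  cid="$1"
  run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi; }
  run docker logs --tail 100 "$cid"       # 1. what is the app saying?
  run docker inspect "$cid"               # 2. config, mounts, env, network
  run docker stats --no-stream "$cid"     # 3. CPU / memory / I/O snapshot
  # 4. the interactive step stays manual: docker exec -it "$cid" /bin/sh
}

DRY_RUN=1 debug_container my-app
```

Setting DRY_RUN=0 (the default) runs the same commands for real against a live container.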
𝗠𝗮𝘀𝘁𝗲𝗿𝗶𝗻𝗴 𝗜𝗱𝗲𝗺𝗽𝗼𝘁𝗲𝗻𝗰𝘆: 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝘁𝗵𝗲 𝗙𝗶𝗻𝗮𝗹 𝗠𝗶𝗹𝗲 𝗼𝗳 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲

I just wrapped up a deep dive into production-grade orchestration, moving a three-tier application from a "flaky" manual setup to a rock-solid, idempotent Ansible architecture. Transitioning from experimental prototypes to reliable, production-ready systems is where the real "magic" happens in AI product management.

𝗧𝗵𝗲 𝗪𝗶𝗻: 𝗥𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗮𝘁 𝗦𝗰𝗮𝗹𝗲
I successfully architected a deployment pipeline for EpicBook (a full-stack Node.js/MySQL app) that achieves a "Perfect Zero" on subsequent runs. Whether it's the first deployment on a fresh Azure instance or a routine update, the system now enforces the desired state without redundant restarts or data duplication.

𝗧𝗵𝗲 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲: 𝗦𝗼𝗹𝘃𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝗮𝗰𝗲 𝗖𝗼𝗻𝗱𝗶𝘁𝗶𝗼𝗻
The biggest hurdle was a classic dependency race. The application tier would frequently crash because it tried to "wake up" before the database schema was fully seeded. In a multi-agent or complex app environment, timing is everything.

𝗧𝗵𝗲 𝗙𝗶𝘅: I re-engineered the orchestration into a linear dependency chain. By consolidating roles into a single play and using meta: flush_handlers, I forced a strict sequence: Files → Data → Routing. Now the app only starts once the data layer is 100% ready.

𝗞𝗲𝘆 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴: 𝗜𝗱𝗲𝗺𝗽𝗼𝘁𝗲𝗻𝗰𝘆 𝗶𝘀 𝗮 𝗖𝗵𝗼𝗶𝗰𝗲
Infrastructure isn't just about writing code; it's about managing state. I learned that true reliability requires explicit checks, like verifying a database table exists before attempting a seed. This "Check-then-Act" philosophy is exactly what's needed to bridge the gap between traditional PM and AI-native engineering.

𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗠𝗶𝗻𝗱𝘀𝗲𝘁 𝗳𝗼𝗿 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻
While the automation is smooth, the "Final Mile" requires hardening. My next steps involve:
• 𝗭𝗲𝗿𝗼-𝗣𝗹𝗮𝗶𝗻𝘁𝗲𝘅𝘁: Moving secrets from config files into Ansible Vault.
• 𝗡𝗲𝘁𝘄𝗼𝗿𝗸 𝗜𝘀𝗼𝗹𝗮𝘁𝗶𝗼𝗻: Moving the DB and App tiers into private subnets, accessible only via the Nginx proxy.
• 𝗟𝗲𝗮𝘀𝘁 𝗣𝗿𝗶𝘃𝗶𝗹𝗲𝗴𝗲: Tightening sudo permissions to ensure the automation only touches what it must.

Onward to more resilient systems! 🚀

Thanks to our mentor Pravin Mishra, lead co-mentor Praveen Pandey, and co-mentors Abhishek Makwana, Manish Kumar, Olajide Salami, and Nkechi Anna Ahanonye for the technical guidance and support throughout this build!

P.S. This post is part of the DevOps Micro Internship (DMI) Cohort-2 by Pravin Mishra. You can start your DevOps journey by joining this Discord community - https://lnkd.in/gsQiNMTX
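The "Check-then-Act" idea above can be sketched in a few lines of shell. This is a hedged illustration (the marker file and the no-op seed step are hypothetical, standing in for a real "does the schema table exist?" query): run it twice and the second run reports no change, which is the "Perfect Zero" behaviour Ansible modules give you.

```shell
# Hypothetical check-then-act seeding step: the marker file stands in
# for a real "does the schema table exist?" check against the database.
STATE="${TMPDIR:-/tmp}/epicbook_seeded"

seed_database() {
  if [ -f "$STATE" ]; then
    echo "ok: already seeded (no change)"   # idempotent: desired state already true
  else
    : "run the real seed script here"       # act only when the check fails
    touch "$STATE"
    echo "changed: seeded database"
  fi
}

rm -f "$STATE"
seed_database   # first run  -> makes the change
seed_database   # second run -> reports no change
```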
𝗞𝘂𝗯𝗲𝗿𝗻𝗲𝘁𝗲𝘀 𝗦𝗰𝗵𝗲𝗱𝘂𝗹𝗶𝗻𝗴 - Part 1: “𝗡𝗼𝗱𝗲𝗦𝗲𝗹𝗲𝗰𝘁𝗼𝗿 𝘃𝘀 𝗡𝗼𝗱𝗲 𝗔𝗳𝗳𝗶𝗻𝗶𝘁𝘆 𝘃𝘀 𝗧𝗮𝗶𝗻𝘁𝘀 & 𝗧𝗼𝗹𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝘀”

When deploying pods in Kubernetes, a common question is:
👉 𝘏𝘰𝘸 𝘥𝘰𝘦𝘴 𝘒𝘶𝘣𝘦𝘳𝘯𝘦𝘵𝘦𝘴 𝘥𝘦𝘤𝘪𝘥𝘦 𝘸𝘩𝘦𝘳𝘦 𝘢 𝘱𝘰𝘥 𝘳𝘶𝘯𝘴?

Here's a simple breakdown 👇

---

🔹 1. 𝗡𝗼𝗱𝗲𝗦𝗲𝗹𝗲𝗰𝘁𝗼𝗿 (𝗕𝗮𝘀𝗶𝗰 & 𝗦𝘁𝗿𝗮𝗶𝗴𝗵𝘁𝗳𝗼𝗿𝘄𝗮𝗿𝗱)
The simplest way to control pod placement.
✔ Matches labels on nodes
✔ Pod runs only on nodes with those labels
𝗘𝘅𝗮𝗺𝗽𝗹𝗲: If a node has the label `env=prod`, the pod spec needs `nodeSelector: {env: prod}`
🟢 Easy to use
🔴 Not flexible (no complex conditions)

---

🔹 𝟮. 𝗡𝗼𝗱𝗲 𝗔𝗳𝗳𝗶𝗻𝗶𝘁𝘆 (𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱 𝗡𝗼𝗱𝗲 𝗦𝗲𝗹𝗲𝗰𝘁𝗶𝗼𝗻)
An enhanced version of NodeSelector with more control.
✔ Supports expressions (In, NotIn, Exists, etc.)
✔ Allows "hard" and "soft" rules
𝙏𝙮𝙥𝙚𝙨:
1. 𝘙𝘦𝘲𝘶𝘪𝘳𝘦𝘥𝘋𝘶𝘳𝘪𝘯𝘨𝘚𝘤𝘩𝘦𝘥𝘶𝘭𝘪𝘯𝘨𝘐𝘨𝘯𝘰𝘳𝘦𝘥𝘋𝘶𝘳𝘪𝘯𝘨𝘌𝘹𝘦𝘤𝘶𝘵𝘪𝘰𝘯 (hard rule) → must match
2. 𝘗𝘳𝘦𝘧𝘦𝘳𝘳𝘦𝘥𝘋𝘶𝘳𝘪𝘯𝘨𝘚𝘤𝘩𝘦𝘥𝘶𝘭𝘪𝘯𝘨𝘐𝘨𝘯𝘰𝘳𝘦𝘥𝘋𝘶𝘳𝘪𝘯𝘨𝘌𝘹𝘦𝘤𝘶𝘵𝘪𝘰𝘯 (soft rule) → the scheduler will try, but it's not guaranteed
🟢 More flexible and production-friendly

---

🔹 𝟯. 𝗧𝗮𝗶𝗻𝘁𝘀 & 𝗧𝗼𝗹𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝘀 (𝗡𝗼𝗱𝗲 𝗣𝗿𝗼𝘁𝗲𝗰𝘁𝗶𝗼𝗻 𝗠𝗲𝗰𝗵𝗮𝗻𝗶𝘀𝗺)
Works in the opposite direction.
👉 Nodes control which pods are allowed to run.
✔ Nodes are "tainted" to repel pods
✔ Pods need matching "tolerations" to be scheduled
𝗘𝘅𝗮𝗺𝗽𝗹𝗲:
* Node taint: `key=value:NoSchedule`
* Only pods with a 𝘮𝘢𝘵𝘤𝘩𝘪𝘯𝘨 𝘵𝘰𝘭𝘦𝘳𝘢𝘵𝘪𝘰𝘯 can run there
𝗧𝗮𝗶𝗻𝘁 𝗘𝗳𝗳𝗲𝗰𝘁𝘀:
• NoSchedule → new pods will not be scheduled
• PreferNoSchedule → avoid placing pods if possible
• NoExecute → evict existing pods + block new ones
🟢 𝗨𝘀𝗲𝗳𝘂𝗹 𝗳𝗼𝗿:
* Dedicated nodes (GPU, DB workloads)
* Isolating critical applications

---

🔥 𝗤𝘂𝗶𝗰𝗸 𝗔𝗻𝗮𝗹𝗼𝗴𝘆
1. NodeSelector → "Go to this exact place"
2. Node Affinity → "Require or prefer this type of place"
3. Taints & Tolerations → "Entry allowed only with permission"

---

💡 𝗪𝗵𝗲𝗻 𝘁𝗼 𝘂𝘀𝗲 𝘄𝗵𝗮𝘁?
✔ Simple requirement → NodeSelector
✔ Complex logic → Node Affinity
✔ Node isolation → Taints & Tolerations

---

These concepts help with better workload placement and efficient cluster usage.

#kubernetes #DevOps #cloud #containers #taints #tolerations #nodeselector #nodeaffinity
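For reference, here is a small illustrative pod spec sketching all three mechanisms side by side; the label keys (env, disktype) and the taint key/value are placeholders, not conventions from any particular cluster:

```yaml
# Illustrative pod spec combining the three placement mechanisms.
apiVersion: v1
kind: Pod
metadata:
  name: placement-demo
spec:
  nodeSelector:                       # 1. simple label match
    env: prod
  affinity:
    nodeAffinity:                     # 2. hard rule with an expression
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd", "nvme"]
  tolerations:                        # 3. allow scheduling onto a tainted node
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: app
      image: nginx:1.27
```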
I just released dispatchd v0.1.0!

It started as a deep dive into Go/distributed systems, and has now evolved into something I'm ready to share. It's a distributed task orchestration platform designed to handle coordination and failure recovery, and to provide observable system behavior at scale.

The current architecture consists of:
- gRPC control plane + bidirectional worker streams
- shared Postgres + Redis state for durable orchestration and low-latency coordination
- scheduler leadership, retries, dead-lettering, and a distributed assignment flow
- Kubernetes, Kustomize, GitOps w/ Argo CD
- GitHub Actions CI/CD with DevSecOps checks built into the pipeline
- Prometheus, Grafana, Jaeger, and published performance/reliability evidence (if you can't measure it, it didn't happen, right?)

I'm intentionally keeping it at v0.1.0. This is the baseline, not the finish line. I'm focusing on tackling the hard questions: state coordination across services, partial failure recovery, and establishing security boundaries that I can actually defend with evidence.

The next steps are already planned:
- enforced Zero-Trust security policies
- active/passive multi-region controls
- resilience and disaster recovery (DR) drills

I've formalized the repo with SemVer, a security policy, and even some contribution guidelines, so if you're a distributed systems nerd like me, come join the fun! Check it out, break it, and let me know what you think. I'm all about building, learning, and (respectful) criticism.

Repo: https://lnkd.in/dtsjhRXM
(And if you like the project, a ⭐️ is always appreciated! Hehe)

#Go #SoftwareEngineering #Kubernetes #DevSecOps #DistributedSystems #Backend
Your CI/CD pipeline can be green and your application can still be completely broken. Here's what 37 hours of production monitoring taught me.

I deployed a Kubernetes-based retail application on GKE using Terraform, Helm, and ArgoCD. The pipeline passed. ArgoCD showed Synced. But something was off in Grafana: the error budget was bleeding out slowly, and I had no idea why.

𝗪𝗵𝗮𝘁 𝘁𝗵𝗲 𝗱𝗮𝘀𝗵𝗯𝗼𝗮𝗿𝗱𝘀 𝘀𝗵𝗼𝘄𝗲𝗱
API server availability was sitting at 99.952% — above the SLO target — but the error budget graph told a different story. It had been declining steadily for hours. Not a spike. A slope. That kind of pattern means something is persistently failing, not crashing. Without monitoring I would have assumed everything was fine. The pipeline was green. ArgoCD said Synced. But two pods had been silently failing for over an hour.

𝗧𝗵𝗲 𝗿𝗼𝗼𝘁 𝗰𝗮𝘂𝘀𝗲𝘀 (𝘁𝗵𝗲𝗿𝗲 𝘄𝗲𝗿𝗲 𝘁𝘄𝗼)
1. Image tag mismatch. The CI pipeline was pushing images tagged with the git commit SHA — backend:a3f9c12 — but the Helm values.yaml was set to pull :latest. That tag was never pushed. Kubernetes kept retrying, the error budget kept draining, and nothing surfaced it except Grafana.
Lesson: if your CI tags images with commit SHAs but your manifests reference :latest, you have a silent gap. Always push both — or let Helm inject the SHA at deploy time and forget :latest entirely.
2. Missing IAM binding. The GKE node pool service account had roles/cloudsql.client but was missing roles/artifactregistry.reader. Even after fixing the tag, the nodes couldn't authenticate to pull from Google Artifact Registry. Terraform had the SA configured — just with an incomplete set of roles.

𝗛𝗼𝘄 𝗺𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴 𝗰𝗼𝗻𝗳𝗶𝗿𝗺𝗲𝗱 𝘁𝗵𝗲 𝗳𝗶𝘅
After tagging the correct digest as :latest, restarting the deployments, and applying the IAM fix, the dashboards told the recovery story in real time. The error budget slope flattened. Read SLI errors dropped to 0%. Both pods went to 2/2 Running within seconds. ArgoCD: Synced · Healthy. No degraded apps. Zero restarts.

𝗪𝗵𝗮𝘁 𝗜'𝗱 𝘁𝗲𝗹𝗹 𝗮𝗻𝘆 𝗗𝗲𝘃𝗢𝗽𝘀 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿
A passing pipeline is not proof your application is running. Observability is what bridges that gap. If you're building on Kubernetes and you don't have Grafana dashboards watching your SLIs, error budgets, pod states, and node resources, you're flying blind between deployments.

The stack I used: GKE + Terraform + GitHub Actions + Helm + ArgoCD + Prometheus + Grafana. Each layer has a job. Monitoring's job is to catch what all the others miss.

#DevOps #Kubernetes #GKE #Observability #Grafana #GitOps #ArgoCD #Terraform #SRE #CloudEngineering
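The "let Helm inject the SHA" fix can be sketched as a pair of CI steps. This is an illustrative GitHub Actions fragment, not the author's pipeline; the REGISTRY variable, chart path, and release name are placeholders:

```yaml
# Illustrative GitHub Actions steps: tag the image with the commit SHA
# and pass that same SHA to Helm, so values.yaml never needs :latest.
- name: Build and push image
  run: |
    docker build -t "$REGISTRY/backend:${GITHUB_SHA::7}" .
    docker push "$REGISTRY/backend:${GITHUB_SHA::7}"

- name: Deploy with the exact tag just pushed
  run: |
    helm upgrade --install backend ./chart \
      --set image.tag="${GITHUB_SHA::7}"
```

Because the deploy step reuses the same ${GITHUB_SHA::7} expansion as the push step, the tag Kubernetes pulls is, by construction, a tag that exists.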
✨ We were managing a growing number of repositories, and manual dependency reviews were killing us.
❇️ So we built something to fix it. 🚀

At Backbase, our SCM Platforms team scaled rapidly, and with that came a storm of challenges:
⚠️ Unpatched libraries exposing apps to supply chain attacks.
⚠️ Compliance mandates demanding continuous SBOM tracking.
⚠️ Manual dependency reviews eating up the security team's bandwidth.
⚠️ Commercial SCA tools with licensing costs that didn't scale.

❇️ 𝗧𝗵𝗲 𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻? We deployed 𝗢𝗪𝗔𝗦𝗣 𝗗𝗲𝗽𝗲𝗻𝗱𝗲𝗻𝗰𝘆-𝗧𝗿𝗮𝗰𝗸 on 𝗔𝗺𝗮𝘇𝗼𝗻 𝗘𝗞𝗦 as our centralized Software Composition Analysis (SCA) platform, fully automated through 𝗚𝗶𝘁𝗛𝘂𝗯 𝗔𝗰𝘁𝗶𝗼𝗻𝘀.

🔹 𝗦𝗕𝗢𝗠 𝗮𝘀 𝗮 "𝗻𝘂𝘁𝗿𝗶𝘁𝗶𝗼𝗻 𝗹𝗮𝗯𝗲𝗹" 𝗳𝗼𝗿 𝘀𝗼𝗳𝘁𝘄𝗮𝗿𝗲 - every component, library, and dependency inventoried using the CycloneDX standard
🔹 𝗠𝘂𝗹𝘁𝗶-𝘀𝘁𝗮𝗰𝗸 𝘀𝘂𝗽𝗽𝗼𝗿𝘁 - Java (Maven), Python, Node.js, Android, iOS, and Docker containers all covered with a single portable bash function
🔹 𝗧𝗵𝗿𝗲𝗲-𝘁𝗶𝗲𝗿 𝗚𝗶𝘁𝗛𝘂𝗯 𝗔𝗰𝘁𝗶𝗼𝗻𝘀 𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄 - a reusable pattern that propagates updates across all repos automatically, cutting CI runtime
🔹 𝗦𝗵𝗶𝗳𝘁-𝗹𝗲𝗳𝘁 𝘀𝗲𝗰𝘂𝗿𝗶𝘁𝘆 - SBOMs are generated and analyzed on every PR, so devs see the security impact 𝗯𝗲𝗳𝗼𝗿𝗲 merging

✨ 𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁? Continuous, automated vulnerability monitoring across our entire software supply chain, without the overhead of commercial licensing. 💪

I wrote about the full architecture, Helm configs, JVM tuning, and pipeline design in detail on the official Backbase engineering blog.
📖 Link: https://lnkd.in/g9_NmM-v

💬 How is your team tackling software supply chain security? Would love to hear your approach!

✨ Thanks to my colleagues Aravind P R, Anil Kumar Mamidi, and BARATAM KISHORKUMAR for constant support during this process. 🙌 🔥
❇️ ✨ Grateful to my leadership team for trusting us with the vision and giving us the space to innovate. 🙌 🔥
Chris Vanden Berghe, Pablo Lorenzo, Hariharan Anantharaman, Sumit Singhal, Akshay Pk, Chandra Venkata Atchut Garre, Aditya V, PMP®, SAFe Agilist, Jaydeep Bagare, Kiran Kumar Gudimalla, Hari Venkata Dileep Maddi
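The per-repo workflow described above typically boils down to two steps: generate a CycloneDX SBOM, then upload it to the Dependency-Track API. A minimal sketch, assuming a Maven project; the server URL, project naming, and secret name are placeholders, and the upload uses Dependency-Track's standard /api/v1/bom endpoint:

```yaml
# Illustrative GitHub Actions job: generate a CycloneDX SBOM and push it
# to a Dependency-Track server. URL, project fields, and secrets are placeholders.
sbom:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Generate SBOM (Maven example)
      run: mvn org.cyclonedx:cyclonedx-maven-plugin:makeAggregateBom
    - name: Upload to Dependency-Track
      run: |
        curl -sf -X POST "https://dtrack.example.com/api/v1/bom" \
          -H "X-Api-Key: ${{ secrets.DTRACK_API_KEY }}" \
          -F "projectName=${{ github.repository }}" \
          -F "projectVersion=${{ github.ref_name }}" \
          -F "autoCreate=true" \
          -F "bom=@target/bom.xml"
```

Other stacks would swap only the generation step (e.g. a CycloneDX generator for npm or Python), which is presumably what the post's "single portable bash function" abstracts over.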
So after merging… things didn't just magically work from my last work post...

⚠️ First issue hit immediately — ArgoCD sync failed
Turns out Helm was treating Grafana variables like {{user}} and {{tenant}} as Go template functions → boom, "function not defined" errors
→ Fixed by escaping all legendFormat variables properly
→ Re-sync… 🚀 Dashboard came up
7 panels. Clean. Structured. Exactly as designed.

But then…
❌ Alerts = completely missing
Grafana UI showed zero rules

So I went full debug mode — layer by layer:
→ ConfigMap exists? ✅
→ Sidecar picked it up? ✅ (file present in provisioning path)
→ Then why not loading? 🤔

🔍 Root cause (hidden in logs):
A single Teams contact point had a webhook env variable… which didn't exist in the pod
👉 One failed validation blocked the entire alert provisioning
All 4 rules were silently dropped because of ONE bad config

🛠️ Fix:
→ Removed the external contact point (test-safe approach)
→ Re-deployed
💥 All 4 alert rules loaded instantly

😅 But wait… it still looked empty in the UI
Turned out… a dashboard filter was hiding standalone rules
→ Cleared the filter → Everything visible

⚠️ Final issue — "No Data" on dashboards
Traced it down to:
→ Mimir & Loki gateways unreachable from Grafana
Not our config issue — a networking-side problem
👉 Meaning: Structure = solid, Data = not flowing (yet)

🧠 The complete debugging flow looked like this:
ArgoCD sync error → Helm escaping fix → Dashboard live → Alerts missing → Sidecar verified → Logs checked → Contact point failure found → Removed external notifications → Alerts loaded → UI filter fixed → Data connectivity traced

💡 Big realization:
Each fix doesn't solve the system… it just reveals the next hidden problem
That's real DevOps.

🚀 Final state (Test):
• GitOps pipeline fully working (Git → ArgoCD → Helm → Grafana)
• Dashboard (7 panels) monitoring ingestion across tenants
• 4 alert rules (drop + stop detection)
• Secrets flow: AWS → ExternalSecrets → K8s → Grafana
• Access secured via ALB + TLS + OAuth

Everything wired end-to-end ✅
Next → roll out & validate at a higher level in mgmt

#DevOps #AWS #Kubernetes #Terraform #GitOps #ArgoCD #Grafana #Observability #CloudArchitecture #SRE #Poland #USA #UAE #UK #Europe #Italy #Gulf #KSA #TechJobs #DevOpsJobs #CloudJobs #Hiring #OpenToWork #Viral
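For anyone hitting the same Helm-vs-Grafana templating clash: Grafana's own {{user}} syntax collides with Go templates, so the literal braces have to be emitted from inside a template action. A minimal sketch of the escaping fix, with a hypothetical panel and metric name (the general technique, not the author's exact dashboard):

```yaml
# Fragment of a Helm-templated Grafana dashboard. Grafana's {{user}} syntax
# would otherwise be parsed by Helm as a Go template action; emitting it from
# a raw string literal (backticks) makes Helm render the braces verbatim.
panels:
  - title: "Ingestion by tenant"                              # illustrative panel
    targets:
      - expr: sum(rate(ingest_total[5m])) by (user, tenant)   # hypothetical metric
        legendFormat: "{{ `{{user}} / {{tenant}}` }}"         # Helm outputs: {{user}} / {{tenant}}
```

The alternative is to load dashboard JSON via Helm's .Files and skip templating it entirely, which sidesteps the escaping problem for large dashboards.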
🚀 𝐀𝐳𝐮𝐫𝐞 𝐃𝐞𝐯𝐎𝐩𝐬 𝐑𝐨𝐚𝐝𝐦𝐚𝐩 – 𝐅𝐫𝐨𝐦 𝐃𝐞𝐯𝐞𝐥𝐨𝐩𝐞𝐫 𝐭𝐨 𝐃𝐞𝐥𝐢𝐯𝐞𝐫𝐲 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫

While working across multi-cloud projects and CI/CD pipelines, one thing became clear: DevOps is not a toolset — it's a discipline.

Here's a roadmap that actually works in real-world environments 👇

🎯 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧𝐬
• Strong programming basics (Python, C#, or JavaScript)
• Git workflows (branching, rebasing, PR reviews)
• Linux fundamentals (processes, networking, permissions)
• Networking basics (DNS, HTTP/S, load balancing)
💡 This layer defines how fast you'll progress later.

☁️ 𝐂𝐨𝐫𝐞 𝐀𝐳𝐮𝐫𝐞 𝐊𝐧𝐨𝐰𝐥𝐞𝐝𝐠𝐞
• Azure Resource Manager (ARM) & Bicep
• Core services: App Services, Storage, Key Vault, VMs
• Identity: Azure AD, RBAC, Managed Identities
• Monitoring: Azure Monitor, Log Analytics
💡 If you don't understand how Azure resources behave, pipelines won't make sense.

🔁 𝐂𝐈/𝐂𝐃 𝐰𝐢𝐭𝐡 𝐀𝐳𝐮𝐫𝐞 𝐃𝐞𝐯𝐎𝐩𝐬
• Azure Repos (Git-based version control)
• Azure Pipelines (YAML-first approach)
• Build pipelines vs release pipelines
• Artifact management (Azure Artifacts)
💡 Move fast to YAML pipelines — UI pipelines don't scale well.

📦 𝐂𝐨𝐧𝐭𝐚𝐢𝐧𝐞𝐫𝐬 & 𝐎𝐫𝐜𝐡𝐞𝐬𝐭𝐫𝐚𝐭𝐢𝐨𝐧
• Docker (image creation, optimization)
• Kubernetes fundamentals
• Azure Kubernetes Service (AKS)
• Helm charts & deployments
💡 Most modern workloads end up containerized — this is not optional anymore.

🏗️ 𝐈𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐚𝐬 𝐂𝐨𝐝𝐞 (𝐈𝐚𝐂)
• Bicep (preferred for Azure-native)
• Terraform (multi-cloud scenarios)
• Environment consistency (dev/test/prod)
• State management & drift detection
💡 Real DevOps starts when infra is version-controlled.

🔐 𝐒𝐞𝐜𝐮𝐫𝐢𝐭𝐲 & 𝐂𝐨𝐦𝐩𝐥𝐢𝐚𝐧𝐜𝐞 (𝐃𝐞𝐯𝐒𝐞𝐜𝐎𝐩𝐬)
• Secret management (Key Vault integration)
• Pipeline security (service connections, approvals)
• Dependency scanning & SAST/DAST
• RBAC and least privilege
💡 Security should be embedded, not added later.

📊 𝐎𝐛𝐬𝐞𝐫𝐯𝐚𝐛𝐢𝐥𝐢𝐭𝐲 & 𝐅𝐞𝐞𝐝𝐛𝐚𝐜𝐤
• Application Insights
• Distributed tracing
• Metrics, logs, alerts
• Incident response loops
💡 If you can't measure it, you can't improve it.

⚙️ 𝐀𝐝𝐯𝐚𝐧𝐜𝐞𝐝 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐞𝐬
• Blue/green & canary deployments
• GitOps (Flux / ArgoCD concepts)
• Multi-stage pipelines
• Platform engineering mindset

📌 𝐂𝐞𝐫𝐭𝐢𝐟𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐏𝐚𝐭𝐡
• AZ-900 → Fundamentals
• AZ-104 → Azure Administrator
• AZ-204 → Azure Developer
• AZ-400 → DevOps Engineer Expert

🔥 𝐑𝐞𝐚𝐥-𝐖𝐨𝐫𝐥𝐝 𝐓𝐢𝐩
👉 Treat pipelines as products
👉 Version everything (code, infra, configs)
👉 Automate aggressively, but validate continuously

𝘼𝙙: 𝙒𝙖𝙣𝙩 𝙩𝙤 𝙬𝙖𝙡𝙠 𝙞𝙣𝙩𝙤 𝙮𝙤𝙪𝙧 𝙣𝙚𝙭𝙩 𝙘𝙡𝙤𝙪𝙙 𝙚𝙭𝙖𝙢 𝙬𝙞𝙩𝙝 𝙘𝙤𝙣𝙛𝙞𝙙𝙚𝙣𝙘𝙚? https://lnkd.in/gwAKqK9u

#Azure #AzureDevOps #DevOps #CloudComputing #CI_CD #Kubernetes #Docker #InfrastructureAsCode #Terraform #Bicep #AKS #CloudEngineer #SoftwareEngineering #PlatformEngineering
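Since the roadmap urges moving to YAML-first pipelines, here is a minimal illustrative azure-pipelines.yml showing the multi-stage shape; the stage/job names, environment name, and echo placeholders are mine, not a recommended template:

```yaml
# Minimal illustrative azure-pipelines.yml: YAML-first, multi-stage.
trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

stages:
  - stage: Build
    jobs:
      - job: BuildAndTest
        steps:
          - script: echo "build + unit tests here"   # placeholder build step
            displayName: Build and test

  - stage: Deploy
    dependsOn: Build
    jobs:
      - deployment: DeployToDev
        environment: dev                 # placeholder environment (gets approvals/checks)
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "deploy step here"    # placeholder deploy step
                  displayName: Deploy
```

Using a `deployment` job (rather than a plain `job`) is what hooks the stage into Azure DevOps environments, where the approvals and checks mentioned under pipeline security live.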