Working code and production-ready code are not the same thing.

I wrote a Bash disk monitoring script. It ran. It logged. It alerted. Technically correct. Then I did a senior-engineer code review on it. Five gaps found in under 5 minutes:

- Stale timestamps: captured once at startup, not per log entry
- No log rotation: a disk monitor that fills your disk is not a disk monitor
- No dependency checks: fails cryptically on minimal servers
- Inconsistent log formatting: breaks any tool that parses those logs
- Silent on healthy systems: looks broken when it is working fine

None of these are bugs. All of them matter in production. Now I fix it.

#DevOps #Linux #BashScripting
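A minimal sketch of what closing those five gaps can look like in Bash. The log path, rotation size, and threshold here are illustrative defaults, not the author's actual script:

```shell
#!/usr/bin/env bash
set -euo pipefail

LOG_FILE="${LOG_FILE:-/tmp/disk_monitor.log}"
MAX_LOG_BYTES=1048576   # rotate at 1 MiB so the monitor never fills the disk

# Gap 3: dependency check — fail loudly, not cryptically, on minimal servers
for cmd in df awk date; do
  command -v "$cmd" >/dev/null 2>&1 || { echo "missing dependency: $cmd" >&2; exit 1; }
done

# Gaps 1 and 4: a fresh UTC timestamp per entry, one consistent parseable format
log() {
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$LOG_FILE"
}

# Gap 2: simple size-based rotation
rotate_log() {
  if [ -f "$LOG_FILE" ] && [ "$(wc -c < "$LOG_FILE")" -ge "$MAX_LOG_BYTES" ]; then
    mv "$LOG_FILE" "$LOG_FILE.1"
  fi
}

usage=$(df -P / | awk 'NR==2 {gsub(/%/,""); print $5}')
rotate_log
if [ "$usage" -ge "${THRESHOLD:-90}" ]; then
  log ALERT "root filesystem at ${usage}% usage"
else
  # Gap 5: heartbeat entry so a healthy system still proves the monitor is alive
  log INFO "root filesystem healthy at ${usage}% usage"
fi
```

Run it from cron or a systemd timer; the INFO heartbeat means a silent log now signals a broken monitor, not a healthy disk.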
Fixing Disk Monitor Script for Production Readiness
From Kernel Panic to Problem Solved: Navigating the Linux Initramfs Shell 🐧💻

Ever turned on your computer only to be greeted by a daunting black screen and a BusyBox prompt? I recently ran into a "File System Inconsistency" error that dropped my Linux system into the (initramfs) shell. While these boot errors can be intimidating, they are also a great reminder of how critical fundamental CLI (Command Line Interface) skills are.

The issue was a corrupted partition on /dev/sdb2. By manually running a file system check (fsck), I was able to identify the corrupted inodes, repair the blocks, and get the system back to a healthy state.

Key Takeaways:
✅ Don't Panic: Error messages are maps, not dead ends. Reading the logs carefully pointed me exactly to the corrupted partition.
✅ Syntax Matters: Even a missing space in a command can be the difference between "not found" and a successful repair.
✅ The Power of the CLI: Understanding what happens "under the hood" of an OS makes you a more resilient developer/user.

There is nothing quite as satisfying as seeing "FILE SYSTEM WAS MODIFIED" and watching your OS boot up normally again!

Have you had any "fun" troubleshooting adventures lately? Let's swap stories in the comments!

#Linux #OpenSource #Troubleshooting #TechSkills #ProblemSolving #DevOps #SystemAdministration #CLI #BusyBox
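If you want to rehearse the fsck workflow without risking a real disk like /dev/sdb2, you can run it against a file-backed image. This is a practice sketch, not the repair from the post; mkfs.ext4 and fsck.ext4 come from e2fsprogs and may live under /sbin:

```shell
#!/usr/bin/env bash
# Build a tiny ext4 filesystem inside a regular file, then check it.
# No root needed for a file-backed image.
set -e
IMG=/tmp/fsck_practice.img
dd if=/dev/zero of="$IMG" bs=1M count=8 status=none

MKFS=$(command -v mkfs.ext4 || echo /sbin/mkfs.ext4)
FSCK=$(command -v fsck.ext4 || echo /sbin/fsck.ext4)

"$MKFS" -q -F "$IMG"    # -F: proceed even though the target is a regular file
"$FSCK" -f -y "$IMG"    # -f: force a full check; -y: auto-answer repairs
echo "fsck exit code: $?"   # 0 = clean, 1 = errors were corrected
```

The exit codes are the same ones you read off a real rescue shell, which is what makes the sandbox useful practice.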
🚨 Application working but users still facing issues? Check logs like a pro

Sometimes everything looks fine from your side…
Service is running ✅
Server is up ✅
But users still report issues 😓

This is where logs become your best friend.

🔍 1. Start with application logs
👉 Check inside `/var/log` or the app-specific folder
👉 Example: `tail -f /var/log/nginx/error.log`

📄 2. Look for keywords
👉 error
👉 failed
👉 timeout
👉 denied

🧠 3. Check timestamps
👉 Match log entries against the time the issue was reported
👉 Helps find the exact event

⚡ 4. Check service logs
`journalctl -u nginx`
👉 System-level logs for services

🌐 5. Correlate with the user's issue
👉 What did the user do?
👉 What happened in the logs at the same time?

🔁 6. Reproduce the issue
👉 Try the same steps
👉 Watch the logs in real time

🧠 Real mindset:
Don't guess ❌
Logs never lie ✅

#Linux #LinuxAdmin #DevOps #Troubleshooting #CloudComputing #SystemAdministration #LearningInPublic #ITInfrastructure
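The keyword scan in step 2 fits in a few lines of shell. The seeded log file and its contents below are purely illustrative so the sketch is self-contained; point it at your own service's log instead:

```shell
#!/usr/bin/env bash
# Scan a log for the usual failure keywords (step 2 of the triage above).
LOGFILE="${1:-/tmp/app.log}"

# Seed a demo log so this runs anywhere; in real triage, skip this block.
cat > "$LOGFILE" <<'EOF'
2024-05-01T10:00:01Z INFO  request served in 12ms
2024-05-01T10:00:02Z ERROR upstream timeout after 30s
2024-05-01T10:00:03Z WARN  connection denied for 10.0.0.7
EOF

# -E: extended regex, -i: case-insensitive, -n: line numbers for the timestamp step
grep -Ein 'error|failed|timeout|denied' "$LOGFILE"
```

The line numbers from `-n` feed straight into step 3: jump to those lines and compare timestamps with when the user reported the problem.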
I built the monitoring. Then I broke something on purpose to see if it worked. (Part 2 of 2 | Part 1 covers the build if you missed it.)

Uptime Kuma is watching Proxmox, both VMs, and every service running on THE MONOLITH. I shut down the Linux VM deliberately. Within seconds: red on the dashboard, Telegram notification fired.

This is the part tutorials skip. Setting up monitoring is one thing. Knowing it actually works when something goes down is another. You don't find out by reading about it. In production this is the difference between finding out a server is down from your monitoring system or finding out from a client.

Phase 3 is K3s. I've never touched Kubernetes. THE MONOLITH is about to become a cluster. Let's find out what breaks first.

#docker #devops #homelab #monitoring #uptimekuma #learninginpublic
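The "break it on purpose" test can be rehearsed in miniature with a raw TCP probe. This is a sketch of the idea, not Uptime Kuma's actual check; the host and port are illustrative, and port 1 is chosen because nothing should be listening there:

```shell
#!/usr/bin/env bash
# Probe a host:port over TCP using bash's /dev/tcp redirection and report state.
check() {
  if timeout 2 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null; then
    echo "UP $1:$2"
  else
    echo "DOWN $1:$2"
  fi
}

# Deliberately probe a closed port: the check is only trustworthy
# if you have seen it report DOWN.
check 127.0.0.1 1
```

Same principle as shutting down the VM: a monitor you have never seen fire on a real outage is an untested assumption, not monitoring.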
🔧 When a Simple Install Turns Into a Kernel-Level Lesson Today I just wanted to install VirtualBox so I could spin up a Linux machine for practicing and working on some DevOps stuff. I thought it would take a few minutes… but it turned into a small troubleshooting journey. At first, VirtualBox wouldn’t start and kept throwing the rc=-1908 error. After digging a bit, I realized I was running kernel 6.17, and the required module (vboxdrv) wasn’t being built properly. Even reinstalling didn’t help because DKMS still failed to build the module on that kernel. The workaround was booting into a more stable kernel (6.14) so the modules could finally build. I thought that would solve everything, but there was one more blocker: Secure Boot. It was preventing the module from loading and kept returning a “Key was rejected” error. After disabling Secure Boot and loading the modules, VirtualBox finally started working. What I expected to be a simple installation ended up being a good reminder: sometimes issues come from different layers of the system, and the only way forward is to debug them step by step. #DevOps #Linux #Troubleshooting #VirtualBox #Kernel #DKMS #LearningJourney
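A quick diagnostic pass for this exact failure chain. This reconstructs the layers described above rather than the author's exact commands, and only probes dkms and mokutil if they are installed:

```shell
#!/usr/bin/env bash
# Layer 1: which kernel is running? DKMS builds modules per kernel version.
uname -r

# Layer 2: did DKMS actually build vboxdrv for this kernel?
command -v dkms >/dev/null && dkms status

# Layer 3: Secure Boot refuses unsigned modules ("Key was rejected").
command -v mokutil >/dev/null && mokutil --sb-state

# Layer 4: is the module loaded at all?
lsmod 2>/dev/null | grep vboxdrv || echo "vboxdrv not loaded"
```

Reading the layers top to bottom mirrors the debugging order in the post: kernel version first, module build second, signing policy third, load state last.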
Containers from the Ground Up ➔ Part 2: Linux Namespaces

Docker, Inc. or Podman Desktop didn't secure your containers. The Linux kernel did.

Most engineers I talk to know containers provide isolation. Far fewer can tell you what enforces it when something actually goes wrong. I spent time going one layer deeper, past the runtime, past the daemon, down to the kernel feature that makes all of it work: namespaces.

A few things that surprised me along the way:

chroot was the best isolation Linux had before namespaces existed. A process inside it could still see every PID on the host, bind any network port, and a root process could escape it entirely with two syscalls. We called it a jail. It wasn't.

Every process on Linux is always inside a namespace, even on a bare machine with no containers running. The kernel creates the initial ones at boot. Every process inherits its parent's. A container is just a process that got new ones via clone().

When I ran a simple loop comparing /proc/1/ns/ against my container process, the picture became concrete. Separate inodes for pid, net, mnt, uts, ipc. Same inode for user, which is expected for a rootless setup. Each different inode is a hard kernel boundary, not a runtime abstraction.

The failure modes map cleanly once you know the model:
— Container leaking network traffic? net namespace.
— Seeing host PIDs from inside a container? Missing CLONE_NEWPID.
— Rootless container can't bind port 80? Not a firewall rule. UID mapping in the user namespace.

At its core, podman run is a sequence of clone() calls with the right flags, followed by execve(). The runtime configures. The kernel enforces.

Debugging changed for me after internalising this. I stopped reading container logs first and started reading /proc and lsns. Wrote this up properly with verified commands if you want to run through it yourself. Link in the comments.
#Linux #Containers #SoftwareEngineering #DistributedSystems #Kubernetes #DevOps #SystemsProgramming #BackendEngineering
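The inode-comparison loop described above looks roughly like this. Reading /proc/1/ns usually needs privileges (or a shared user namespace), so the sketch degrades gracefully when PID 1 is unreadable:

```shell
#!/usr/bin/env bash
# Compare this process's namespace inodes against PID 1's.
# On a bare host most entries match; inside a container, pid/net/mnt/uts/ipc
# differ while user may match for a rootless setup.
for ns in pid net mnt uts ipc user; do
  self=$(readlink "/proc/self/ns/$ns" 2>/dev/null)
  init=$(readlink "/proc/1/ns/$ns" 2>/dev/null || echo "unreadable")
  printf '%-5s self=%s  init=%s\n' "$ns" "$self" "$init"
done
```

Each `ns:[inode]` value is the kernel's identity for that namespace; two processes share an isolation boundary exactly when the inodes are equal. `lsns` shows the same data system-wide.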
Linux From Scratch has been called impractical for years. Fine. Of course it is. It takes at least forty hours, and that is assuming things go reasonably well. When you finish, there is no package manager waiting to rescue you. No clean update path. No layer of polish smoothing over the hard parts. Every binary on that machine exists because you compiled it. Every configuration file exists because you wrote it. That much is obvious.

The real question is whether that matters. It does. More than most people are willing to say out loud. Not because LFS belongs on a production server. It does not. Not because it is the smartest way to run a modern environment. It is not. That misses the point completely.

The point is what it does to your understanding. Because there is a kind of knowledge that only comes from doing something the hard way, from first principles, with your own hands. You do not get that from browsing a wiki. You do not get it from skimming documentation and nodding along. You get it by getting stuck. By getting it wrong. By sitting in a chroot at two in the morning, staring at a kernel that refuses to compile, and staying there until the system finally makes sense.

That experience does something documentation alone cannot do. It burns the lesson in. It forces the abstractions to fall away. It turns Linux from a product you use into a system you actually understand. And once you have that, you keep it.

Read more here: https://lnkd.in/gtiUeRhb

#Linux #LinuxFromScratch #OpenSource #SystemsEngineering #DevOps #Infrastructure #SoftwareEngineering #OperatingSystems #LearnByDoing #LFS
I think this is the future of recruiting and Jr employee training in the age of AI. It takes a long time and hard work to learn and gain experience. But now, AI means that anyone can take the shortcut. The problem is that the real value (experience) is lost. This is something we will need to replace. Programs like this will start to be an important part of any new employee training program - a way to quickly, but clearly, get some deep and practical experience that advances the person years ahead. The output is not the goal - that will be discarded. The goal is the experience and knowledge that the human gains. This is a very different way to think, and a different way to value human growth. We need to invest in training people and growing them in their ability to do the things that AI can't do. That takes time, but the results are necessary.
🔧 From Jenkins Failures to Production Hero — A DevOps Linux War Story

Last week, a routine deployment turned into a firefight. Jenkins jobs started failing, Nginx returned 502s, and disk space was critically low. The culprit? A mix of a full /var, zombie processes, and a memory leak.

Here’s what saved the day — real Linux commands every DevOps engineer should have in their back pocket:

🚨 Disk Space Alert
df -h → du -sh /* → docker system prune -a --volumes

🔥 High CPU / I/O Wait
top → ps -eo pid,comm,%cpu → iostat -xz 1

🧟 Zombie Processes
ps aux | awk '$8~/Z/' → a zombie is already dead, so signaling it does nothing; nudge or restart the parent instead: kill -s SIGCHLD <PPID>

🔐 SSH / Permission Issues
ssh -vvv → chmod 700 ~/.ssh → sudo grep pubkey /var/log/auth.log

📦 Systemd Failing After Reboot
systemctl enable docker → journalctl -u docker -n 200

👉 Save this post — it’s your Linux triage cheat sheet.

💬 What’s your most-used Linux debug command in production? Drop it below 👇

#DevOps #Linux #SRE #ProductionSupport #SysAdmin
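For the zombie step you need each zombie's parent PID, since zombies are reaped through the parent. A small sketch using standard ps fields (the STAT column starts with Z for defunct processes):

```shell
#!/usr/bin/env bash
# List zombie processes alongside their parents.
# Keep the header row (NR==1) plus any process whose state starts with Z.
ps -eo pid,ppid,stat,comm | awk 'NR==1 || $3 ~ /^Z/'

# To reap: kill -s SIGCHLD <parent pid>, or restart the parent service.
# The zombie entry vanishes once the parent calls wait() on it.
```

If zombies keep accumulating, the fix is the parent's code (it never waits on its children), not anything you can do to the zombies themselves.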
Most container tutorials teach you the workflow. None of them show you what the kernel actually does. I ran some commands that changed how I think about containers entirely.

When I ran nsenter into my container's PID namespace, the process I'd been calling "PID 1 inside the container" showed up as PID 13684 on the host. Same process. Two identities. One per namespace.

When I ran uname -r on the host and inside the container, I got identical output. There is no guest kernel. The container is sharing mine.

When I read /sys/fs/cgroup for the container process, the memory limit mapped exactly to the --memory flag I passed at startup. The kernel writes that value. The kernel enforces it. Podman is just the configurator.

One more thing most engineers don't think about: by the time your container process appears in the process table, the OCI runtime (crun on modern Linux) has already exited. Its job was setup — clone() the namespaces, write the cgroup limits, unpack the filesystem layers, execve() your process. Then it's gone.

I wrote the full breakdown with every command and real terminal output on my blog. Link in the first comment. 👇

#linux #containers #backend #distributedsystems #softwareengineering
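Reading the cgroup limit yourself takes only a few lines. This sketch assumes the cgroup v2 layout, where the file is memory.max (on v1 it is memory.limit_in_bytes instead), and degrades gracefully where neither is readable:

```shell
#!/usr/bin/env bash
# Resolve this process's cgroup path, then read its memory limit — the same
# file a runtime writes when you pass --memory at container startup.
cg=$(awk -F: '$1=="0" {print $3}' /proc/self/cgroup)   # the "0::" line is cgroup v2
f="/sys/fs/cgroup${cg}/memory.max"

if [ -r "$f" ]; then
  echo "memory.max = $(cat "$f")"   # a literal "max" means no limit is set
else
  echo "no readable cgroup v2 memory limit at $f"
fi
```

Seeing the --memory value appear verbatim in that file is the concrete version of "the runtime configures, the kernel enforces": the runtime only writes the number, and the kernel's memory controller does everything after that.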