🚀 Python for DevOps: Real-Time System Monitoring Script (CPU + Memory + Disk)

One of the most practical skills in DevOps is automating system monitoring. Instead of manually checking servers, I built a simple Python script that:

✅ Monitors CPU usage
✅ Tracks Memory consumption
✅ Checks Disk utilization
✅ Triggers alerts when thresholds are exceeded

💻 Full Script

import psutil
import shutil

# Thresholds (percent)
CPU_THRESHOLD = 80
MEM_THRESHOLD = 80
DISK_THRESHOLD = 80

def check_cpu():
    # Sample CPU utilization over a 1-second interval
    cpu = psutil.cpu_percent(interval=1)
    if cpu > CPU_THRESHOLD:
        print(f"ALERT: High CPU usage: {cpu}%")
    else:
        print(f"OK: CPU usage: {cpu}%")

def check_memory():
    mem = psutil.virtual_memory()
    usage = mem.percent
    if usage > MEM_THRESHOLD:
        print(f"ALERT: High Memory usage: {usage}%")
    else:
        print(f"OK: Memory usage: {usage}%")

def check_disk():
    # shutil.disk_usage returns total/used/free bytes for the given path
    disk = shutil.disk_usage("/")
    usage = (disk.used / disk.total) * 100
    if usage > DISK_THRESHOLD:
        print(f"ALERT: High Disk usage: {usage:.2f}%")
    else:
        print(f"OK: Disk usage: {usage:.2f}%")

def main():
    print("===== System Monitoring =====")
    check_cpu()
    check_memory()
    check_disk()

if __name__ == "__main__":
    main()

Output:

ubuntu@satheesha:~/python$ python3 full-monitor-script.py
===== System Monitoring =====
OK: CPU usage: 1.0%
OK: Memory usage: 30.5%
OK: Disk usage: 2.74%

⚙️ How I Used It
Installed the dependency: sudo apt install python3-psutil
Ran the script to get real-time system health
Scheduled it with cron for continuous monitoring (see the sketch after this post)

🔥 Why This Matters in DevOps
👉 Helps detect issues before outages
👉 Reduces manual effort
👉 Can be extended to send alerts (Email / Slack / SNS)
👉 Foundation for tools like monitoring agents

🎯 Key Learning
"Don’t just run commands like top or df -h — automate them using Python and build intelligent monitoring."

🚀 Next Steps
I’m planning to:
Integrate this with a Jenkins pipeline
Send alerts to Slack
Push metrics to monitoring tools

💬 How do you monitor your servers in real-time?

#DevOps #Python #Automation #Monitoring #SRE #Cloud #Linux #Jenkins #Learning #100DaysOfCode
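For the cron scheduling step, a crontab entry along these lines does the job; the script path and log location here are illustrative assumptions, not from the original post:

*/5 * * * * /usr/bin/python3 /home/ubuntu/python/full-monitor-script.py >> /var/log/sys-monitor.log 2>&1

This runs the check every 5 minutes and appends both stdout and stderr to a log file, so alerts are preserved between runs.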
Python System Monitoring Script for DevOps
More Relevant Posts
🚀 Python for DevOps – API Monitoring with requests

Practiced using Python’s requests library to check API health, a common real-world DevOps task.

📂 Use Case:
In production, services depend on APIs. We need to continuously verify that APIs are reachable and healthy.

💻 Python Script:

import requests

url = "https://api.github.com"

try:
    # timeout=5 prevents the call from hanging if the endpoint is unreachable
    res = requests.get(url, timeout=5)
    if res.status_code == 200:
        print("✅ GitHub API is UP")
    else:
        print("⚠️ GitHub API issue:", res.status_code)
except requests.exceptions.RequestException as e:
    # Covers DNS failures, connection errors, and timeouts in one handler
    print("🚨 API call failed:", e)

Output:

✅ GitHub API is UP

🔍 What this does:
Sends an HTTP request to the API
Uses a timeout to avoid hanging
Checks the response status
Handles failures gracefully

🔥 Why this matters in DevOps:
Monitor service availability
Validate endpoints in CI/CD pipelines
Detect outages early
Automate health checks

💡 Key Learning:
APIs are everywhere in DevOps, and Python makes it easy to integrate, monitor, and automate systems.

📈 Next Steps:
Send alerts (Slack/Email) if the API fails (see the sketch after this post)
Combine with log monitoring scripts
Build a full monitoring + alerting system

#Python #DevOps #API #Automation #Monitoring #Scripting #Cloud #Learning #100DaysOfCode
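One way to tackle the first next step: post failures to a Slack incoming webhook. This is a minimal sketch under the assumption that a webhook has already been created; SLACK_WEBHOOK_URL is a placeholder, and Slack incoming webhooks accept a JSON body with a "text" field.

import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def send_slack_alert(message):
    # Incoming webhooks take {"text": ...}; keep a timeout here too
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=5)

# Example: call it from the except branch of the health check
# send_slack_alert("🚨 GitHub API health check failed")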
🚀 Python for DevOps – Log Level Automation Project

Today I built a practical DevOps script using Python to analyze logs and separate them based on log levels.

📂 Problem:
Manually checking logs is time-consuming. Needed a way to automatically filter and organize logs.

💻 Solution (Python Script):

# Open the source log and one destination file per level
with open("app.log") as f, \
     open("error.log", "w") as err, \
     open("warning.log", "w") as warn, \
     open("info.log", "w") as info:
    for line in f:
        if "ERROR" in line:
            err.write(line)
        elif "WARNING" in line:
            warn.write(line)
        elif "INFO" in line:
            info.write(line)

Output:

ubuntu@satheesha:~/python$ python3 multiple-log_level.py
ubuntu@satheesha:~/python$ ls -ltr error.log warning.log info.log
-rw-r--r-- 1 ubuntu ubuntu 18 Apr 21 07:45 warning.log
-rw-r--r-- 1 ubuntu ubuntu 44 Apr 21 07:45 info.log
-rw-r--r-- 1 ubuntu ubuntu 17 Apr 21 07:45 error.log
ubuntu@satheesha:~/python$ cat app.log
INFO: Service started
WARNING: High CPU
INFO: Service started
ERROR: Disk full
ubuntu@satheesha:~/python$ cat error.log
ERROR: Disk full
ubuntu@satheesha:~/python$ cat warning.log
WARNING: High CPU
ubuntu@satheesha:~/python$ cat info.log
INFO: Service started
INFO: Service started

🔍 What this does:
Reads app.log
Filters logs into: error.log, warning.log, info.log

📊 Outcome:
Faster troubleshooting
Organized logs for better monitoring
Reduced manual effort

🔥 Real DevOps Use Cases:
Production log monitoring
CI/CD pipeline validation
Incident detection and alerting

💡 Key Learning:
Python is a powerful tool for automation in DevOps, especially for handling logs and system data.

📈 Next Step:
Enhancing this script to:
Count log levels
Trigger alerts (email/Slack)
Monitor logs in real-time (tail -f style)

#Python #DevOps #Automation #Scripting #Cloud #Learning #100DaysOfCode
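If more levels get added later, the if/elif chain grows quickly. A small variant of the same idea (my sketch, not from the post) drives everything from a single list, so adding a level is a one-line change:

LEVELS = ["ERROR", "WARNING", "INFO"]

# One output file per level, named error.log / warning.log / info.log
outputs = {level: open(f"{level.lower()}.log", "w") for level in LEVELS}
try:
    with open("app.log") as f:
        for line in f:
            for level in LEVELS:
                if level in line:
                    outputs[level].write(line)
                    break  # a line belongs to exactly one level
finally:
    for handle in outputs.values():
        handle.close()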
🚀 Python for DevOps – Log Analysis with Metrics

Today I enhanced my log automation script to not only separate logs by level but also count them for quick insights.

📂 Problem:
Manually analyzing logs is slow and inefficient.

💻 Solution (Python Script):

error = warning = info_count = 0

with open("app.log") as f, \
     open("error.log", "w") as err, \
     open("warning.log", "w") as warn, \
     open("info.log", "w") as info:
    for line in f:
        if "ERROR" in line:
            err.write(line)
            error += 1
        elif "WARNING" in line:
            warn.write(line)
            warning += 1
        elif "INFO" in line:
            info.write(line)
            info_count += 1

print("ERROR:", error)
print("WARNING:", warning)
print("INFO:", info_count)

🔍 What this does:
Reads app.log
Splits logs into separate files
Counts each log level

📊 Sample Output:

ERROR: 1
WARNING: 1
INFO: 2

Output:

ubuntu@satheesha:~/python$ python3 multiple-log_level.py
ERROR: 1
WARNING: 1
INFO: 2
ubuntu@satheesha:~/python$ ls -ltr error.log warning.log info.log
-rw-r--r-- 1 ubuntu ubuntu 18 Apr 21 08:20 warning.log
-rw-r--r-- 1 ubuntu ubuntu 44 Apr 21 08:20 info.log
-rw-r--r-- 1 ubuntu ubuntu 17 Apr 21 08:20 error.log
ubuntu@satheesha:~/python$ cat error.log
ERROR: Disk full
ubuntu@satheesha:~/python$ cat warning.log
WARNING: High CPU
ubuntu@satheesha:~/python$ cat info.log
INFO: Service started
INFO: Service started

🔥 Why this matters:
Quick visibility into system health
Helps prioritize issues (ERROR > WARNING > INFO)
Reduces manual troubleshooting time

💡 Key Learning:
Python can be used not just for automation, but also for real-time insights and monitoring in DevOps.

📈 Next Step:
Add alerting when the ERROR count exceeds a threshold (see the sketch after this post)
Integrate with monitoring tools
Build real-time log monitoring (tail -f in Python)

#Python #DevOps #Automation #Scripting #Monitoring #Cloud #Learning
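A minimal sketch of the threshold alert mentioned in the next steps; ERROR_THRESHOLD is an assumed value, and the non-zero exit code is a design choice so a CI/CD stage (or cron mail) can react to the failure:

import sys

ERROR_THRESHOLD = 5  # assumed threshold, tune per service

def check_threshold(error_count):
    if error_count > ERROR_THRESHOLD:
        print(f"ALERT: {error_count} errors exceed threshold {ERROR_THRESHOLD}")
        sys.exit(1)  # signal failure to whatever scheduled the script

# In the script above, call check_threshold(error) after printing the counts.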
🚨 I used to overcomplicate Python in DevOps… until real CI/CD pipelines taught me something simple.

When I started working with automation, I thought I needed heavy frameworks and advanced Python structures to build “real DevOps scripts”. But in production environments, I realized something very different:

👉 DevOps automation is not about complexity
👉 It’s about using the right simple tools reliably

In most CI/CD and cloud automation work, I ended up using only a small set of Python standard library modules:

os → environment variables, system interaction
subprocess → running real commands (docker, kubectl, terraform)
json → APIs, Kubernetes configs, pipeline responses
logging → production-grade observability
pathlib → clean file and artifact handling
datetime → deployment tracking & audit logs
sys → CLI control and pipeline exit handling
shutil → backups and artifact management

Real example from DevOps work. Instead of building complex tools, I often use Python scripts to:

automate deployment steps
execute validation commands
capture logs from CI/CD pipelines
interact with cloud APIs

(A short sketch in this style follows below.)

The biggest lesson I learned:
👉 In DevOps, simplicity always wins over complexity.

Because in production, reliability matters more than clever code.

What Python modules do you find yourself using the most in DevOps automation?

#DevOps #Python #CloudComputing #CI/CD #Automation #SRE
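A minimal sketch of what such scripts typically look like, using only the standard library. The kubectl command is illustrative and assumes the CLI is installed; the point is the pattern: run a command, fail loudly with the right exit code, parse JSON, log everything.

import json
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def run(cmd):
    # Capture output so failures can be logged with context
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        logging.error("command failed: %s", result.stderr.strip())
        sys.exit(result.returncode)  # propagate failure to the pipeline
    return result.stdout

if __name__ == "__main__":
    pods = json.loads(run(["kubectl", "get", "pods", "-o", "json"]))
    logging.info("found %d pods", len(pods.get("items", [])))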
🚀 Python for DevOps – Log Monitoring with Timestamp & Alerts (Mini Project)

Built a hands-on Python script to analyze logs, generate alerts, and track system health — a small step toward real-world DevOps automation.

📂 Problem:
Manually scanning logs is inefficient and error-prone. Needed a way to automatically filter and track critical issues.

💻 Solution (Python Script):

from datetime import datetime

ERROR_COUNT = 0
WARNING_COUNT = 0
INFO_COUNT = 0

# alerts.log is opened in append mode so alerts accumulate across runs
with open("app.log") as f, open("alerts.log", "a") as alert_file:
    for line in f:
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        if "ERROR" in line:
            ERROR_COUNT += 1
            alert_file.write(f"{timestamp} - {line.strip()}\n")
        elif "WARNING" in line:
            WARNING_COUNT += 1
            alert_file.write(f"{timestamp} - {line.strip()}\n")
        elif "INFO" in line:
            INFO_COUNT += 1

print("============ LOG SUMMARY ============")
print("ERROR:", ERROR_COUNT)
print("WARNING:", WARNING_COUNT)
print("INFO:", INFO_COUNT)

Output:

ubuntu@satheesha:~/python$ python3 log-mon_alert-time.py
============ LOG SUMMARY ============
ERROR: 1
WARNING: 1
INFO: 2
ubuntu@satheesha:~/python$ cat alerts.log
2026-04-21 11:45:59 - WARNING: High CPU
2026-04-21 11:45:59 - ERROR: Disk full

🔍 What this script does:
Reads application logs (app.log)
Filters critical log levels (ERROR / WARNING / INFO)
Appends important alerts (ERROR and WARNING) into alerts.log
Adds timestamps for better traceability
Generates summary metrics for quick insights

📊 Why this matters:
Faster troubleshooting in production
Clear visibility into system health
Reduces manual effort in log analysis

🔥 Key Learning:
Python is a powerful tool in DevOps—not just for scripting, but for automation, monitoring, and observability.

📈 Next Steps:
Add alerting (Email / Slack integration)
Convert logs to structured format (JSON for ELK stack)
Build real-time log monitoring (tail -f style; see the sketch after this post)

#Python #DevOps #Automation #Logging #Monitoring #Cloud #Scripting #Learning #100DaysOfCode
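A hedged sketch of the tail -f-style follower from the next steps: it seeks to the end of app.log and yields new lines as they appear. The 1-second poll interval is an assumption, and a production version would also need to handle log rotation.

import time

def follow(path):
    with open(path) as f:
        f.seek(0, 2)  # 2 = os.SEEK_END: start at the end, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)  # nothing new yet; poll again
                continue
            yield line

for line in follow("app.log"):
    if "ERROR" in line or "WARNING" in line:
        print("ALERT:", line.strip())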
🚀 Python Basics for DevOps Engineers (Practical Examples)

Python is a powerful tool for automation in DevOps. Here’s a quick guide to essential data types with real-world use cases 👇

🔹 1. String (str)
Used for text (server names, logs, messages)

server = "web-server-1"
print(server)

💡 DevOps Example:

log = "ERROR: Disk full"
if "ERROR" in log:
    print("Issue found")

🔹 2. Integer (int)
Used for numbers (CPU, memory, ports)

cpu = 75
print(cpu)

💡 DevOps Example:

cpu = 85
if cpu > 80:
    print("Alert: High CPU")

🔹 3. Boolean (True / False)
Used for status (running/stopped, success/failure)

is_running = True
if is_running:
    print("Service is running")

💡 DevOps Example:

deployment_success = False
if not deployment_success:
    print("Rollback required")

🔹 4. List (list)
Used to store multiple values (servers, services)

servers = ["web1", "web2", "web3"]
print(servers[0])

💡 DevOps Example:

services = ["nginx", "docker", "jenkins"]
for service in services:
    print(service)

🔹 5. Combine All (Real Example)

servers = ["web1", "web2"]
cpu_usage = 85
status = True

if cpu_usage > 80:
    print("Alert: scale up needed")

if status:
    for s in servers:
        print(f"{s} CPU: {cpu_usage}")

🔹 6. Quick Practice

services = ["web1", "web2"]
status = True
cpu_usage = 85  # fixed variable name (was mistyped as cup_usage)

if status:
    for s in services:
        print(f"Server {s} CPU {cpu_usage}")

if cpu_usage > 80:
    print(f"Alert: CPU {cpu_usage}")

Output:

>>> services = ["web1", "web2"]
>>> status = True
>>> cpu_usage = 85
>>>
>>> if status:
...     for s in services:
...         print(f"Server {s} CPU {cpu_usage}")
...
Server web1 CPU 85
Server web2 CPU 85
>>> if cpu_usage > 80:
...     print(f"Alert: CPU {cpu_usage}")
...
Alert: CPU 85

💡 Key Takeaway:
Mastering these basics helps automate monitoring, alerts, and system management in real DevOps environments.

#DevOps #Python #Automation #Scripting #Learning #AWS #Kubernetes
I was building a self-healing observability platform and hit a subtle bug: Alertmanager was silently ignoring environment variables in YAML because of how it resolves them at load time - not at runtime. Here's what I learned.

My setup: Spring Boot microservices instrumented with OpenTelemetry, Prometheus scraping metrics, Grafana for dashboards, and Alertmanager routing alerts to a Python self-healing script that automatically remediated common failure modes - restarting unhealthy services, recovering dropped database connections.

Everything worked with hardcoded config. The moment I moved sensitive values into environment variables, Alertmanager went silent. No errors. No warnings. Just nothing firing.

The bug (alertmanager.yml):

receivers:
  - name: 'self-healer'
    webhook_configs:
      - url: '${SELF_HEALER_URL}'

Alertmanager does not perform shell-style variable substitution. It treats ${SELF_HEALER_URL} as a literal string - routing alerts to nowhere, silently. Intentional design, not a bug. But it will absolutely catch you off guard.

The fix: use an entrypoint script to substitute before Alertmanager reads the file:

envsubst < /etc/alertmanager/alertmanager.template.yml \
  > /etc/alertmanager/alertmanager.yml

Keep a .template.yml with your placeholders. The entrypoint runs envsubst at container startup, writes the resolved file, and Alertmanager reads it clean. Alerts fired within 30 seconds.

The broader lesson: when something in your observability stack fails silently, the first question isn't "what's wrong with my values" - it's "is this tool even reading what I think it's reading." Test with a hardcoded value first. Always.

What's the most frustrating silent failure you've hit in an observability or infrastructure tool? Drop it below.

https://lnkd.in/g7pjrMZ2

#SRE #DevOps #Observability #Prometheus #Alertmanager #OpenTelemetry #Kubernetes #PlatformEngineering #SoftwareEngineering
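For stacks that are already Python-heavy, the same startup substitution can be done without envsubst. A minimal sketch, assuming the template uses ${VAR} placeholders and the same paths as above; note string.Template raises KeyError if a variable is missing, which is arguably better than a silent blank:

import os
from string import Template

with open("/etc/alertmanager/alertmanager.template.yml") as f:
    # Substitute ${VAR} placeholders from the process environment
    rendered = Template(f.read()).substitute(os.environ)

with open("/etc/alertmanager/alertmanager.yml", "w") as f:
    f.write(rendered)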
🚀 Docker Day 2 | Part 1: Mastering Dockerfile Fundamentals (The RIGHT Way to Choose Base Images)

In modern DevOps workflows, building efficient and production-ready Docker images starts with one critical decision: choosing the correct base image. A Dockerfile is not just a configuration file — it’s a deterministic blueprint that defines how your application is packaged, built, and executed across environments.

🔹 What is a Dockerfile?
A Dockerfile is a declarative script containing step-by-step instructions to build a container image.

💡 Think of it as:
Dockerfile = Reproducible Build Recipe for Your Application

🔹 Dockerfile Execution Workflow
1. Write the Dockerfile
2. Build the image → docker build -t my-app .
3. Docker processes instructions sequentially
4. Image layers are created
5. A container runs from the final image

🔹 The MOST Important Decision: Choosing the Base Image
Selecting the wrong base image can lead to:
Larger image sizes
Security vulnerabilities
Runtime failures
Inefficient CI/CD pipelines

👉 The correct approach is technology-driven identification, not guesswork.

🔍 Step 1: Identify Your Project Type
Always start by analyzing your project structure.

🟢 Case 1: Frontend (React / Node.js)
How to identify: package.json, src/, public/, dependency "react-scripts"
Conclusion: ➡ Requires Node.js runtime
Base Image: FROM node:18

☕ Case 2: Java (Spring Boot)
How to identify: pom.xml or build.gradle, src/main/java/
Conclusion: ➡ Requires JVM (Java Runtime)
Base Image: FROM openjdk:17

🐍 Case 3: Python Applications
How to identify: requirements.txt, app.py / main.py
Conclusion: ➡ Requires Python interpreter
Base Image: FROM python:3.10

🌐 Case 4: Static Websites (HTML/CSS)
How to identify: index.html, no backend logic
Conclusion: ➡ No runtime needed, only a web server
Base Image: FROM nginx:latest

⚡ Pro-Level Insight (MNC Standard)
✔ Always match the runtime to the project language
✔ Prefer official and minimal images (e.g., alpine variants when possible)
✔ Separate the build stage and runtime stage (multi-stage builds; see the sketch after this post)
✔ Avoid unnecessary dependencies in production images

🧠 Key Takeaway
👉 A Dockerfile is not about writing commands — it’s about understanding your application architecture deeply.

Correct base image selection =
✔ Faster builds
✔ Smaller images
✔ Secure deployments
✔ Production stability

#Docker #DevOps #CloudNative #Kubernetes #CI_CD #SoftwareEngineering #Containerization #MLOps #Backend #Frontend #Java #Python #Golang
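Putting the build/runtime split into practice for the Python case above, a hedged multi-stage Dockerfile sketch — the file names (requirements.txt, app.py) follow the conventions named in the post, while the wheel-based layout is my assumption, not the original author's:

FROM python:3.10 AS build
WORKDIR /app
COPY requirements.txt .
# Build wheels in the full image, where build tools are available
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

FROM python:3.10-slim
WORKDIR /app
# Only the built wheels cross over; the toolchain stays behind
COPY --from=build /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
COPY app.py .
CMD ["python", "app.py"]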
In the era of GenAI, which language should I learn - Python or Go?

An interesting question from one of my DevOps engineers. At first glance, it sounds like a straightforward choice:

Go powers much of the modern cloud-native ecosystem (Kubernetes, Docker, Terraform…)
Python has been the backbone of automation, scripting, and now AI/ML

But the real answer is a bit uncomfortable:
👉 The language you choose matters less than how you think about building software.

The Shift We’re Living Through

With LLMs like Claude Sonnet or Opus, generating code is no longer the bottleneck. You can:
- Scaffold a REST API in seconds
- Generate Terraform modules
- Write Kubernetes operators
- Automate workflows

So if code generation is becoming commoditized…
👉 What actually differentiates engineers going forward?

What Still Matters (More Than Ever)

1. Understanding Trade-offs
Knowing why Go is used for infrastructure tools:
- Concurrency model (goroutines, channels)
- Static binaries (ease of distribution)
- Performance and low memory footprint
Knowing why Python dominates automation:
- Rich ecosystem
- Faster prototyping
- Simplicity and readability
AI can generate both, but it won’t deeply understand your system constraints unless you do.

2. System Design Thinking
Can you answer:
- Should this be a long-running service or a batch job?
- When do you use event-driven vs polling?
- Where does the state live?
- How does this scale under failure?
These decisions are language-agnostic, and AI won’t get them right without strong guidance.

3. Code Quality & Maintainability
Generated code often works… until it doesn’t. The real skill is:
- Structuring codebases
- Applying design patterns appropriately
- Writing testable, observable systems
- Managing dependencies and versioning
In DevOps especially, “quick scripts” often become “critical systems” overnight.

4. Understanding the Runtime
Especially in platform engineering:
- How does garbage collection impact latency?
- What happens under high concurrency?
- How do network calls behave under failure?
This is where Go shines, but only if you understand it beyond syntax.

5. Operational Thinking
As DevOps engineers, we don’t just write code, we run it:
- Observability
- Failure modes
- Cost implications
- Deployment patterns
AI can write code. It cannot own production (yet).

The Real Answer

Don’t optimize for language choice. Optimize for engineering depth.

In a world where AI writes code:
- Syntax is cheap
- Judgment is expensive

The engineers who will stand out are the ones who can:
- Ask the right questions
- Design the right systems
- Validate and evolve solutions over time

#DevOps #PlatformEngineering #SoftwareEngineering #CloudNative #Kubernetes #Golang #Python #GenerativeAI #LLM #AICoding #EngineeringLeadership #TechCareers #CareerGrowth #LearningToLearn #SystemDesign #CleanCode #EngineeringExcellence
Stop Scripting, Start Prompting: 4 Ways to Automate Your DevOps Tasks with AI 🤖⚡

The "traditional" way of managing infrastructure is evolving. We used to spend hours writing boilerplate code, debugging regex for logs, and manually tuning monitors. Today, the most efficient engineers are using AI-driven tools to handle the "toil," allowing them to focus on high-level architecture and security.

If you aren't using these AI tools in your workflow yet, you're leaving hours of productivity on the table. Here is how to automate the most common tasks:

#1. Code & Script Generation
* The Task: Writing Bash scripts, Python automation, or complex SQL queries.
* The Tool: GitHub Copilot or Tabnine.
* The Benefit: They don't just complete your code; they suggest entire functions based on your comments. I've seen it reduce scripting time by over 50%.

#2. Infrastructure as Code (IaC) Optimization
* The Task: Writing and auditing Terraform or Kubernetes manifests.
* The Tool: Pulumi AI or Snyk (AI-driven fixes).
* The Benefit: Pulumi AI allows you to describe your infrastructure in plain English and generates the code for you. Meanwhile, Snyk uses AI to find and automatically suggest fixes for security misconfigurations in your templates.

#3. Intelligent Monitoring & Anomaly Detection
* The Task: Filtering out the "noise" in your alerts and finding root causes.
* The Tool: Datadog (Watchdog) or Dynatrace (Davis AI).
* The Benefit: These tools use "AIOps" to automatically surface anomalies. Instead of setting 100 manual thresholds, the AI learns what "normal" looks like and only alerts you when it detects a genuine deviation.

#4. Technical Documentation & Troubleshooting
* The Task: Explaining complex architectures or deciphering cryptic error logs.
* The Tool: Claude (Anthropic) or ChatGPT (OpenAI).
* The Benefit: Paste a 500-line log file or a complex JSON output into these models. They are incredible at summarizing errors, suggesting fixes, and even drafting the README documentation for your project in seconds.

AI isn't here to replace the Engineer; it's here to replace the boring parts of Engineering.

Devops Easy Learning Training Institute

Which of these tools has made the biggest impact on your daily routine? Are there any hidden gems I missed? Let's swap toolkits in the comments! 👇

#DevOps #Automation #AIOps #GitHubCopilot #CloudComputing #TechTools #SRE #InfrastructureAsCode #Efficiency #AI