From a Mess of Bash Scripts to Full Automation

Working in systems, once you start scaling up, manually configuring on-premise HA clusters (or relying on patchy bash scripts) is honestly exhausting. There were times when running a deploy command left me holding my breath, worrying the nodes might drift out of sync.

Recently, during a review, I decided to scrap all the old scripts and switch entirely to Ansible for IaC. Not to chase a trend, but because it solves exactly three things I need:

1. No messy installations (agentless)
When dealing with physical servers, resources must be strictly optimized, and I have a strong aversion to installing extra background agents on machines. With Ansible, all you need is SSH. You push commands from one control machine down to hundreds of nodes at once. Once it's done, it's done: it doesn't clutter the system or waste a single megabyte of RAM on the target servers.

2. Run repeatedly without fear of errors (idempotency)
Running a bash script over and over easily throws errors. Ansible is different; it lets you declare the outcome you want. For example, if you tell it to start Nginx, it checks first: if Nginx is already running, it skips the task; if not, it starts it. This gives me the confidence to schedule automated runs every day, keeping the nodes in sync without worrying about crashing active services.

3. Everything you need is built in (batteries included)
A real-world system is more than just a Linux OS. It involves HAProxy, firewalls, databases, and all sorts of things. The beauty of Ansible is its thousands of built-in modules that hook directly into those services. Everything is consolidated into a single workflow, so the team doesn't have to write custom API calls from scratch.

Looking back, this transition didn't just save time; it brought peace of mind.
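The idempotent "start Nginx" behavior described above can be sketched as a one-task playbook. A minimal sketch, assuming a host group named "webservers" (the group name and file name are placeholders, not from the post):

```yaml
# nginx.yml (hypothetical file name); 'webservers' is a placeholder group
- name: Ensure Nginx is running
  hosts: webservers
  become: true
  tasks:
    - name: Start and enable nginx
      ansible.builtin.service:
        name: nginx
        state: started    # no-op if nginx is already running
        enabled: true     # also start on boot
```

Running this twice should report changed=0 the second time; that is the idempotency guarantee in practice.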
Tomorrow, if we need to move the server cluster to a new infrastructure, all it takes is typing one ansible-playbook command and everything automatically rebuilds exactly as it was. In engineering, sometimes the best technology is simply the one that helps us sleep better at night! What tool is your team using to automate your systems? Let's share. #SystemArchitecture #DevOps #Ansible #ITAutomation #OnPremise #InfrastructureAsCode #TechJourney
Ansible Automates On-Premise HA Clusters
More Relevant Posts
I spent a week battling an Ansible SSH timeout issue that nearly made me give up and go manual.

Every time I ran my playbook to deploy Prometheus, Grafana, Node Exporter, and Alertmanager, it would connect successfully, pass Gathering Facts, then die mid-playbook with "Connection timed out." Port 22 was open. UFW was configured correctly. SSH was running. Everything looked fine. I almost convinced myself to just do the whole thing manually. Then I decided to give it one more shot.

The fix was three lines in ansible.cfg:

    ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=5
    pipelining = True

The root cause was my cloud provider's NAT gateway silently dropping idle TCP connections between Ansible tasks. SSH would connect, go quiet for a few seconds between tasks, and the NAT would kill it without any warning.

- ServerAliveInterval=30 keeps the connection alive by sending keepalive packets.
- ControlPersist=60s reuses the same SSH connection across tasks instead of reopening it.
- pipelining reduces the number of SSH operations per task.

Final result: ok=27, changed=22, unreachable=0, failed=0. A week of frustration solved by three lines.

If you are running Ansible on cloud VMs and hitting random SSH timeouts mid-playbook, check your NAT keepalive settings before anything else. It might save you a lot of time.

#Ansible #DevOps #Linux #Prometheus #Grafana #Monitoring #SRE #CloudInfrastructure #TheEmpatheticEngineer
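For reference, here is how the quoted settings would sit inside a complete ansible.cfg. This is a sketch: the [defaults] inventory path is a placeholder I added, and the section placement follows Ansible's standard config layout rather than anything shown in the post.

```ini
# ansible.cfg sketch; inventory path is a placeholder
[defaults]
inventory = ./inventory

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=30 -o ServerAliveCountMax=5
pipelining = True
```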
I stopped configuring servers manually and automated the whole thing with Ansible.

One of the biggest advantages of Infrastructure as Code is being able to configure multiple servers consistently, repeatedly, and without manual drift. In this lab, I used Ansible on WSL to provision and configure two AWS Ubuntu servers with different web stacks:

- Nginx server with basic authentication
- Apache server serving a custom HTML page
- Wireshark installed on both servers for network tooling
- A main Ansible playbook to orchestrate the entire deployment

Instead of configuring each server manually, I automated the entire workflow using Ansible playbooks.

What I implemented
I created a structured Ansible project with:
- Inventory file for host targeting
- Separate playbooks for Nginx, Apache, and Wireshark
- A main.yml file to run everything in sequence
- Templates for reusable configuration
- Custom HTML deployment to both web servers

Key tasks completed
- Verified SSH connectivity with Ansible
- Installed and configured Nginx
- Enabled basic authentication on Nginx
- Installed and configured Apache2
- Deployed custom web content
- Installed Wireshark on both instances
- Orchestrated the full deployment with a master playbook

Real issues I ran into (and fixed)
This project was not just “run and done” — I had to troubleshoot real automation issues along the way:
- Invalid playbook variable placement
- Missing Python library dependency (passlib)
- Incorrect use of import_playbook inside tasks
- YAML formatting / indentation errors

Fixing those issues reinforced something important: in DevOps, writing automation is one thing. Writing automation that is repeatable, reliable, and debuggable is the real skill.

What this project reinforced for me
This lab helped me strengthen practical skills in:
- Configuration management
- Infrastructure automation
- Ansible playbook structure
- Server provisioning
- Troubleshooting deployment failures
- Reducing manual setup across environments

The goal is always the same: less manual work, more consistency, better repeatability. That’s the kind of workflow I’m building toward.

Link to project repo: https://lnkd.in/esqwUdzs

#Ansible #DevOps #AWS #Automation #InfrastructureAsCode #Linux #Nginx #Apache #CloudEngineering #SystemAdministration #ConfigurationManagement

The Pistis Tech Hub
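The "incorrect use of import_playbook inside tasks" issue mentioned above is a common one: import_playbook is a playbook-level keyword, not a task. A minimal sketch of what a working main.yml would look like (the child playbook file names are inferred from the post's description; the actual repo may name them differently):

```yaml
# main.yml sketch: import_playbook sits at the top level, never under tasks:
- import_playbook: nginx.yml
- import_playbook: apache.yml
- import_playbook: wireshark.yml
```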
🚀 Challenge #100DaysOfDevOps by KodeKloud | Day 12

Today’s challenge was all about troubleshooting Apache service issues in a real-world scenario — and it turned out to be a great learning experience.

🔍 What I worked on:
I was given a situation where the Apache service was not reachable on port 5001. Instead of jumping to conclusions, I followed a structured debugging approach.

🛠️ Steps I took:
- I checked the Apache service status and found it was failing to start.
- I analyzed the logs and discovered a port conflict issue.
- Using ss, I identified that sendmail was already using port 5001.
- I stopped and disabled the sendmail service to free the port.
- After that, I successfully started Apache and confirmed it was running.
- Next, I verified that Apache was listening on all interfaces.
- While testing from the jump host, I faced a network issue (No route to host).
- I investigated iptables and found restrictive rules blocking traffic.
- Finally, I allowed port 5001 in iptables and validated the fix using curl.

💡 Key Learnings:
- Always read error logs carefully — they often point directly to the issue.
- Port conflicts are a common but critical problem in server setups.
- Troubleshooting is not just about services, but also networking and firewall rules.
- A step-by-step approach saves time and avoids confusion.

✅ Outcome:
Apache is now up and accessible on port 5001 from the jump host. Every day in this challenge is making me more confident in handling real DevOps scenarios. Looking forward to the next one! 🔥

If you're starting your DevOps journey, I highly recommend KodeKloud for hands-on labs 👇
https://lnkd.in/deg5ZDcV

#DevOps #Linux #Apache #Troubleshooting #Networking #LearningJourney #KodeKloud
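The ss-based step above can be sketched as follows. The live server commands are shown only as comments; the runnable part parses a sample ss line (fabricated for illustration, not real lab output) to pull out the process that owns the port:

```shell
# Live checks on the affected server would look like (not run here):
#   systemctl status httpd
#   ss -tulnp | grep ':5001'
#   systemctl stop sendmail && systemctl disable sendmail
#   iptables -I INPUT -p tcp --dport 5001 -j ACCEPT
#   curl -I http://app-server:5001    # run from the jump host

# Extracting the owning process name from an ss -tulnp output line:
line='tcp LISTEN 0 10 0.0.0.0:5001 0.0.0.0:* users:(("sendmail",pid=1234,fd=4))'
proc=$(echo "$line" | sed -n 's/.*users:(("\([^"]*\)".*/\1/p')
echo "$proc"   # prints "sendmail"
```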
Building a High Availability cluster on Linux? Then you’ve likely run into the ARP Problem—where multiple servers try to claim the same Virtual IP (VIP), causing traffic chaos. 😵💫 This technical breakdown for DevOps and Network Engineer pros explains how to automate the fix using Ansible. 🛠️

1️⃣ The ARP race is real 🏁
In a Layer 4 Direct Server Return (DSR) setup, the VIP sits on the loopback interface of every real server. Without the right configuration, these servers will fight to answer ARP requests meant for the load balancer, leading to intermittent connection drops and flapping.

2️⃣ Don't just patch—automate with Ansible 🤖
Manually editing sysctl.conf across a 20-node cluster is a recipe for human error. By using the ansible.posix.sysctl module, you can ensure arp_ignore and arp_announce settings are applied consistently and persist across reboots.

3️⃣ The winning config ✅
To silence the loopback and let the load balancer do its job, you need two specific settings on your backend servers:
• net.ipv4.conf.all.arp_ignore = 1 (only reply if the target IP is on the incoming interface)
• net.ipv4.conf.all.arp_announce = 2 (use the best local address for the target)

If you're tired of manual network troubleshooting, this Ansible approach is a game changer for cluster stability. Check out the full guide and the playbook code here: https://bit.ly/4cPFnkm

#Linux #Ansible #SysAdmin #LoadBalancing #DevOps #Networking #Automation #HighAvailability
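A minimal sketch of the sysctl tasks described above, using the ansible.posix.sysctl module. The "realservers" host group is a placeholder, and the linked guide's actual playbook may differ:

```yaml
# Placeholder group name; requires the ansible.posix collection
- name: Suppress ARP replies for the VIP on backend servers
  hosts: realservers
  become: true
  tasks:
    - name: Set arp_ignore and arp_announce persistently
      ansible.posix.sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        sysctl_set: true     # apply to the running kernel now
        state: present       # also persist so it survives reboots
      loop:
        - { name: net.ipv4.conf.all.arp_ignore, value: "1" }
        - { name: net.ipv4.conf.all.arp_announce, value: "2" }
```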
https://lnkd.in/eWe6wW9d

Proud to share my latest project: an Automated Hardened Server Pipeline. 🚀 I transformed a manual system administration task into a repeatable Infrastructure as Code (IaC) workflow using Packer and Ansible.

Key Features:
🛡️ Automated security hardening (UFW, sudo auditing, PAM)
💾 Complex LVM partitioning via preseed
🔄 Zero-downtime SSH port migration

Check out the documentation and source code below!

#DevOps #IaC #Linux #Ansible
Jenkins vs GitHub Actions in 2026 ⚔️

Everyone compares this the wrong way.
❌ “Jenkins has 1,800+ plugins”
❌ “GitHub Actions has 20,000+ marketplace actions”
That’s NOT what actually matters. Here’s what matters when your team is deciding 👇

⚙️ SETUP TIME
Jenkins:
• Provision server
• Install Java
• Configure master + agents
• Manage plugin compatibility
👉 Minimum half a day (realistically more)

GitHub Actions:
• Create a .yml file
👉 You’re live in ~15 minutes

💸 REAL COST
Jenkins:
• $200–500/month infra (AWS/GCP) for a ~10 dev team
• + 2–4 hours/month maintenance

GitHub Actions:
• 2,000 free minutes/month
• Most small teams stay within the free tier
👉 Hidden cost of Jenkins = engineering time

🔐 SECURITY (THIS IS WHERE IT ENDS)
GitHub Actions gives you out of the box:
• OIDC (keyless cloud authentication)
• Encrypted secrets with environment scoping
• Job-level token permissions
Jenkins can do this…
👉 but it takes months to configure correctly

🏗️ WHEN JENKINS STILL WINS
• Air-gapped / offline environments
• Heavy investment in Groovy Shared Libraries
• Non-GitHub SCMs (GitLab, Bitbucket)
• Enterprise tools with Jenkins-only plugins
👉 In these cases — STAY on Jenkins

🚀 FOR EVERYONE ELSE
The break-even is simple:
👉 1–3 months of saved maintenance time = migration cost recovered

📊 I wrote a full deep-dive with:
• 12-row comparison table
• Cost breakdown
• Migration strategy
Read here: https://lnkd.in/gsn8Uzjt

Curious — what’s still keeping your team on Jenkins in 2026?

#DevOps #CI #CD #Jenkins #GitHubActions #DevSecOps
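As a concrete reference point for the "create a .yml file" setup step above, combined with the job-level token permissions from the security section, a minimal workflow might look like this (a sketch; the test command is a placeholder):

```yaml
# .github/workflows/ci.yml sketch; 'make test' is a placeholder command
name: CI
on: [push]

permissions:
  contents: read    # job-level token scoping: least privilege by default

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test
```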
🚀 Ansible — Day 1

Managing 100 servers manually? Logging into each one… installing software… fixing issues… 😵
👉 There’s a better way. Ansible lets you manage all servers with just one command.

💡 What is Ansible?
Ansible is a tool used to:
🔹 Install software
🔹 Configure servers
🔹 Deploy applications
🔹 Manage multiple systems at once
👉 Think of it like a remote control for your infrastructure

🔥 Why is Ansible popular?
- No agents needed — unlike Puppet/Chef (no setup on every server)
- Uses simple YAML — no complex programming required
- Connects using SSH — works with what you already have
- Push-based model — you control everything from one place
👉 Simple + powerful = DevOps favorite

🧠 How it works
💻 Your system (Control Node) → 📋 List of servers (Inventory) → 🖥 Target servers (Managed Nodes)
👉 Ansible connects → runs tasks → exits (no background process, no complexity)

⚡ Quick Commands
👉 Check all servers: ansible all -m ping
👉 Install Nginx: ansible webservers -m apt -a "name=nginx state=present" --become
👉 Check disk: ansible all -m shell -a "df -h"

🧩 Essential Modules to Know
👉 These are the building blocks of Ansible:
- ping → check if servers are reachable
- copy → send files to servers
- file → manage files, folders, permissions
- apt / yum → install or remove software
- service → start/stop/restart services
- command → run simple commands
- shell → run advanced commands (pipes, redirects)
- template → create dynamic config files
💡 Tip: Use command whenever possible (safer). Use shell only when needed (pipes |, redirects >).

🧪 Simple Playbook Example

- name: Install Nginx
  hosts: webservers
  become: true   # runs with sudo privileges
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present

👉 Run it: ansible-playbook install-nginx.yml

🎯 Why this is powerful
Run once → installs everywhere ✅
Run again → no duplicate work ✅
👉 Idempotent: run this playbook 10 times — Ansible only makes changes when something is actually different. Already installed? Skipped. Already running? Skipped.
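The Inventory piece in the flow above is just a text file listing the managed nodes by group. A minimal sketch (the host names and groups here are placeholders, not from the post):

```ini
# inventory.ini sketch; host names are placeholders
[webservers]
web1.example.com
web2.example.com

[dbservers]
db1.example.com
```

The group name in square brackets is what ad-hoc commands and playbooks target, e.g. "ansible webservers -m ping".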
#Ansible #DevOps #Automation #Linux #CloudComputing #CareerGrowth #Configuration #AWS #DevSecOps #Tech #Public #TechLearning #LearningInPublic #LinkedInGrowth
🚀 Day 16/100 — DevOps Challenge

Today I configured a high-availability setup by implementing a load balancer in a multi-tier architecture.

🛠️ What I did:
- Installed and configured Nginx on the load balancer server
- Set up an upstream block to distribute traffic across multiple app servers
- Ensured all backend servers were running Apache on the correct port
- Connected everything using a reverse proxy (proxy_pass)
- Verified load balancing using curl requests

🔍 Challenge:
Faced repeated 502 Bad Gateway errors. The root cause turned out to be a mismatch between the ports configured in Nginx and the actual ports used by Apache on the backend servers.

💡 Key Takeaways:
- A running service doesn’t mean it’s reachable—port alignment is critical
- Load balancers depend heavily on correct backend configuration
- Always validate backend connectivity before blaming the load balancer
- Debugging step-by-step is more effective than guessing

📐 Architecture Built:
Client → Load Balancer (Nginx) → Multiple App Servers (Apache)

This was a great hands-on exercise in troubleshooting real-world production issues and understanding how high availability systems work.

#DevOps #Nginx #LoadBalancing #Linux #Networking #100DaysOfDevOps #KodeKloud
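The upstream plus proxy_pass setup described above looks roughly like this in the Nginx configuration. This is a sketch: the backend host names and port are placeholders, and the backend port must match Apache's Listen directive (the mismatch that caused the 502s here):

```nginx
# nginx.conf fragment sketch; hosts and port 8080 are placeholders
http {
    upstream app_servers {
        server app01:8080;   # must match Apache's Listen port on each backend
        server app02:8080;
        server app03:8080;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://app_servers;   # round-robin by default
        }
    }
}
```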
🚀 Challenge #100DaysOfDevOps by KodeKloud | Day 14: Linux Process Troubleshooting

Today’s lab was all about identifying and fixing a real-world service issue in a production-like environment.

🧠 What I worked on:
I investigated an Apache service outage reported on one of the application servers in Stratos DC. The goal was to ensure Apache was running on all app servers and correctly configured on port 8085.

⚙️ Steps I followed:
- I connected to each application server over SSH from the jump host.
- I checked the Apache service status using systemctl status httpd.
- I identified the faulty server where Apache was failing to start.
- I analyzed the error logs and found a port conflict issue.
- Using ss -tulnp, I discovered that another process (sendmail) was already using port 8085.
- I stopped and disabled the conflicting service (sendmail).
- I verified and updated the Apache configuration to use port 8085.
- I restarted Apache and confirmed it was running successfully.
- I repeated the verification across all servers to ensure consistency.

❗ Issue I faced:
The Apache service was failing with “Address already in use”, caused by another service occupying the required port.

✅ How I resolved it:
I freed the port by stopping the conflicting process, then ensured Apache was properly configured and running on port 8085 across all servers.

📌 Key Learnings:
- Always check for port conflicts when a service fails to start
- Use tools like ss or netstat to identify running processes
- Logs and error messages are the fastest way to diagnose issues
- Consistency across servers is critical in distributed environments

💡 This task gave me a strong understanding of Linux process management and real-time troubleshooting in DevOps environments.

If you want to start your cloud and DevOps journey with KodeKloud (highly recommended if you want real hands-on learning):
https://lnkd.in/deg5ZDcV

#DevOps #Linux #Troubleshooting #Apache #KodeKloud #100DaysOfDevOps #LearningJourney
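Updating the Apache Listen port, as in the configuration step above, is a one-line substitution. A sketch of the edit (the httpd.conf path assumes a RHEL-style layout; the runnable part below applies the same substitution to a sample string rather than the live file):

```shell
# On the live server the edit would be (not run here):
#   sudo sed -i 's/^Listen .*/Listen 8085/' /etc/httpd/conf/httpd.conf
#   sudo systemctl restart httpd

# The same substitution demonstrated on a sample config line:
conf='Listen 80'
new_conf=$(echo "$conf" | sed 's/^Listen .*/Listen 8085/')
echo "$new_conf"   # prints "Listen 8085"
```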