"What k*lls the creative force is not age or a lack of talent, but our own spirit, our own altitude" - Robert Greene Good day, everyone😊 Day 9 of my #100DaysOfDevOps challenge was a reminder that in production, chaos is the only constant. But chaos is also where a DevOps engineer finds their value. The scenario: The Nautilus Application in the Stratos DC was hemorrhaging. Production support flagged a critical connection failure, the application couldn’t talk to the database. When the heartbeat of your app stops, every second counts. Here is how I resolved this problem, moving from a blind error to a surgical fix. First let me clear this out, we are troubleshooting the mariadb connection issue. So here's how I did it I always implement research Over reflex. Before touching the terminal or using commands, I went to the documentation. It’s very tempting to start throwing commands at a broken server, but I’ve learned that expertise is built on structured troubleshooting. I familiarized myself with MariaDB’s common failure modes before jumping into the heat, as well as possible solutions. So to solve this task, I first accessed the database server (`stdb01`) using the ssh command to leave the jump host. When that was done I tried to start it once more using: 'sudo systemctl start mariadb' The result? A failure. Then I used 'systemctl status' which showed Exit status: 1. This is the crossroads where most people get frustrated, but it’s where the real work begins. If the service won't start, the logs are the only map you have. I went straight to the MariaDB error logs using 'cat /var/log/mariadb/mariadb.log | tail -30' The log read as follows: Can't create/write to file '/run/mariadb/mariadb.pid' (Errcode: 13 "Permission denied") In the Linux, permissions are the silent killers of uptime. The /run/mariadb directory, essential for holding the Process ID (PID) file had lost its way. I had to re-assert control using the commands: 'chown -R mysql:mysql /run/mariadb`' (Returning the keys to the 'mysql' user). 'chmod 755 /run/mariadb' (Ensuring the owner can write while keeping the system secure). When that was done I used `systemctl start`, then checked the status to see if the issue is resolved. And boom success, mariadb was active and running which signaled my task is done. If you can't read the logs, you're just guessing. If you can't manage permissions, you're just a visitor in your own environment. Though of course there are many ways to resolve this, and this was one of them. #DevOps #100DaysOfDevOp #TechCommunity #LearningInPublic
Resolving Mariadb Connection Issue in Production
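For reference, the whole recovery condenses to a few commands. This is a minimal sketch assuming the Red Hat-style log path quoted in the post; service names and log locations vary by distribution.

```bash
# Diagnose: why won't the service start?
sudo systemctl status mariadb
sudo tail -n 30 /var/log/mariadb/mariadb.log
# -> Can't create/write to file '/run/mariadb/mariadb.pid' (Errcode: 13)

# Fix: hand /run/mariadb back to the mysql user, owner-writable only
sudo chown -R mysql:mysql /run/mariadb
sudo chmod 755 /run/mariadb

# Verify: start again and confirm the unit is active
sudo systemctl start mariadb
sudo systemctl is-active mariadb
```

One caveat worth knowing: on most modern distributions /run is a tmpfs, so a manually repaired /run/mariadb disappears at reboot. Packaged installs normally ship a systemd-tmpfiles rule that recreates the directory with the correct owner at boot; if that rule is missing or broken, this same permission error can come back.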
More Relevant Posts
Hi all LinkedIn! I want to share with you some tests I've been working on.🚀

As DevOps Engineers, we know that backing up a cluster, a specific namespace, or Persistent Volume Claims (PVCs) isn't just a "nice-to-have", it is mission critical. But the real challenge isn't just taking the backup, it's ensuring portability and monitoring. Standard disk snapshots often lock you into a specific cloud provider. If you need to restore data from AWS (EKS) to Azure (AKS) or an on-premise environment, vendor-specific snapshots can become a major roadblock.

I recently lab-tested a solution using Velero integrated with Kopia to solve this exact problem.

🛠️ Why Velero + Kopia? While Velero handles the Kubernetes resource metadata, Kopia provides a fast, secure, and encrypted way to back up data at the file-system level. This makes your data backups truly portable across any provider because you are no longer dependent on cloud-native disk snapshots.

The Lab Environment: To prove this architecture, I built a dedicated hands-on lab featuring:
✅ Velero as the core backup and restore engine.
✅ Sample web app with a PostgreSQL database.
✅ NFS server to simulate real-world shared storage scenarios.
✅ Minio as the S3-compatible backend for metadata and backup storage.
✅ Monitoring stack to track the state and health of every backup job.

Want to see how to achieve seamless K8s disaster recovery? Check out the full repository and test results here:
🔗 Project Link: https://lnkd.in/eMGVtbp7

#Kubernetes #DevOps #Velero #Kopia #CloudNative #DataProtection #PlatformEngineering #SRE #OpenSource #NFS #Backup #Minio #s3 #Grafana #Monitoring
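For anyone wanting to try the Kopia path, the setup looks roughly like the sketch below. This is not taken from the linked repo: the bucket name, plugin version, namespace, and MinIO endpoint are placeholders, and the flags assume Velero 1.10+, where Kopia is available as the file-system uploader.

```bash
# Install Velero with the node agent and Kopia as the FS uploader
# (bucket, credentials file, plugin version, and s3Url are placeholders)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.10.0 \
  --bucket velero-backups \
  --secret-file ./minio-credentials \
  --backup-location-config "region=minio,s3ForcePathStyle=true,s3Url=http://minio:9000" \
  --use-node-agent \
  --uploader-type kopia

# Back up a namespace, pushing PVC data through the file-system path
# instead of cloud-native disk snapshots; this is what makes it portable
velero backup create app-backup \
  --include-namespaces demo-app \
  --default-volumes-to-fs-backup

# On any cluster pointed at the same bucket:
velero restore create --from-backup app-backup
```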
Lately, I’ve been facing an interesting confusion while setting up databases for development environments 🤔

When working with databases like MongoDB (especially with replica sets) or PostgreSQL, the question always comes up: 👉 Should I rely on local installations, or go fully with Dockerized setups?

For example, configuring a MongoDB replica set locally is manageable, but when bringing in Docker, things like networking, container IPs, and replica initialization can get tricky. On the other hand, Docker provides consistency, portability, and a clean environment across teams. Similarly, for PostgreSQL, local setup is straightforward—but Docker makes version control and environment isolation much cleaner.

So the trade-off becomes:
* ⚡ Local setup → Faster, simpler for quick development
* 🐳 Dockerized setup → More scalable, consistent, production-like

Still exploring the best balance depending on project needs and team workflow. Curious to hear from others — what do you prefer for development environments? Local DB or Dockerized DB? And how do you handle things like MongoDB replica sets in Docker efficiently?

#Docker #MongoDB #PostgreSQL #BackendDevelopment #DevOps #SoftwareEngineering
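On the replica-set question specifically, one pattern that sidesteps most of the container-IP pain is a single-node replica set on a user-defined network, initialized with the container name as the host. A minimal sketch; names, ports, and the image tag are illustrative:

```bash
# Single-node replica set for development
docker network create devnet

docker run -d --name mongo1 --network devnet -p 27017:27017 \
  mongo:7 --replSet rs0 --bind_ip_all

# Initialize the replica set; using the container name as the host
# keeps the member address stable inside the Docker network
docker exec mongo1 mongosh --eval \
  'rs.initiate({_id: "rs0", members: [{_id: 0, host: "mongo1:27017"}]})'

# From the host machine, connect with directConnection=true so the
# driver does not try to resolve the container-internal hostname:
#   mongodb://localhost:27017/?directConnection=true
```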
I just dealt with every DevOps engineer's nightmare: accidentally deleted a Kubernetes deployment AND its PVC. Guess what? The data wasn't actually gone.

Turns out, most people don't realize that your Postgres data can survive PVC deletion if you set up your PersistentVolume with the right reclaim policy. I didn't know this either until it happened to me at 2 AM and I had to figure it out fast.

I wrote up exactly what I learned — how to recover the data, but more importantly, the setup mistakes that cost people their databases:
* Why Retain reclaim policy matters (seriously, use it for production)
* The painful lesson about Postgres 18+ and mount paths
* How to actually bind your PVC to the right PV instead of guessing
* Full working manifests you can copy/paste

If you're running stateful stuff in Kubernetes, read this before you accidentally delete something important. Trust me on this one.
https://lnkd.in/gzKMKiVC

Have you had a close call with Kubernetes storage? Would love to hear your story.

#kubernetes #DevOps #PostgreSQL #Recovery
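For anyone who wants the safety net before their own 2 AM incident, the reclaim-policy side of this comes down to two patches. A sketch with a placeholder PV name; the post's full manifests live in the linked write-up:

```bash
# Ensure the PV outlives its PVC (dynamically provisioned PVs often
# default to Delete, which is exactly what loses data)
kubectl patch pv postgres-pv \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# After an accidental PVC delete, a retained PV sits in "Released"
# with a stale claimRef; clear it so a new PVC can bind again
kubectl patch pv postgres-pv --type json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'

# In the replacement PVC, set spec.volumeName: postgres-pv so it
# binds to this exact volume instead of whatever matches by size/class
```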
🚀 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗥𝗲𝗮𝗹𝗶𝘁𝘆 – 𝗜𝗻𝗳𝗿𝗮 𝗨𝗽𝗴𝗿𝗮𝗱𝗲 𝗦𝘁𝗼𝗿𝘆

As a Software Engineer, I collaborated with my teammate Suraj Bhanarkar and the DevOps team on a production deployment involving major infrastructure changes. Production deployment is never just a “deploy and done” task. It requires coordination, real-time decision-making, and the ability to handle the unexpected.

𝗪𝗵𝗮𝘁 𝘄𝗲 𝗱𝗲𝗽𝗹𝗼𝘆𝗲𝗱:
✔ RabbitMQ upgrade to a 𝟯-𝗻𝗼𝗱𝗲 𝗰𝗹𝘂𝘀𝘁𝗲𝗿 (𝗤𝘂𝗼𝗿𝘂𝗺 𝘀𝗲𝘁𝘂𝗽)
✔ Redis migration from 𝗦𝗶𝗻𝗴𝗹𝗲 → 𝗖𝗹𝘂𝘀𝘁𝗲𝗿
✔ No downtime → live traffic (~30–40 QPS) continued

𝗥𝗲𝗮𝗹 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲: Existing Redis keys were stored in the single-node setup and were still required. We attempted to sync data to the Redis cluster, but:
⚠ Keys were copied
⚠ Not deleted properly
⚠ Causing inconsistencies

𝗪𝗵𝗮𝘁 𝘄𝗼𝗿𝗸𝗲𝗱:
👉 Ran both Redis systems in parallel
👉 Implemented fallback logic for key handling
👉 Gradually stabilized the system without downtime

💡 𝗞𝗲𝘆 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴𝘀:
• Infra changes are simple in theory, complex in production
• Data migration needs 𝘃𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻, not just execution
• Fallback mechanisms are critical in live systems
• Small oversights can delay major deployments

💬 “𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝘀𝘂𝗰𝗰𝗲𝘀𝘀 𝗶𝘀 𝗻𝗼𝘁 𝗮𝗯𝗼𝘂𝘁 𝘇𝗲𝗿𝗼 𝗶𝘀𝘀𝘂𝗲𝘀 — 𝗶𝘁’𝘀 𝗮𝗯𝗼𝘂𝘁 𝗵𝗼𝘄 𝗾𝘂𝗶𝗰𝗸𝗹𝘆 𝘆𝗼𝘂 𝗵𝗮𝗻𝗱𝗹𝗲 𝘁𝗵𝗲𝗺.”

👉 Curious to know — how do you handle Redis migration in live systems?

#DevOps #ProductionDeployment #RabbitMQ #Redis #NodeJS #SoftwareEngineering
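The post doesn't include commands, so purely as a sketch of the moving parts (hosts, ports, and the key name are placeholders): bootstrapping a Redis cluster, then copying keys from the old single node non-destructively, which matters because delete-at-source is where the inconsistencies crept in.

```bash
# Bootstrap a 3-master / 3-replica cluster from six placeholder nodes
redis-cli --cluster create \
  10.0.0.1:7000 10.0.0.2:7000 10.0.0.3:7000 \
  10.0.0.1:7001 10.0.0.2:7001 10.0.0.3:7001 \
  --cluster-replicas 1

# Copy one key from the old single node into the cluster.
# COPY keeps the source key intact; REPLACE overwrites any
# existing value at the destination. Timeout is in milliseconds.
redis-cli -h old-redis MIGRATE 10.0.0.1 7000 user:42:session 0 5000 COPY REPLACE
```

The read-from-cluster, fall-back-to-old-node logic the team describes would live in the application client; the commands above only cover the data copy.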
Day 32 #90DaysOfDevops — I deleted a Docker container and lost everything. 😶

Here's what happened: I spun up a MySQL container, created a database, added real data, felt great about it. Then I stopped the container. Deleted it. Ran a brand new one. Typed → SHOW DATABASES;

Gone. All of it. 💀

That was my "aha moment" with Docker Volumes. Containers are temporary. Data doesn't have to be. Here's what I learned on Day 32 👇

🔹 No volume = data dies with the container
🔹 Named volumes = Docker manages your data OUTSIDE the container lifecycle. Delete the container 10 times — data survives every time ✅
🔹 Bind mounts = edit a file on your laptop, container serves it instantly. No rebuild. No restart. Magic 🔥

I also broke down Docker networking today:
🔹 Default bridge = containers can talk by IP only
🔹 Custom bridge = containers talk by NAME (Docker has a built-in DNS!)
🔹 Ping by name failed on the default bridge. The same ping worked instantly on a custom network.

The final task? Connected a MySQL container + app container using a custom network and a named volume — data persisted, DNS resolved, everything just worked. This is the stuff that makes Docker actually make sense.

🔗 Full notes & commands on my GitHub → https://lnkd.in/gP7hgAcZ

If you're learning Docker or DevOps, save this post — you WILL hit this data-loss moment someday 🙂

#Docker #DevOps #AWS #Linux #90DaysOfDevOps #CloudComputing #DevOpsJourney #TrainWithShubham
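A minimal reproduction of that final task, in case you want to see the survival for yourself; the network, volume, and password names are illustrative:

```bash
# Named volume: the data lives outside the container lifecycle
docker volume create mysql_data
docker network create appnet

docker run -d --name db --network appnet \
  -e MYSQL_ROOT_PASSWORD=devpass \
  -v mysql_data:/var/lib/mysql \
  mysql:8

# Delete and recreate the container: the database survives
docker rm -f db
docker run -d --name db --network appnet \
  -e MYSQL_ROOT_PASSWORD=devpass \
  -v mysql_data:/var/lib/mysql \
  mysql:8

# On the custom bridge, other containers reach it by NAME, not IP
docker run --rm --network appnet mysql:8 \
  mysql -h db -uroot -pdevpass -e "SHOW DATABASES;"
```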
Hello all, I am excited to announce I have successfully completed a project I have been working on: BackDB, a tool I developed to solve one of the most frustrating parts of DevOps — reliable, cross-engine database backups without the overhead of complex infrastructure.

Most backup tools are either too simple or too complex. BackDB finds the "Goldilocks" zone:

✨ Key Features:

1. The PostgreSQL Version Headache 🤯
PostgreSQL's `pg_dump` is notorious for failing if the client version is older than the server. BackDB solves this by bundling four versions (13, 14, 15, and 16) in a single tool. You simply toggle the version you need, and the tool handles the binary pathing automatically.

2. "Raw Docker" Simplicity 😎
Many modern tools force you into complex `docker-compose` setups or Kubernetes configurations. BackDB was developed with a "Deploy in 60 Seconds" philosophy: just a single `docker run` command with two volume mappings, and you're in production. You can also run it raw on your local machine with only a few commands.

3. Cross-Platform MSSQL Backups 😌
Backing up SQL Server from Linux is often a nightmare. BackDB includes the official Linux `sqlpackage` and the logic to reliably back up Windows-hosted SQL Servers from a Linux host.

4. Quick Backup and Scheduled Backup 🤓
Need a backup urgently? Just set the app up on the jump server/backup server (or wherever you usually take the backup), fill in the details, click "Execute Quick Backup", and voila! The backup is done. Ditch the cron scripts — automate backups the right way. With BackDB, scheduled backups are built in, making it effortless to set up and manage jobs.

Please check out my Medium blog for the detailed explanation: https://lnkd.in/gTVfDpki
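The post doesn't show the actual command, so the following is only a hypothetical shape of the "single docker run with two volume mappings" idea; the image name, port, and mount paths are invented placeholders, not the project's real ones (see the Medium post for the genuine instructions):

```bash
# Hypothetical deployment shape: one container, one volume for
# configuration, one volume for the backups themselves.
# All names and paths here are placeholders.
docker run -d --name backdb \
  -v "$PWD/config:/app/config" \
  -v "$PWD/backups:/app/backups" \
  -p 8080:8080 \
  example/backdb:latest
```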
🚀 Setting Up a High-Availability PostgreSQL Cluster with Patroni and etcd

In the world of databases, high availability is key to avoiding downtime and ensuring business continuity. Recently, I explored a detailed guide on how to implement a fault-tolerant PostgreSQL cluster using Patroni and etcd. This setup enables automatic replication, intelligent failover, and real-time monitoring, ideal for scalable production environments.

🔧 Main Steps for Implementation
- 📦 Dependency Installation: Start by setting up etcd as a distributed store for cluster coordination. Install PostgreSQL, Patroni, and the necessary tools on Ubuntu nodes or similar, ensuring version compatibility.
- ⚙️ Patroni Configuration: Edit the Patroni configuration file to define the cluster, including parameters like the cluster name, WAL size, and connection to etcd. Enable streaming replication for continuous synchronization between primary and replica nodes.
- 🖥️ Cluster Initialization: Use the patroni command to bootstrap the first node, then join the secondary nodes. Verify the status with tools like pg_is_in_recovery and etcdctl to confirm the cluster's health.
- 🛡️ Failover Testing: Simulate failures by disconnecting the primary node and observe how Patroni automatically promotes a replica. Monitor logs and metrics to optimize recovery time, which can be under 30 seconds in well-tuned setups.

This approach not only improves resilience but also integrates easily with tools like HAProxy for load balancing. It's a robust solution for DevOps teams seeking simplicity without sacrificing performance.

For more information visit: https://enigmasecurity.cl

#PostgreSQL #HighAvailability #Patroni #Etcd #DevOps #Databases #CloudComputing

If you liked this summary, consider donating to the Enigma Security community to keep supporting with more news: https://lnkd.in/er_qUAQh
Connect with me on LinkedIn to discuss more about cybersecurity and tech: https://lnkd.in/eXXHi_Rr
📅 Mon, 06 Apr 2026 13:01:41 GMT
🔗 Subscribe to the Membership: https://lnkd.in/eh_rNRyt
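For the verification and failover-testing steps, the day-to-day commands look roughly like this; the config path is the common default, but it varies per install:

```bash
# Show every member's role, state, replication lag, and timeline
patronictl -c /etc/patroni/patroni.yml list

# Planned role change, e.g. before maintenance on the current leader;
# Patroni prompts for the target member and a switchover time
patronictl -c /etc/patroni/patroni.yml switchover

# Ask PostgreSQL directly which role a node holds:
# returns 't' on a streaming replica, 'f' on the primary
psql -U postgres -c "SELECT pg_is_in_recovery();"
```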
𝗣𝗼𝘀𝘁𝗴𝗿𝗲𝗦𝗤𝗟 𝘁𝘂𝗿𝗻𝗲𝗱 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀 𝗶𝗻𝘁𝗼 𝗮 𝗿𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗳𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻

At the PostgreSQL Global Development Group, a database isn’t just storage. It’s the backbone of your application. That changes how systems are built.

Without a strong database foundation:
• queries slow down as data grows
• consistency becomes a challenge
• scaling introduces risk

With PostgreSQL, teams get 𝗿𝗲𝗹𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆, 𝘀𝘁𝗿𝗼𝗻𝗴 𝗰𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝗰𝘆, 𝗮𝗻𝗱 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹 𝗾𝘂𝗲𝗿𝘆𝗶𝗻𝗴 𝗰𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀.

The DevOps lesson: 𝗬𝗼𝘂𝗿 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝗶𝘀𝗻’𝘁 𝗷𝘂𝘀𝘁 𝗮 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁. 𝗜𝘁’𝘀 𝘁𝗵𝗲 𝗳𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻. If your foundation is weak, everything built on top will feel it.

At ServerScribe, we help teams design data layers that are stable, scalable, and production-ready.

Is your database built for growth — or just for today? 👇

#DevOps #ServerScribe #PostgreSQL #Databases #Reliability #SRE #BackendEngineering
🚀 𝗕𝗶𝗴 𝗪𝗶𝗻: 𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗶𝗻𝗴 𝗮 𝟮-𝗟𝗲𝘃𝗲𝗹 𝗕𝗮𝗰𝗸𝘂𝗽 𝗦𝘆𝘀𝘁𝗲𝗺 𝗳𝗼𝗿 𝗔𝗹𝗹 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀 🚀

In our DevOps setup, we were managing multiple critical databases—𝙈𝙤𝙣𝙜𝙤𝘿𝘽, 𝙈𝙮𝙎𝙌𝙇, 𝙄𝙣𝙛𝙡𝙪𝙭𝘿𝘽, 𝙖𝙣𝙙 𝘾𝙡𝙞𝙘𝙠𝙃𝙤𝙪𝙨𝙚—𝙗𝙪𝙩 𝙗𝙖𝙘𝙠𝙪𝙥𝙨 𝙬𝙚𝙧𝙚 𝙛𝙧𝙖𝙜𝙢𝙚𝙣𝙩𝙚𝙙, 𝙞𝙣𝙘𝙤𝙣𝙨𝙞𝙨𝙩𝙚𝙣𝙩, 𝙖𝙣𝙙 𝙨𝙘𝙖𝙩𝙩𝙚𝙧𝙚𝙙 𝙖𝙘𝙧𝙤𝙨𝙨 𝙙𝙞𝙛𝙛𝙚𝙧𝙚𝙣𝙩 𝙨𝙮𝙨𝙩𝙚𝙢𝙨.

In a real incident scenario, this meant we didn’t have a single, reliable way to confidently restore data quickly. The risk of partial backups, missed schedules, or slow recovery was real—and that’s a problem no production system can afford.

To solve this, I worked on standardizing and strengthening our entire backup strategy. I successfully implemented a unified 2-level backup system:
• 𝗟𝗲𝘃𝗲𝗹 𝟭: Fast, automated local backups for quick recovery
• 𝗟𝗲𝘃𝗲𝗹 𝟮: Secure external backups stored in MinIO (S3-compatible) for disaster recovery

𝗡𝗼𝘄 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴 𝗳𝗼𝗹𝗹𝗼𝘄𝘀 𝗮 𝗰𝗹𝗲𝗮𝗻, 𝗰𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝘁 𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲: 𝗯𝗮𝗰𝗸𝘂𝗽/{𝘁𝗲𝗰𝗵𝗻𝗼𝗹𝗼𝗴𝘆}/{𝗵𝗼𝘀𝘁}/{𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲}/

I also built a robust MongoDB backup and disaster recovery flow for our wpms-core and wpms-bundle test environments:
• Quick local restores for immediate recovery needs
• External S3/MinIO backups for long-term safety
• Manual backup option for full control when needed

The result is a reliable, scalable backup system that removes uncertainty and ensures we can recover any database anytime with confidence.

#DevOps #BackupSuccess #DataSecurity #MongoDB #MySQL #MinIO #Automation #Kubernetes #DataProtection #DisasterRecovery
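As an illustration of how the two levels and the backup/{technology}/{host}/{database}/ layout can fit together for MongoDB; the host, database, and MinIO alias below are placeholders, not the author's real environment:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholders: adjust the host, database, and the mc alias ("minio")
HOST="db01"; DB="appdb"; STAMP="$(date +%F_%H%M)"
LOCAL="/backup/mongodb/${HOST}/${DB}/${STAMP}"

# Level 1: fast local dump for quick restores
mongodump --uri "mongodb://${HOST}:27017/${DB}" --out "${LOCAL}"

# Level 2: mirror the dump to S3-compatible MinIO, preserving the
# same backup/{technology}/{host}/{database}/ structure
mc mirror "${LOCAL}" "minio/backup/mongodb/${HOST}/${DB}/${STAMP}"
```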
A backup that was never restored is just a theory.

Over the past few weeks, I wrote a 3-part deep dive on building a production-ready Docker PostgreSQL backup and restoration strategy — focused on zero data loss, safe recovery, and automation.

The series focuses on practical production concerns:
• Preventing silent data loss
• Restoring databases safely under pressure
• Building repeatable recovery workflows
• Automating backups responsibly

Not a tutorial — more of a real-world operations playbook.

Part 1 — Backup Architecture: https://lnkd.in/gCC9bm48
Part 2 — Safe Restoration & Zero Data Loss: https://lnkd.in/gnvADbkw
Part 3 — Backup Automation: https://lnkd.in/g_bq4Mge

Would appreciate thoughts from engineers running PostgreSQL in production.

#PostgreSQL #DevOps #SRE #PlatformEngineering #CloudNative
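To make the "restore it or it's a theory" point concrete, the smallest honest version of the loop is below; the container, database, table, and file names are placeholders, and the series itself covers the hardened version:

```bash
# Backup: custom-format dump straight out of the container
# (-Fc enables selective, parallel restores with pg_restore)
docker exec pg-container \
  pg_dump -U postgres -Fc appdb > "appdb_$(date +%F).dump"

# Restore into the container; --clean --if-exists drops and
# recreates objects so the restore is repeatable
docker exec -i pg-container \
  pg_restore -U postgres -d appdb --clean --if-exists < appdb_backup.dump

# The step most pipelines skip: prove the restore actually worked
# (table name is a placeholder; check whatever your app depends on)
docker exec pg-container \
  psql -U postgres -d appdb -c "SELECT count(*) FROM some_table;"
```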