How to Teach Your Dog Web Stack Debugging
Sometimes debugging servers can be ruff.

Introduction

As I'm learning more about servers and web stack debugging, I recognize the importance of documenting errors (and their fixes) to prevent problems, or at least make them easier to solve if a similar problem arises. This practice of documenting what went wrong, why, and how it can be fixed is often called a postmortem in IT. Postmortems help demonstrate the impact of an outage and ensure that the problem was not only fixed but that its root cause was found, so it doesn't happen again.

If you've read some of my articles before, you know I think it's important to break things down into easy-to-understand concepts everyone can relate to, just like one of my favorite physics authors, Chad Orzel. So without further ado, here is my postmortem for a web stack bug any human or dog can understand!

Issue Summary

At approximately 7:30 pm (CST), the web server for the local pet shop, Spot's Spot, went down due to a security risk from running the Nginx web server as the root user. The root user is a privileged user, like an alpha dog. If attackers are able to access the web server and steal the root user's privileges, they can threaten the whole pack. Instead of root, the Nginx web server should be run by a less-privileged user, such as nginx. This threat allowed a rival service to run and meant Nginx could not listen on port 8080, which is typically used for personally hosted web servers. The attack was like the neighbor's dog suddenly having free rein over your yard while you're stuck on the sidelines without any pets.

The outage lasted for approximately 30 minutes, during which no pet owners could buy squeaky toys or bacon-flavored treats for their pups. 100% of users had to wait while the Spot's Spot site was down or turn to larger pet store chains to purchase their dog supplies.

Root Cause: The Nginx web server was being run by root, was replaced by another service, and could no longer listen on port 8080.
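
One quick way to sniff out this kind of problem is to ask which user owns each nginx process. Here is a minimal sketch, not the exact commands used during the outage. The helper names nginx_owners and nginx_not_root are made up for illustration, and the usage lines assume a procps-style ps that accepts -eo user=,comm=.

```shell
# Hypothetical helpers (not part of the original incident response).

# Reads "USER COMMAND" pairs on stdin and prints each distinct user
# that owns a process whose command name is exactly "nginx".
nginx_owners() {
    awk '$2 == "nginx" { print $1 }' | sort -u
}

# Succeeds only when none of the nginx processes in the input belong to root.
nginx_not_root() {
    if nginx_owners | grep -qx 'root'; then
        return 1
    fi
    return 0
}

# Example usage on a live machine (assumes a procps-style ps):
#   ps -eo user=,comm= | nginx_owners
#   ps -eo user=,comm= | nginx_not_root || echo "nginx is running as root"
```

Both helpers read from standard input, so they can be tried on pasted ps output before being pointed at a real server.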

Timeline
  • 7:31 PM (CST): Spot's Spot site goes down and web server engineers are notified when a customer calls the store after trying to access the website.
  • 7:37 PM: Web stack engineers check the currently running processes with ps auxff and find the Nginx web server is not running. They investigate further by running netstat -ldpn, which shows which processes are listening and on which ports.
  • 7:39 PM: Web stack engineers find apache2 is running on port 8080. They kill the process with pkill apache2 and use netstat -ldpn to confirm nothing is currently listening on port 8080.
  • 7:41 PM: Engineers check the /etc/nginx directory to examine Nginx-specific files. They find the nginx.conf configuration file is owned by the user and group root, instead of nginx like all the other nginx files, and grants no read, write, or execute permissions.
  • 7:44 PM: Engineers change the owner and group to nginx with chown nginx:nginx nginx.conf and grant the user read, write, and execute privileges with chmod u+rwx nginx.conf.
  • 7:46 PM: Engineers check that Nginx is able to run by restarting the service using sudo service nginx restart. Nginx is now running, but it is listening on port 80 instead of port 8080.
  • 7:52 PM: Engineers check through common nginx files and find in /etc/nginx/sites-available/default that the server listens on port 80 by default. They change this to port 8080 and save the file.
  • 7:53 PM: After restarting the Nginx service, engineers find nginx is listening on port 8080. One problem is fixed.
  • 7:54 PM: However, when engineers check the running processes again with ps auxff, they find nginx is still running as root, which leaves the server more vulnerable to attack.
  • 7:56 PM: Engineers stop the service that was started with sudo (that is, as the root user). They then switch to the nginx user with su nginx and start the service with service nginx start.
  • 7:57 PM: Nginx is running under the user nginx and is listening on port 8080. The website for Spot's Spot is back online and customers can access the site.
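
The port checks in the timeline can be turned into a small reusable helper. This is only a sketch under assumptions: the function name listeners_on_port is hypothetical, and it expects input laid out in the usual Linux netstat columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). Other tools or platforms may order the columns differently.

```shell
# Hypothetical helper (an illustration, not the engineers' actual tooling).
# Reads netstat-style listener lines on stdin and prints the
# "PID/Program name" column for every TCP socket listening on the given port.
listeners_on_port() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # Local address looks like 0.0.0.0:8080 or :::8080,
            # so the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) {
                out = $7
                # Program names may contain spaces; keep the rest of the line.
                for (i = 8; i <= NF; i++) out = out " " $i
                print out
            }
        }'
}

# Example usage on a live machine (needs root to see other users' processes):
#   sudo netstat -ltnp | listeners_on_port 8080
```

If the helper prints something like 1234/apache2 for port 8080, a rival process is sitting in Nginx's spot. If it prints nothing, the port is free.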
Root Cause and Resolution

The root cause was running the Nginx web server as the root user. This left the server vulnerable to attack and allowed a rival process, apache2, to take over (probably the doing of cats). The attack also removed the nginx user's permissions on nginx.conf and changed the default port the server listens on.

To stop the issue, web stack engineers first had to stop the apache2 process, just like dogs have to bark to shoo pesky squirrels out of their backyard. This freed port 8080 for the nginx service, but before starting it, engineers knew the server had been vulnerable and checked the Nginx configuration files in /etc/nginx. There they saw that nginx.conf was owned by the user and group root instead of nginx and had no permissions set. Engineers restored nginx.conf's ownership to nginx and gave that user read, write, and execute permissions on the file, like a dog getting their bone back. After restarting, engineers saw the process listening on port 80. Upon investigation, they found the default port in the file /etc/nginx/sites-available/default and changed it to port 8080. Finally, they stopped the service running under root and started it again as the nginx user.
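
The port change could also be scripted rather than done by hand in an editor. The sketch below is an assumption about how one might do it, not a record of what the engineers ran: set_listen_port is a hypothetical function, and it only understands simple listen lines such as "listen 80 default_server;" and "listen [::]:80 default_server;".

```shell
# Hypothetical helper (an illustration only).
# Rewrites listen directives in the given config file from one port to
# another, keeping a copy of the original next to it as FILE.bak.
set_listen_port() {
    conf="$1"
    old="$2"
    new="$3"
    cp "$conf" "$conf.bak" &&
    sed "s/\(listen[[:space:]]\{1,\}\(\[::]:\)\{0,1\}\)${old}\([[:space:];]\)/\1${new}\3/" \
        "$conf.bak" > "$conf"
}

# Example usage (path taken from the timeline above):
#   sudo sh -c '. ./set_listen_port.sh; set_listen_port /etc/nginx/sites-available/default 80 8080'
#   sudo nginx -t    # ask Nginx to validate the configuration before restarting
```

Requiring a space or semicolon right after the old port keeps the pattern from touching a port like 8000 when the goal is to replace 80. The script file name in the usage comment is also just a placeholder.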

Corrective and Preventative Measures

Just like dogs learning new tricks, engineers can learn new practices to fix web stacks faster. Here are some tasks to avoid this problem or find fixes more easily in the future:

  • run Nginx with a less-privileged user
  • check default settings in configuration files
  • add a monitoring system so Spot's Spot doesn't have to rely on loyal customers the next time the site goes down
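
For the monitoring item, even a tiny watchdog script beats waiting for a customer to call. Here is a minimal sketch, assuming curl is installed. The function name check_site and the URL in the usage comment are placeholders, not details from the real Spot's Spot setup.

```shell
# Hypothetical watchdog (an illustration, not an existing monitoring system).
# Prints UP or DOWN for the given URL and returns a matching exit status.
check_site() {
    url="$1"
    # --fail makes curl return non-zero on HTTP errors (4xx/5xx),
    # --max-time stops a hung server from blocking the check forever.
    if curl --fail --silent --output /dev/null --max-time 5 "$url"; then
        echo "UP   $url"
    else
        echo "DOWN $url"
        return 1
    fi
}

# Example usage (placeholder URL):
#   check_site "http://localhost:8080/" || echo "time to page the humans"
```

Run on a schedule (for example from cron), a check like this would have barked about the outage within a minute or so instead of leaving it to a customer's phone call.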

All dog pictures come from Shutterstock images or Amazon's 404 page.

More articles by Kelsie Merchant