Postmortem of the Third Kind

Issue Summary

While testing the systemd startup script for a gunicorn application on a peer's virtual server, I entered 'shutdown now' on the command line, intending to take the server down for a set amount of time before starting it back up. Under the termination policy of the AWS virtual machine, however, the shutdown terminated the instance, and neither a soft nor a hard reboot could bring it back. The server went down on Oct 11 at 11:05 PST and was restored by 11:30 PST. The peer had to set aside the day's agenda to troubleshoot the issue, and the school CTO received alerts and had to take time to resolve it. Although the server came back up, it was a new instance of the server image with a new IP, so the peer also had to update the Load Balancer configuration, JavaScript code, and DNS A records for the new IP.


Timeline on Oct 11

11:05 PST Entered the 'shutdown now' command.

11:06 PST Attempted a soft and then a hard reboot via the school intranet; both attempts returned only an error.

11:09 PST Messaged the school CTO on Slack about the issue; he responded that he was already seeing alerts.

11:25 PST CTO used the AWS admin panel to launch a new instance from the Amazon Machine Image (AMI).

11:30 PST Server was back up and running and back in the peer's charge.

Preventative Measures

It is important to understand the termination policy associated with a virtual machine before running the 'shutdown' command on it. On Amazon Web Services, an instance whose shutdown behavior is set to terminate (as it evidently was here) is terminated when it is shut down from the operating system, and a terminated instance cannot be restored. AWS does allow a new instance to be created from the Amazon Machine Image of the server in question, but it will be a new instance with a new IP, so shutting down is still a consequential action.
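
One way to check this before touching a server is to ask AWS what an OS-level shutdown will do to the instance. The following is a minimal sketch, not the procedure used during this incident. It assumes the boto3 library is installed, that AWS credentials with permission to describe the instance are configured, and that you know the instance ID (the ID shown is a placeholder). The helper name shutdown_terminates is made up for illustration.

```python
def shutdown_terminates(ec2_client, instance_id):
    """Return True if shutting down from inside the OS would terminate the instance.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.describe_instance_attribute(
        InstanceId=instance_id,
        Attribute="instanceInitiatedShutdownBehavior",
    )
    # EC2 reports either "stop" or "terminate" for this attribute.
    behavior = response["InstanceInitiatedShutdownBehavior"]["Value"]
    return behavior == "terminate"


def warn_before_shutdown(instance_id):
    """Print a warning if 'shutdown now' would destroy the instance."""
    import boto3  # imported here so the helper above can be exercised without AWS access

    client = boto3.client("ec2")
    if shutdown_terminates(client, instance_id):
        print(f"{instance_id}: shutdown will TERMINATE this instance. Do not run 'shutdown'.")
    else:
        print(f"{instance_id}: shutdown will only stop this instance.")


# Example call (placeholder instance ID, hypothetical):
# warn_before_shutdown("i-0123456789abcdef0")
```

If the check reports that a shutdown terminates the instance, the safer options are to leave the machine running or to ask whoever administers the account before proceeding.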
