Postmortem

Issue Summary

The outage was reported on October 6th, 2021 at 16:25 (GMT-05:00) and was resolved the same day at 17:00 (GMT-05:00). It lasted 35 minutes.

The Apache server hosting the WordPress site was not serving pages and returned a 500 Internal Server Error. The root cause was a typo in a WordPress file on our web server: it referenced files with a `.phpp` extension instead of `.php`, so the required files could not be found and requests failed.

The error affected almost 45% of our users; however, the incident did not escalate to other areas.

Timeline

  • 16:25 (GMT-05:00) - The issue was detected via an email notifying us that our website was down.
  • 16:30 (GMT-05:00) - Confirmed the 500 error by curling the website (`curl -sI 127.0.0.1`).
  • 16:31 (GMT-05:00) - Used `ps -auxf` to list the running processes. It showed Apache, our web server, running as root (PID 156) and as www-data (PID 193).
  • 16:37 (GMT-05:00) - Ran `strace -p <PID>` on the root process (156). Its output was a continuous series of timeouts.
  • 16:38 (GMT-05:00) - Opened a second terminal on the server, curled localhost, and got nothing.
  • 16:40 (GMT-05:00) - Ran `strace -p <PID>` on the www-data process (193). This one instead waited for a request, which we would trigger with curl from the other terminal.
  • 16:42 (GMT-05:00) - Curled localhost and got a large volume of error output in the terminal window where strace was running.
  • 16:48 (GMT-05:00) - Noticed some `.phpp` file paths followed by a -1 return value. An example line: `lstat("/var/www/html/wp-includes/class-wp-locale.phpp", 0x7fff3fa0f610) = -1 ENOENT (No such file or directory)`.
  • 16:49 (GMT-05:00) - Searched online to check whether a `.phpp` extension exists. It does not, so we concluded that this was a typo in some file.
  • 16:50 (GMT-05:00) - Went to the directory where the WordPress files are located.
  • 16:52 (GMT-05:00) - Searched that directory with grep and located a file called `wp-settings.php` that contained the string ".phpp".
  • 16:57 (GMT-05:00) - Wrote a Puppet script that executes the command `sed -i 's/.phpp/.php/g'` on the file /var/www/html/wp-settings.php to replace every occurrence of ".phpp" with ".php" in that file.
  • 17:00 (GMT-05:00) - After developing and running the Puppet script, curled localhost (127.0.0.1) again and got a working response. The issue was resolved.
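
The search and replace steps from the timeline can be sketched in shell as follows. This is a minimal illustration, not the script that was actually run: the function names and the `WP_ROOT` variable are assumptions made for the example. It also differs from the original command in one respect: in `s/.phpp/.php/g` the unescaped dot matches any character, so this sketch escapes it to match only a literal ".phpp".

```shell
#!/bin/sh
# Sketch only: WP_ROOT and both function names are illustrative assumptions.
WP_ROOT="${WP_ROOT:-/var/www/html}"

# List every file under a directory that references the mistyped extension.
# -r recurses, -l prints only file names, -F treats ".phpp" as a literal string.
find_phpp_references() {
    grep -rlF '.phpp' "$1"
}

# Replace the literal ".phpp" with ".php" in one file, in place.
# The dot is escaped so it matches only a real "." character.
# Note: "sed -i" without a suffix argument assumes GNU sed, as found on Linux.
fix_phpp_typo() {
    sed -i 's/\.phpp/.php/g' "$1"
}
```

Under these assumptions, `find_phpp_references "$WP_ROOT"` would list the affected files, and `fix_phpp_typo "$WP_ROOT/wp-settings.php"` would correct the one identified in the timeline.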

Root cause and resolution

Root Cause

The cause of the issue was a typo in `wp-settings.php`: it referenced WordPress files with a `.phpp` extension instead of `.php`. Because no such files exist, Apache (our web server) could not serve the site.

Resolution

Once the error was found, a manual fix was first applied on one server to confirm that it worked. A Puppet file was then created to distribute the fix across all servers.

Corrective and preventative measures

The error could easily have been prevented if the user who updated the config file had tested that the server was still functional before exiting.

Also, new configuration files should be tested in separate containers instead of on live servers in order to avoid these errors.
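
One way to make the "test before exiting" step routine is a small smoke test run after every change. The sketch below is a hedged example, not an existing tool of ours: the function names and the default URL are assumptions, and treating any 2xx or 3xx status as healthy is a design choice that may need adjusting. It reuses the `curl -sI` check from the timeline.

```shell
#!/bin/sh
# Sketch only: function names and the default URL are illustrative assumptions.

# Extract the numeric status code from the first line of an HTTP response
# header, e.g. "HTTP/1.1 500 Internal Server Error" -> "500".
status_code_from_header() {
    printf '%s\n' "$1" | head -n 1 | tr -d '\r' | awk '{print $2}'
}

# Succeed only for a 2xx or 3xx status code.
is_healthy_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch only the response headers (curl -sI) and report the result.
smoke_test() {
    url="${1:-http://127.0.0.1}"
    header="$(curl -sI "$url")" || { echo "FAIL: no response from $url"; return 1; }
    code="$(status_code_from_header "$header")"
    if is_healthy_status "$code"; then
        echo "OK: $url returned $code"
    else
        echo "FAIL: $url returned ${code:-no status}"
        return 1
    fi
}
```

Calling `smoke_test` (or `smoke_test http://127.0.0.1`) after editing a file returns a non-zero exit status when the server answers with an error such as 500, so the check can also gate a deployment script.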
