Gitlab Failure on Upgrade

Troubleshooting a botched Gitlab upgrade
August 18, 2017
gitlab

I upgraded Gitlab CE to v9.4.4 this morning, and as I watched the logs to verify that the upgrade was successful, I saw a long chain of these:

2017-08-18_12:44:48.05950 2017/08/18 12:44:48 error: GET "/": badgateway: failed after 0s: dial unix /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket: connect: connection refused

A few seconds of these types of messages is normal, but this went on for more than a minute.

I logged in and did some basic troubleshooting, only to find that the site eventually came up.

After restarting the container to watch the logs for longer, I determined that after almost two minutes the logs changed to this:

2017-08-18_12:46:24.75640 2017/08/18 12:46:24 error: GET "/": badgateway: failed after 2s: context canceled

A few seconds later sidekiq logged that its startup was complete, and after that I started seeing 302s for the health check.

If you’re running Gitlab under Rancher, set your health check to have an initialization/reinitialization timeout of at least 180,000 ms (3 minutes). That’s a long time for an app to take to start, but Gitlab is…special.