GitLab deploys on a Friday and ... is down for a few hours


Updated GitLab, a hosted git service not dissimilar to Microsoft's GitHub, was down for some users as of Friday morning, Pacific Time.

Around 1634 UTC (0934 PT), the code hosting service started returning 503 Service Unavailable errors to those attempting to access the website.

Software developers who depend on the service were quick to observe the unexpected time off.

They also took time to mention sysadmin superstition about not deploying on a Friday. "GitLab seems to have deployed on a Friday breaking their site," quipped UK-based dev Luke Warlow. "Which is annoying cause it's stopping me deploying on a Friday and breaking my site."

The issue page for the IT breakdown itself returned an error banner when loading: "An error occurred while fetching the incident status. Please reload the page."

Nonetheless, the page loaded to explain that the cause of the downtime is currently described as a "config change."

  • GitHub code search redesign can't find many fans
  • Microsoft's GitHub under fire for DDoSing important open source project website
  • Where are we now, Microsoft 362.5? Europe reports outages
  • With dead-time dump, Microsoft revealed DDoS as cause of recent cloud outages

"The service is currently being restored, we're taking multiple measures to have an immediate restore of the service, as well as a targeted fix to the root cause," the issue page explains.

"More information will be added as we investigate the issue. For customers believed to be affected by this incident, please subscribe to this issue or monitor our status page for further updates."

The impact is described as a site-wide outage and some customers, it's said, should expect their projects to be unavailable "for a period of time after service is restored."

GitLab did not immediately respond to a request for further information.

The GitLab status page appears to blame Google Cloud, noting that the affected location is "Google Compute Engine."

(The only glitch we can see on Google Cloud is some disruption around the world stemming from the Google Kubernetes Engine, but that is just a problem with "unexpected additional messages in GKE cluster logs" rather than unavailable systems. So we take GitLab's status page to mean that the downtime was caused by something within its GCE deployment.)

GitLab's status page lists the following GitLab services as disrupted: Git Operations, Container Registry, GitLab Pages, CI/CD - GitLab SaaS Shared Runners, CI/CD - GitLab SaaS Private Runners, CI/CD - Windows Shared Runners (Beta), SAML SSO - GitLab SaaS, Background Processing, and Canary.

As of 1846 UTC (1146 PT), the status page reported that the issue was still being investigated: "We have implemented a fix to mitigate Web/API services. Investigation is ongoing for other services."

At least the incident does not appear to be as severe as GitLab's 2017 loss of production data, in which an administrator deleted a directory on the wrong server during a replication process, resulting in the loss of 300 GB of live production data. ®

Updated to add

According to a postmortem report by GitLab, the outage was caused in part by a change request during which "an old pipeline was triggered, applying an obsolete Terraform plan to the production environment."

While you're here... We just want to flag up that the Fedora Linux project is considering adding the collection of usage metrics – some might call it telemetry – to the distribution from release 40 on an opt-out basis. The current release is 38. The project hasn't yet worked out what metrics to collect, and says it is keen to preserve users' privacy. We're keeping an eye on it.