Explainer – Facebook’s historic outage: What happened and why restoration took so long


Billions of users were unable to access Facebook, Instagram and WhatsApp for hours while the social media giant scrambled to restore services

Facebook and its other platforms, including Instagram, WhatsApp and Messenger, went down globally for close to six hours on Monday or Tuesday, depending on your time zone.

In what is estimated to be one of the largest outages of its kind, billions of people around the globe were affected.

As services are being restored, questions are being asked about what caused the outage, and why it took so long to fix.

Why did Facebook go down?

At some point in the middle of the (Sydney) night and during “prime” hours in many parts of the rest of the world, people began noticing they could not access Facebook, Instagram, WhatsApp or Messenger.

It took more than five hours before services were restored.

Facebook issued a statement confirming that the cause of the outage was a configuration change to the backbone routers that coordinate network traffic between the company’s data centres, which had a cascading effect, bringing all Facebook services to a halt.

This didn’t stop people speculating that it was the “largest hack in human history”, that it had something to do with AWS (the cloud service that props up some of Facebook’s data storage), or other colourful theories.

The outage meant not only was Facebook gone, but everything Facebook runs disappeared too. Others have provided a bit more detail on why Facebook vanished from the internet.

Cloudflare – which had its own recent internet outage issues – provided a detailed explanation about what happened.

It involves two of the systems that make the internet work: the Domain Name System (DNS) and the Border Gateway Protocol (BGP).

DNS is essentially the address system for the location of each website – its IP address – while BGP is the roadmap that finds the most efficient way to get to that IP address.
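To make the division of labour concrete, here is a minimal sketch in Python. It is a toy model, not how real resolvers or routers are implemented: the hostnames, addresses and peer names are all made up for illustration (the addresses come from ranges reserved for documentation), and real BGP weighs many more factors than the simple "most specific prefix wins" rule used here.

```python
import ipaddress

# Hypothetical DNS records: hostname -> IP address (illustrative values only).
DNS_RECORDS = {
    "social.example": "192.0.2.10",
    "chat.example": "198.51.100.20",
}

# Hypothetical routing table: network prefix -> next hop towards that network.
ROUTES = {
    "192.0.2.0/24": "peer-a",
    "198.51.100.0/24": "peer-b",
    "198.51.100.0/25": "peer-c",  # more specific route for part of the /24
}


def resolve(hostname):
    """DNS step: turn a name into an IP address, or None if unknown."""
    return DNS_RECORDS.get(hostname)


def next_hop(ip, routes=ROUTES):
    """Routing step: pick the most specific prefix that contains the address."""
    addr = ipaddress.ip_address(ip)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # no known path to this address
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


def reach(hostname):
    """Combine both steps: first find the address, then find a path to it."""
    ip = resolve(hostname)
    if ip is None:
        return None
    return ip, next_hop(ip)
```

Calling `reach("social.example")` first looks up the address and then the path to it. If either step fails, the site is effectively unreachable, even if its servers are running.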

Cloudflare said that on Monday Facebook, through a series of BGP updates, essentially told the rest of the internet that the paths to Facebook no longer existed, and not just for Facebook itself but for everything Facebook runs. That meant people trying to reach Facebook couldn’t find the path to access it.
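The effect of withdrawing routes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a model of Facebook’s actual network: the `RoutingTable` class, the prefixes and the address are hypothetical, and real BGP updates carry far more information than a bare prefix.

```python
import ipaddress


class RoutingTable:
    """Toy view of the prefixes the rest of the internet knows how to reach."""

    def __init__(self):
        self.prefixes = set()

    def announce(self, prefix):
        # An announcement tells other networks "this prefix is reachable".
        self.prefixes.add(ipaddress.ip_network(prefix))

    def withdraw(self, prefix):
        # A withdrawal tells other networks "this path no longer exists".
        self.prefixes.discard(ipaddress.ip_network(prefix))

    def is_reachable(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.prefixes)


# Hypothetical prefixes (documentation-only address ranges).
SERVICE_PREFIXES = ("192.0.2.0/24", "198.51.100.0/24")
SERVICE_IP = "192.0.2.10"

table = RoutingTable()
for prefix in SERVICE_PREFIXES:
    table.announce(prefix)
before = table.is_reachable(SERVICE_IP)  # routes announced: a path exists

for prefix in SERVICE_PREFIXES:
    table.withdraw(prefix)
after = table.is_reachable(SERVICE_IP)  # routes withdrawn: no path exists
```

In this sketch the server at `SERVICE_IP` never changes or goes offline. The only thing that changes is whether anyone else knows a path to it, which is why a service can seem to vanish from the internet.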

What about Facebook’s other platforms?

All of Facebook’s services were affected, not just Facebook. That included Facebook’s own internal systems, with reports that staff were locked out of offices and could not access their own internal communications platform.

Why did it take so long to fix?

Facebook staff were reportedly unable to access their own communications platform, Workplace, and could not get into their offices because the security pass system was caught up in the outage.

Facebook indicated the duration and severity of the outage meant the systems were being brought back to full capacity slowly.

Facebook has so far not gone into much detail about what went wrong and how it was fixed, but there were multiple reports that the social media giant sent a technical team to California to manually reset the servers where the problem originated.

What does the outage tell us?

Probably not too much. It is unlikely that the outage was due to reasons completely within Facebook’s control.

While the issue that took place is not something that can be completely avoided, what we CAN surmise from the incident is the social risk of having a single point of failure for a vast number of online services people rely on.

People rely on Facebook to connect with friends and family, and businesses use it to log in to other services, including online sales websites.

In some countries, it is the dominant means of communication through services like WhatsApp.

That an outage can have such a profound impact on billions of people for several hours will give some pause for thought.

How much did it cost?

The outage, along with a whistleblower story regarding Facebook that went to air on Sunday, prompted Facebook’s share price to drop 4.9% on Monday, causing founder and CEO Mark Zuckerberg’s personal wealth to drop $6bn.

Ahmed Khanji

Ahmed Khanji is the CEO of Gridware, a leading cybersecurity consultancy based in Sydney, Australia. An emerging thought leader in cybersecurity, Ahmed is an Adjunct Professor at Western Sydney University and regularly contributes to cybersecurity conversations in Australia. As well as his extensive background as a security advisor to large Australian enterprises, he is a regular keynote speaker and guest lecturer on offensive cybersecurity topics and blockchain.
