The official explanation:
... they go on to explain that automated BGP route withdrawals pulled the routes to Facebook's authoritative DNS servers off the internet, so resolvers around the world could no longer look up Facebook's domains (which technically doesn't take Facebook's servers offline, but practically blocks most non-technical users). I would like to emphasize that this is an awfully convenient cover story: if you had to explain a DNS cyber-attack without calling it a cyber-attack, this is exactly how the explanation would go. So, who knows what really happened, but nearly 6 hours of global DNS blackout over a "whoopsie-daisy" just... isn't believable to me. Tech companies at Facebook's scale have on-site UPS units plus diesel backup generators ready to spool up at a moment's notice (I know this firsthand), 24x7 support staff who handle local outages all over the world on a daily basis, and regular fire drills for emergencies exactly like this.
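To put the failure mode in concrete terms: from the outside, the outage looked like ordinary name resolution failing, even while Facebook's own web servers stayed powered on. Here's a minimal sketch of that client-side view using only the Python standard library; the domain names are real, but the script itself is purely illustrative and has nothing to do with Facebook's internal tooling:

```python
import socket

# What the outage looked like from any client's point of view:
# the servers were up, but nothing could find them by name.
DOMAINS = ["facebook.com", "instagram.com", "whatsapp.com"]

for domain in DOMAINS:
    try:
        # getaddrinfo walks the normal resolver path (stub resolver ->
        # recursive resolver -> Facebook's authoritative DNS servers).
        # With the routes to those authoritative servers withdrawn,
        # this raises socket.gaierror instead of returning addresses.
        infos = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
        addrs = sorted({info[4][0] for info in infos})
        print(f"{domain}: resolves to {addrs}")
    except socket.gaierror as exc:
        print(f"{domain}: DNS resolution failed ({exc})")
```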
The least believable explanation to me is a software update, or an update of any kind. Again, at tech companies of this scale (and even at much smaller, local companies), before IT performs an update or rollout of any kind, they make backup images of everything so they can do an instant revert if the update causes an unforeseen crash or incident. The last thing you want to be doing is troubleshooting the cause of an outage while everything is down. In addition, big rollouts/updates are always piloted, even at smaller companies: they pick a test group of machines/networks and roll out the update on those first. Only after the bugs and kinks are worked out do they do the full rollout. All of this is so mechanical and routine in modern IT that it's no more eventful than a corporate fleet manager sending vehicles to the mechanic for routine scheduled maintenance. Too many "just so" explanations in the news headlines these days; it's getting downright annoying...
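For anyone who hasn't watched one of these rollouts up close, the snapshot-then-pilot-then-revert routine described above is simple enough to sketch in a few dozen lines. This is a toy illustration under made-up names (the hosts, the health check, and the snapshot calls are all hypothetical), not Facebook's or anyone else's actual tooling:

```python
import random

# Toy sketch of a piloted ("canary") rollout with instant revert.
# Every name here is invented for illustration; real fleets use
# orchestration tooling, not a loop over hostnames.

FLEET = [f"host-{i:03d}" for i in range(100)]
PILOT_SIZE = 5

snapshots = {}

def snapshot(host):
    """Pretend to save a backup image we can revert to instantly."""
    snapshots[host] = f"image-of-{host}"

def deploy(host, version):
    print(f"[{host}] deployed {version}")

def revert(host):
    print(f"[{host}] reverted to {snapshots[host]}")

def healthy(host):
    """Stand-in health check; a real one would probe services and routes."""
    return random.random() > 0.05  # simulate a ~5% failure rate

def rollout(version):
    pilot = random.sample(FLEET, PILOT_SIZE)

    # Phase 1: image the pilot machines first, then update only them.
    for host in pilot:
        snapshot(host)
        deploy(host, version)

    # Any unhealthy pilot host -> instant revert. Nobody stands around
    # troubleshooting while production is down.
    if not all(healthy(h) for h in pilot):
        for host in pilot:
            revert(host)
        print("pilot failed; full rollout cancelled")
        return

    # Phase 2: pilot looks good, roll out to the rest of the fleet.
    for host in FLEET:
        if host not in pilot:
            snapshot(host)
            deploy(host, version)
    print(f"{version} rolled out fleet-wide")

rollout("v2.0.1")
```

The point of the sketch is the ordering: the backup image exists before the update touches anything, and the revert path runs automatically, which is exactly why a botched update taking six hours to undo strains belief.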