Have you ever seen an error message that made you laugh? The "Blue Screen of Death" is a famous error screen from Windows computers. When it appears, it means the computer has crashed. It has become so well-known that people have created thousands of memes about it.
 
[Image: The "Blue Screen of Death", an error screen that has become a famous part of computer culture.]
But what happens when a simple software update causes a similar crash on millions of computers at once? How could one update cause global chaos? In this lesson, we will explore a real-world event that showed how connected our world is through technology, and how vulnerable that makes it.
CrowdStrike IT outage: How and why it happened | BBC News
Video Transcript
Well, I think it depends on the organization. We're already seeing that fix being implemented in large, well-resourced, quite wealthy companies like American Airlines. They say they've managed to put that fix on their machines and they are back to basics where they were. But of course, with an issue like this, especially with an airline, there will be a backlog and the delays will cause further chaos for days to come. [ 00:21 ]
But I think the issue we've got is when there are large organizations that are perhaps under-resourced in their IT departments. So for example, if you've got tens of thousands of computers or endpoints, as they're called in the cybersecurity world, you need to get all of those back up and running. And I'm hearing that this is not a case of doing an automatic update in the same way that it happened overnight. The automatic update went out by CrowdStrike onto computers around the world and no one even had to do anything. This is a case of fingers on keyboards. [ 00:50 ]
This requires IT technicians going to many of the computers, in fact, most of the computers, and getting their fingers on the keyboards, doing a reboot, putting it into safe mode, and downloading the new correct CrowdStrike update to try and fix the problems. There are ways around it. If it's, for example, a server, if there's a cluster of servers inside an IT company, I've been told that it is possible to do this with some sort of over-the-air, over-the-internet type fix, but that's rare and it still has to be done on every single computer. So I think although we now have a fix, the real problem is how do you get that fix onto computers? [ 01:28 ]
Huge questions to be asked of this company when the dust settles. And already we're seeing the company being hit where it hurts on their share price, lost a fifth of their value already this morning, and that's before some markets have opened. So this is a company now scrambling not only to fix the current situation but also their reputation and their brand. It was and is one of the biggest cybersecurity companies in the world, famous for its very good and well-trusted endpoint protection, so the protection on computers around the world. [ 01:57 ]
But it's all about trust, and that trust has been affected by today, no doubt. Because the irony here is that in cybersecurity, we're always being told to install the updates, whether that's personally on our phones or laptops when an update comes in from Apple, for example, or Google. That's because the updates are there to protect us. They're there to fix any potential bugs, fill up any security gaps, and to make things run smoother. Here, of course, doing the right thing got you in lots of trouble. If you had automatic updates on with CrowdStrike overnight, you would wake up to an issue here that's bringing down entire companies. [ 02:32 ]
So yes, huge questions to ask of CrowdStrike. The way these things work, of course, is that there will be a team of engineers that wrote this update, wrote the software, the code, over the last few weeks or months, and it would be sent out last night after lots and lots of checking. So CrowdStrike would have put it onto lots of computers and seen whether or not it affects anything. Somehow, something's gone wrong in that safeguarding process and it's gone from the testbed, the sandbox of the internal computers at CrowdStrike, into the wide world and caused these massive problems. [ 03:03 ]
Windows is being really careful to distance itself from this problem. They're saying, "Look, this is not our problem. This is not our fault." Well, it is their problem, of course. They're helping to try and fix this, but they're saying this is nothing to do with our system. This is a CrowdStrike issue, and I think that is fair because, of course, CrowdStrike, when they make a new piece of software, they have to make it for all the different operating systems. So that's Windows, Linux, Mac. And I think the reason why Windows has perhaps proven the most difficult is because perhaps that's the largest group of customers that they have, they're running Windows, and for some reason, there is something to do with the bespoke code that's done for Windows that has not affected Linux or Mac. I would argue though that probably there are fewer customers using those particular operating systems. [ 03:49 ]
And people are asking today, you know, how does this affect me? Do I need to watch out for when I turn on my computers? The answer to that is probably not, because CrowdStrike has built its giant company through enterprise. So that's going after these big organizations like American Airlines and like others that are responsible for thousands or tens of thousands of, usually, Windows computers. [ 04:11 ]
Interestingly, I've just spoken to an IT manager who is scrambling around and struggling and very, very stressed out for a medium-sized organization in this country, in the UK. And he's responsible for 4,000 computers, but it's giving me an idea of just how stressful it is for him because he says that once you are fingers on keyboards on a computer, it is a quick fix. You have to press a few buttons, put it into safe mode, and then you can download the correct CrowdStrike software and everything's fine. But the problem is, he says, "We've got computers spread across five different sites, so that means you have to physically drive from one to the next to the next to fix this problem." [ 04:46 ]
People are saying that the closest we'll get to this, I think, was 2017 with the WannaCry cyberattack. That was a deliberate and malicious cyberattack that affected about 300,000 computers in 150 countries. But that was stopped, and that meant that the virus stopped spreading and that people could rebuild from there. What we're seeing here, of course, is that if an IT manager hasn't seen the news today and they're waking up, for example, in the US, they turn on the computer, they're going to get the blue screen of death. [ 05:13 ]
The Event
On Friday, July 19, 2024, a faulty software update from a cybersecurity company called CrowdStrike was sent automatically to millions of computers running the Windows operating system. This update was supposed to improve security, but instead, it caused a critical error that crashed the computers. The result has been widely described as the largest IT outage in history, affecting an estimated 8.5 million Windows devices worldwide.
This single update caused massive disruptions. Airlines had to ground flights, leaving thousands of passengers stranded. Banks couldn't process transactions, and hospitals had to cancel appointments and surgeries because they couldn't access patient records. The economic impact was estimated to be in the billions of dollars. In the United States, lawmakers later questioned a senior executive from the company about how this happened. The company apologized and promised to change its processes to prevent a similar incident in the future.
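In the video, the reporter explains that an update is normally checked on a smaller group of computers before it goes out to everyone. One common way to do this is a "staged rollout": send the update to a small group first, check that those computers still work, and only then move on to a larger group. The short Python sketch below illustrates the idea. It is not CrowdStrike's actual process. The function name `staged_rollout`, the computer names, and the stage sizes (1%, 10%, 100%) are all made up for this example.

```python
# A minimal sketch of a staged rollout.
# Assumption: everything here is hypothetical and only illustrates the idea.

def staged_rollout(computers, update_is_safe_on, stage_percents=(1, 10, 100)):
    """Install an update in growing stages and stop if any computer crashes.

    computers: list of computer names (invented for this example)
    update_is_safe_on: function that returns True if a computer still works
        after the update (a stand-in for real health checks)
    stage_percents: percentage of all computers covered by the end of each stage
    Returns (updated, halted): the computers that received the update, and
    whether the rollout was stopped early.
    """
    updated = []
    for percent in stage_percents:
        # How many computers should have the update by the end of this stage.
        target = max(1, len(computers) * percent // 100)
        batch = computers[len(updated):target]
        updated.extend(batch)

        # Check the computers in this stage before going any further.
        crashed = [c for c in batch if not update_is_safe_on(c)]
        if crashed:
            return updated, True  # stop: do not send the update to anyone else

    return updated, False


computers = [f"pc-{i}" for i in range(1000)]

# A faulty update that crashes every computer it reaches.
updated, halted = staged_rollout(computers, lambda computer: False)
print(len(updated), halted)

# A good update that causes no problems.
updated, halted = staged_rollout(computers, lambda computer: True)
print(len(updated), halted)
```

With the faulty update, the rollout stops after the first small group, so only a few of the 1,000 computers are affected. With the good update, every computer receives it. The trade-off is speed: a staged rollout takes longer to reach everyone, which matters when an update fixes an urgent security problem.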