CyberThreat Dialogues
Ep. 27 | July 20, 2025 | 51:15

Ep. 27: CrowdStrike One Year After the Outage — Lessons Learned

Anonymous Former CrowdStrike Engineer

One year after the July 2024 outage that affected 8.5 million Windows devices, I sat down with a former CrowdStrike engineer who was there when it happened. Speaking anonymously, they walk us through what went wrong, how the company responded, and what the industry should learn.

Full Transcript

Alex Mercer, Host, CyberThreat Dialogues

Today's guest is speaking anonymously. They were a senior engineer at CrowdStrike during the July 2024 outage and have since left the company. We've verified their identity and employment history. For obvious reasons, we'll call them 'the engineer.' Thanks for being willing to share your perspective.

The Engineer, Former CrowdStrike Engineer

Thanks, Alex. I want to be clear: I have enormous respect for CrowdStrike and my former colleagues. But I think the industry needs to hear an honest account of what happened and why, especially now that a year has passed.

Alex Mercer, Host

Walk us through what happened on July 19th, 2024.

The Engineer, Former CrowdStrike Engineer

The short version is that a rapid response content update — what we called a channel file update — contained a logic error that caused the Falcon sensor to crash at the kernel level on Windows systems. The update was pushed globally without the staged rollout that regular feature updates go through. Within hours, an estimated 8.5 million devices blue-screened.

Alex Mercer, Host

Why wasn't it caught in testing?

The Engineer, Former CrowdStrike Engineer

Rapid response content updates had a different validation pipeline than regular updates. The thinking was that these needed to be fast — we're talking about responding to active threats. So the validation was lighter. The tradeoff between speed and safety was a known tension internally. Some of us had raised concerns about the deployment velocity, but the argument was always that speed is essential for threat response.

Alex Mercer, Host

Were there internal warnings before the outage?

The Engineer, Former CrowdStrike Engineer

Yes. Multiple engineers, including myself, had flagged the risk of pushing kernel-level updates without staged rollouts. There were near-misses in the months before. Small issues that were caught before they went global. But the near-misses weren't treated as the warning signs they should have been. There was a culture of moving fast, and it worked — until it didn't.

Alex Mercer, Host

How did the company respond once they realized the scope?

The Engineer, Former CrowdStrike Engineer

I'll give them credit — the response was fast and comprehensive. George Kurtz was on the phone within the hour. The engineering team worked around the clock for days. The root cause was identified quickly. And the subsequent changes — phased deployments, enhanced validation, customer-controlled update policies, the independent review board — these were all the right moves.
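[Editor's note: For readers unfamiliar with phased deployments, here is a minimal sketch of the general idea in Python. It is illustrative only and is not CrowdStrike's actual process or code. The names `Ring`, `staged_rollout`, `deploy`, and `max_failure_rate`, along with the host names and the 1% threshold, are all assumptions made up for this example. The update goes to one ring of hosts at a time, and the rollout halts as soon as a ring's failure rate exceeds the threshold, so a faulty update never reaches the later, larger rings.]

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Ring:
    """A hypothetical deployment ring: a named group of hosts."""
    name: str
    hosts: List[str]


def staged_rollout(
    rings: List[Ring],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # assumed threshold, purely illustrative
) -> Tuple[List[str], Optional[str]]:
    """Deploy ring by ring, in order.

    `deploy(host)` returns True if the host is healthy after the update.
    Returns (names of rings that completed, name of the ring where the
    rollout halted, or None if every ring completed).
    """
    completed: List[str] = []
    for ring in rings:
        failures = sum(1 for host in ring.hosts if not deploy(host))
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > max_failure_rate:
            # Stop here: later (larger) rings never receive the update.
            return completed, ring.name
        completed.append(ring.name)
    return completed, None


# Hypothetical fleet: a small canary ring first, the broad fleet last.
rings = [
    Ring("canary", ["lab-1", "lab-2"]),
    Ring("early", ["office-1", "office-2", "office-3"]),
    Ring("broad", [f"fleet-{i}" for i in range(10)]),
]


def faulty_update(host: str) -> bool:
    """Simulates an update that crashes every host it touches."""
    return False


def healthy_update(host: str) -> bool:
    """Simulates an update that leaves every host healthy."""
    return True


print(staged_rollout(rings, faulty_update))
print(staged_rollout(rings, healthy_update))
```

[In this sketch, the faulty update halts at the two-host canary ring and the remaining thirteen hosts are never touched. The cost is latency: every ring adds time before the update reaches the whole fleet, which is the speed-versus-safety tension discussed earlier in the conversation.]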

Alex Mercer, Host

What's the biggest lesson for the industry?

The Engineer, Former CrowdStrike Engineer

Two things. First, kernel-level access is a double-edged sword. It gives you deep visibility, but it also means a single bug can take down an entire system. The industry needs to think hard about whether kernel-level deployment is worth the systemic risk. Second, security monocultures are dangerous. When one vendor protects millions of endpoints and a single update can affect all of them simultaneously, you've created a systemic risk that rivals the threats you're trying to defend against.

Alex Mercer, Host

Has CrowdStrike regained trust, in your view?

The Engineer, Former CrowdStrike Engineer

Largely, yes. The changes they implemented are real and meaningful. Customer retention has been strong. But the incident permanently changed how CISOs think about vendor concentration risk. And that's probably a healthy outcome for the industry, even if it was painful for CrowdStrike.

Alex Mercer, Host

Any advice for CISOs evaluating endpoint security today?

The Engineer, Former CrowdStrike Engineer

Demand transparency about update deployment processes. Ask vendors about their staged rollout mechanisms. Understand what level of system access their agent requires and why. And consider diversification — don't put all your detection eggs in one basket.

Alex Mercer, Host

Thank you for sharing your perspective. This is exactly the kind of honest, firsthand account that the industry needs more of.

The Engineer, Former CrowdStrike Engineer

Thanks, Alex. I hope it helps people understand that even the best companies can make mistakes, and the important thing is how you learn from them.

Frequently Asked Questions