CrowdStrike to Delta: Stop pointing at us (wsj.com)
41 points by neofrommatrix on Aug 5, 2024 | 73 comments


One issue that hasn't received enough attention comes from a comment on Dave Plummer's video about the CrowdStrike outage. Dave Plummer is a former Windows engineer and runs a YouTube channel called Dave's Garage.

@zug-zug wrote:

> While this is technically what crashed machines it isn't the worst part.

> CS Falcon has a way to control the staging of updates across your environment. Businesses that don't want to go out of business have an N-1 or greater staging policy, so only test systems get the latest updates immediately. My work, for example, has a test group at N staging, a small group of noncritical systems at N-1, and the rest of our computers at N-2.

> This broken update IGNORED our staging policies and went to ALL machines at the same time. CS informed us, after our business was brought down, that this is by design and some updates bypass policies.

> So in the end, CS caused untold millions of dollars in damages not just because they pushed a bad update, but because they pushed an update that ignored their customers' staging policies, which would have prevented this type of widespread damage. Unbelievable.

Link to video:

https://www.youtube.com/watch?v=wAzEJxOo1ts
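
To make the failure mode concrete, here's a toy model in Python of the policy @zug-zug describes. All field, flag, and group names are hypothetical; this is not CrowdStrike's actual API, just the shape of per-group N/N-1/N-2 staging and a vendor-side flag that defeats it:

    from dataclasses import dataclass

    @dataclass
    class Update:
        version: int
        bypasses_staging: bool  # hypothetical flag, set by the vendor, not the customer

    # Customer intent: only the test group takes the newest release immediately.
    STAGING_OFFSET = {"test": 0, "noncritical": 1, "fleet": 2}  # N, N-1, N-2

    def groups_receiving(update: Update, latest_version: int) -> list[str]:
        """Which groups install this update right now?"""
        if update.bypasses_staging:
            # The behavior described above: the policy is ignored and
            # every group gets the update at once.
            return list(STAGING_OFFSET)
        return [group for group, offset in STAGING_OFFSET.items()
                if update.version <= latest_version - offset]

    # A normal release reaches only the test group on day one...
    assert groups_receiving(Update(10, False), latest_version=10) == ["test"]
    # ...but a bypassing release hits everything simultaneously.
    assert groups_receiving(Update(10, True), latest_version=10) == ["test", "noncritical", "fleet"]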


I'm pretty sure this is why everything we got from CS in the first 48 hours stressed that the issue was with a "channel file" (threat definitions, content updates, etc.).

Their staged update process is for the Falcon driver itself. It is not for the "channel files".

As I understand it, the driver itself is understood to be a risk, and they provide a facility for an N, N-1, N-2 staged deployment to mitigate this risk.

As I understand it, channel files were not identified as a risk, and were never subject to this staged deployment.

The "sell" was that you could be running a trusted driver at N-2, but still have 0day protection from up-to-date channel files. And CS's initial feedback that the issue was not with the driver itself was CYA that they hadn't been misleading customers using such staged deployments.


That's an important distinction. CrowdStrike probably did, in fact, CYA in the licensing terms.


If this is true, this is the smoking gun that screams "negligence" from a legal standpoint and CrowdStrike's insurers will be making a lot of payouts.


Relevant to Dave Plummer: https://news.ycombinator.com/item?id=39813625

> Now, as to the tidbit. Dave Plummer ran a scam company that was sued by Washington State in 2006, "SoftwareOnline.com, Inc. ". He actually left Microsoft specifically to run this company.

> Court documents can be seen here: https://www.atg.wa.gov/news/news-releases/attorney-general-s... You can find David W. Plummer listed in the court complaint.

> The short of it is that it was an online software scam company that tricked people into downloading fake anti-virus and security software using online ads, and then the software delivered additional adware and nagware onto users' machines.


The term “ad hominem” gets casually thrown around quite a bit in these parts, but boy howdy this is the literal textbook case of it. Plummer’s not one of the good guys, noted. Is he factually wrong?


The stuff he claims in his videos is at best misleading, at worst outright lies. Like his recent claim that he made the vertical text in the Windows Start menu render in real time instead of using a bitmap. Except every version of Windows (released, beta, unreleased, or even the various source code leaks) that had that type of Start menu used a bitmap.


That was like 18 years ago and not relevant to the topic or thread. People make mistakes in life and deserve to be able to move past them.


Almost 20 years ago. Not sure it's relevant; certainly not to CrowdStrike.


Wow, I didn't expect that.

That doesn't invalidate the parent comment though.


Wow, this is quite damning. I'm not sure that, if I were Dave, I would have posted that so publicly, as there are billions at stake here.


Yeah, this is bullshit, and when we spoke to our cyber dept about why we chose a product that allows this, they said "all the top-tier products do this".

I did suggest we turn off the proxy for the "air gapped" parts of the network, and only turn it on when we're sure we're ready, so the air-gapped parts can get the updates they need. But seriously... since when is it acceptable to give a vendor control that YOU DON'T HAVE over parts of your network? Crazy days.


I'm wondering if the Delta outage is related to the time it took to recover some critical workload processing that was happening when the servers went poof.

I've been at companies where we would do a bunch of processing overnight. One time, the CPU died on a server in the middle of processing. Apparently we didn't have a procedure for recovering mid-process. It took us two days to write code to recover, and we had to contact a third-party vendor to ask, "What data did we send you?"


> CrowdStrike said Sunday that its liability is contractually capped at an amount in the “single-digit millions.”

Companies handling critical infrastructure should face more scrutiny imo.


CrowdStrike is not handling critical infrastructure. Delta is.

The reality is the industry wants to have its cake and eat it too. No one forced Delta to buy software that could force upgrades in their production fleet. They're a billion-dollar company and should put their big-boy pants on.


Am I right in thinking Delta could have chosen when the update was distributed to its infrastructure?

In my mind, a quick test run of the update on a VM before letting it roll out globally would have revealed the BSOD boot loop.
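
A rough sketch of that canary step in Python. The three callables stand in for whatever tooling a site actually has; nothing here is a real CrowdStrike or hypervisor API:

    import time
    from typing import Callable

    def canary_gate(
        apply_update: Callable[[], None],  # push the pending update to the canary VM
        reboot: Callable[[], None],        # power-cycle the canary
        is_healthy: Callable[[], bool],    # e.g. VM responds, agent reports in
        timeout_s: float = 600.0,
    ) -> bool:
        """Return True only if the canary survives the update and a reboot."""
        apply_update()
        reboot()
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if is_healthy():
                return True   # canary came back; safe to widen the rollout
            time.sleep(10)
        return False          # never came back (e.g. a BSOD boot loop): halt rollout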


AFAIK CrowdStrike can push updates at any time to any host. There are staging areas they may use, but don't have to (particularly for definitions updates).

CrowdStrike should have done a better job, but Delta chose them (to offload the responsibility and work) and is now crying foul. They knew the risk. This is a classic executive play: blame the consultants/vendor and take no responsibility.


Just shows how many planes would be falling out of the sky if there weren't federally mandated safety systems, secondary hydraulic circuits, and failover hot spares at nearly every layer of the stack. Delta should've had backup systems, just like their planes do.


I'm not sure how "you should never use CrowdStrike" is an argument in CrowdStrike's favor.

I guess you're saying they shouldn't have outsourced in the first place? Which does sound like the correct conclusion in this case...


I'm not trying to defend CrowdStrike, but to point out that Delta is the one owning and maintaining critical infrastructure, and that executives trying to shift this responsibility onto someone else is the reason this happened in the first place. :)


Okay, good to know. I always thought those embedded systems would be a real pain to maintain.


No. The update was forced.


> No one forced Delta to buy software that could force upgrades in their production fleet.

Except this update was one from CrowdStrike that would ignore Delta's stated update policy.

And they literally said "Oh, yeah, we can configure some updates to bypass your policy". I wonder how well this was communicated to those customers.


Did CrowdStrike force Delta to accept running what is essentially a permanent RCE in their production fleet? You don't buy software that is capable of that; you put the fact that it's not capable of it in the contract.

The update policy may work for client version updates, but not for the "policy definitions"; otherwise Delta wouldn't get the sweet "all vulns mitigated within a 4h SLA" they crave.


I mean, Delta is also an airline, and if airlines love to do one thing, it's pointing fingers and shifting blame. Mostly so they don't have to reimburse you what you're legally owed when they jam you up, but it seems to run throughout.


> CrowdStrike is not handling critical infrastructure.

lol.


> CrowdStrike said Sunday that its liability is contractually capped at an amount in the “single-digit millions.”

Well, that's nice. If I understand correctly, though, you can't contractually limit liability for gross negligence. I mean, you can say it in the contract, but it isn't legally enforceable.

It does raise the bar, though - gross negligence is harder to prove than ordinary negligence.

Note well: IANAL. I could be wrong.


By more "scrutiny" do you mean increasing the liability cap?


Yes, because that incentivizes such companies to be more diligent.


Compare that to the Colonial Pipeline ransomware hack, where a Russian group disabled critical infrastructure by accident and the Russian government intervened. With CrowdStrike, on the other hand, national infrastructure was taken out and there's been no word from the agencies on how to prevent a re-run.


It's absolutely nuts to me that any entity has what amounts to Full Software Authority (FSA): the ability to push out (closed-source) kernel-mode software to a mind-bogglingly large fraction of the world's computing base.

Talk about a strategic vulnerability. "It's for security" they said.

Now imagine that a nefarious actor managed to compromise the FSA and push a stealth rootkit. I shudder to think how a full-out cyberwar would go.


The Microsoft part of this I find interesting. I guess it would be standard legal practice to also go after them even if it gets thrown out, but that one is weird.

If Microsoft encouraged Delta to use CrowdStrike, then OK, I could see the case.

But if it is anything related to how Windows works, that seems like a stretch. Yes, Windows could be better, but I don't like the idea that the OS provider could be sued for a piece of software causing problems. That seems... dangerous.

Regarding Delta: this seems like an interesting situation. Apparently CrowdStrike offered to send people to help, but how many people, and where? Realistically, how much help would they have been able to provide, given that this wasn't isolated to Delta and I imagine their people were stretched pretty thin? And importantly, what exactly was the help provided? I don't see any information about this.

While I am sure that Delta's IT department was understaffed, this was also a unique situation. Even if you had spent the time to build a well-optimized machine for rolling out updates, automated everything, and expected things to go wrong, I would never have anticipated every Windows machine being unable to boot. That is an extraordinary situation. I doubt any IT department is staffed to handle it. You don't expect to need to deal with every machine you have; it's honestly kind of unrealistic.

It will be interesting to watch this one play out in the courts, because at the end of the day the vast majority of the blame for this entire situation is on CrowdStrike. Companies were handed a situation they were not prepared for.


Delta is responsible for Delta. There are no absolutes in software and systems; at the end of the day, Delta has to be ready for anything. If they choose to understaff and rely on black-box software... that is a choice they make. Microsoft was forced by the EU, and CrowdStrike made a (rather large) mistake; companies have them, you know. It's the same reason I still back up my Google Photos and Drive.


> While I am sure that Delta's IT department was understaffed, this was also a unique situation. Even if you had spent the time to build a well-optimized machine for rolling out updates, automated everything, and expected things to go wrong, I would never have anticipated every Windows machine being unable to boot. That is an extraordinary situation. I doubt any IT department is staffed to handle it. You don't expect to need to deal with every machine you have; it's honestly kind of unrealistic.

I do not believe this will be the last event of its nature in our generation. It should very much already have been accounted for. The basics and fundamentals of information security apply despite what interpretations of insurance policies or certifications say. Let this be a lesson to not introduce single points of failure into critical systems without having prepared for their unavailability or misbehaviour (yes, that includes your ISPs, cloud, and SaaS providers).


> Let this be a lesson to not introduce single points of failure into critical systems without having prepared for their unavailability or misbehaviour.

I think that is where this is going to get more complicated: this event has broadened what we traditionally think of as a single point of failure.

I would wager that most people would not have considered something like CrowdStrike to be a single point of failure. It is not what you would generally think of.

From the bits and pieces about Delta that came out, it sounds like the biggest issue they had was that so many of these systems were not in one place. Recovery required physical access to systems in many different locations (I would love to be corrected on this). Under normal circumstances that would cover you in the case of an outage in a particular location, if you had built your system to handle entire areas going down by adding in redundancy.

But this was very much a unique situation; we really shouldn't pretend otherwise. It wasn't an external service going down, AWS going down, loss of internet connection, or any of the other things we would normally account for in disaster recovery. Hell, I would argue that in this particular situation you could have the best disaster recovery and it wouldn't have done anything, since CrowdStrike was probably baked into your images. So have fun bringing up an instance that was going to instantly brick. Yeah, eventually that would have become a non-issue, but I cannot imagine a scenario where I would have thought that a baked image, assuming I still had access to it and could verify it was the image I made, would itself be the problem.

Before a couple of weeks ago, we would have built a system redundant enough to handle parts going down, different geographical locations, etc., so that maybe things slow down but still work. All of the standard things that we talk about.

It isn't absurd to think that your entire system going down, when you don't believe you have a single point of failure, is nearly impossible, or that if it happens, something seriously bad is going on outside your company.


> I would wager that most people would not have considered something like CrowdStrike to be a single point of failure. It is not what you would generally think of.

Sure. Most people are not decision-makers for critical IT infrastructure. Should we similarly throw our hands in the air if a bridge or building collapses because the people responsible didn't pay attention to structural safety just because it wasn't obvious to a layperson?

> Hell, I would argue that in this particular situation you could have the best disaster recovery and it wouldn't have done anything, since CrowdStrike was probably baked into your images.

It's not really "the best disaster recovery" then, is it?

> Yeah, eventually that would have become a non-issue, but I cannot imagine a scenario where I would have thought that a baked image, assuming I still had access to it and could verify it was the image I made, would itself be the problem.

There are so many other ways this can backfire that I'm not going to try to enumerate them. I really hope no one put you in charge of operating anything critical...


It sounds a lot like CrowdStrike is saying "Delta should have known better than to rely on our software for critical functions."

Which may be a fair statement, but I think it's also fair to litigate whether or not this post-incident statement is consistent with how CrowdStrike sold their software to Delta.


Did CS claim that their software is infallible? Sales isn't shy about taking liberties, but this would surprise me.


There's quite some distance between "infallible" and the kind of failure mode CS's error induced. Be that as it may, I don't think the question is about infallibility. The question is: was the software fit for the purpose Delta used it for? And the follow-on question is whether CrowdStrike sold it for the purpose Delta used it for.


Fair, but certainly that's the responsibility of the purchaser, no? CrowdStrike couldn't feasibly know how all their customers utilize their software. That's the end user's responsibility.

Software fails, machines fail, we all know this. The technical leadership at Delta should know this. Do we think software vendors should be responsible for ensuring their customers safely deploy their software? I can't imagine that playing out well.


If this were something that were being purchased off-the-shelf, it'd clearly all be the responsibility of the purchaser.

When it's priced differently depending on how the customer plans to use it, and it's sold only through a consultative process, I'm not sure it's so clear. In order to determine their pricing, the sales team demands to know how the customer is using it... so arguably they absolutely do know.



I struggle to understand MSFT's liability here. Can anyone explain to me how Microsoft would be liable for Delta's outage?


People are blaming Microsoft for signing a driver with bugs. Also something about not having an eBPF API.

This just sounds like blame-shifting to me. Nothing Microsoft did changed anything about the CS situation. Apparently it's Microsoft's fault that they don't have a great, secure, bugless API that everyone seems to want all of a sudden.

On the day the crashes started, Microsoft had a cloud services outage, and people assumed the two outages were related (they sort of were, in that some of the consultancies Microsoft hired also ran CrowdStrike so they couldn't get support for the cloud outages as easily). That seemed to have stuck around.


Not providing a way to inspect that data except from within kernel drivers (or whatever Windows calls them). The huge HN thread about what happened weeks ago had some comments about Linux using eBPF to get the same kind of information CrowdStrike needs, and Macs having another technology to do the same thing. In both cases the kernel won't crash and take down the machine. Of course, it's still possible to hog a machine from user space and make it unusable.
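
For what it's worth, a minimal sketch of that Linux model using the bcc toolkit (assuming a Linux host with bcc and kernel headers installed): the monitoring logic is submitted from user space, and the kernel's verifier checks it at load time, rejecting unsafe programs instead of letting them crash the box:

    from bcc import BPF

    program = r"""
    int on_execve(void *ctx) {
        bpf_trace_printk("process exec observed\n");
        return 0;
    }
    """

    b = BPF(text=program)  # the in-kernel verifier runs here; an unsafe
                           # program raises a load error instead of crashing
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="on_execve")
    b.trace_print()        # stream events until Ctrl-C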


And in turn, Microsoft blames the EU for forcing them to allow external vendors kernel-level access: https://www.euronews.com/next/2024/07/22/microsoft-says-eu-t.... Lots of finger-pointing going around here.


People also pointed out that bugs in Red Hat's and Debian's kernels caused CrowdStrike's non-buggy eBPF drivers to hang the system a few months prior (as they would have for any code calling similar eBPF methods). Having the API is not enough, though in this case CrowdStrike couldn't really help it, as none of their supported platforms were affected.


A lot of people are blaming MSFT for approving this kernel driver and for allowing kernel drivers like this to crash the system.


Maybe, as an operating system that Delta purchased, its kernel should not crash when a third-party software package receives a faulty update?


I really dislike this line of thinking because it assumes that Microsoft is responsible for anything you run on your Windows machine.

I should be able to do whatever I want with a computer I buy, but that doesn't mean Microsoft should hold any liability for it. What am I missing here?


The difference is you are an individual and Delta is a public company. For enterprise purchases, you would have SLAs and contract terms to protect your company when something bad happens.


Fair, if this violates an SLA then Microsoft should pay up.

If Microsoft's SLAs are worded so that you're allowed to shove bits of third party code into the kernel without violating the SLA, though, they should fix that.


They can't do anything about third-party software that runs in kernel space.


I think CRWD is trying hard to close the litigation door here. If the litigation goes to court, then even if they manage to reach an agreement for a fairly small amount, say $50M, it's going to open the door for much more later.

I think their lawyers are trying hard to convince Delta not to sue, and they will pay through back channels. As long as it's kept out of the public news, it's a win for them.


> a “misleading narrative” that the cybersecurity company was responsible for the airline’s tech decisions and response to the outage

Am I reading this right that this amounts to "you decided yourself to install our software on your devices, you should have known better?"


They seem to be teasing a narrative that has yet to be released:

> “Should Delta pursue this path, Delta will have to explain to the public, its shareholders, and ultimately a jury why CrowdStrike took responsibility for its actions—swiftly, transparently, and constructively—while Delta did not,” wrote Michael Carlinsky, an attorney at law firm Quinn Emanuel Urquhart & Sullivan.

or maybe it's just that Delta didn't take their offer of help

> The cybersecurity company reiterated its apology to Delta for the initial disruption and said it had offered on-site assistance to Delta but was told it wasn’t needed. It said Bastian didn’t respond to outreach from CrowdStrike’s CEO.


...because all CrowdStrike had to do was say "oops, we f*%&/$d up the last update, here's a fixed one", while Delta's IT had to locate all BSODing devices and somehow recover them?


This is just their response to a legal challenge. If we're asking rhetorical questions, then "why wouldn't CrowdStrike's lawyers try to deflect responsibility and not pay millions of dollars?" Otherwise it's reading a bit too much into it, and the courts will sort it out.

> CrowdStrike said Sunday that its liability is contractually capped at an amount in the “single-digit millions.”

That's probably a key question too^


Isn't there a legal equivalent to saying the wrong thing to cops? E.g., if you lie or mislead during questioning, that adds weight to your being found guilty at trial.

So lawyers trying to pawn this off, using stupid legalese to shift blame, making time-wasting accusations, etc. I'm not a lawyer or versed in all the legal stuff, but something smells like a double standard here.


Delta, and others, clearly failed to account for the full risk of installing CrowdStrike across their fleet. If they did account for it, they gave it little enough weight to not have a path to recovery ready to go in a disaster recovery scenario.

This is certainly a learning experience for MSFT and CS, but I think it's foolish to say they are at fault. Delta has a choice in what software they employ.


I can see CS's argument here. Most other airlines were back up within a day of the update going out, but Delta slogged on and its outage continued for several days. If Delta and all other airlines had experienced a multi-day outage, then yeah, I'd see what Delta is saying. But the fact that all other carriers were up and running within the day while Delta took upwards of a week has me thinking this is more on Delta's internal IT policies/controls/etc. than on CS.


Coming from a fault-tolerance background, this seems to be a prime example where OS diversity would have helped. But clearly, staging the rollout of updates (even definition files) should be standard practice when you have more than 10k customers.
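
Even for definition files, vendor-side staging can be as simple as deterministic hashing: ship to a small fraction of hosts, watch crash telemetry, then widen. A sketch of the usual pattern, with hypothetical IDs:

    import hashlib

    ROLLOUT_STAGES = [0.01, 0.10, 0.50, 1.00]  # widen only after telemetry looks clean

    def in_rollout(host_id: str, update_id: str, fraction: float) -> bool:
        """Deterministically bucket a host into [0, 1) and admit it below `fraction`."""
        digest = hashlib.sha256(f"{host_id}:{update_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        return bucket < fraction

    # Stage 1: roughly 1% of hosts get the new definitions; a crash spike here
    # halts the rollout instead of taking down every customer at once.
    canaries = [h for h in (f"host-{i}" for i in range(10_000))
                if in_rollout(h, "channel-update-291", ROLLOUT_STAGES[0])]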


CrowdStrike was responsible for the first day of the outage and should deal with any damage that it caused.

Delta was responsible for the rest of its outage and should deal with any damage that it caused.



It's not your fault it happened, but it is your responsibility to deal with it.

edit: Delta should have had better system outage processes


Looks like the common race to the bottom is happening: the lawyers have taken control.


The lawyers have always been in control, and more so today. It's part of the idea of the United States of America.


More specifically, a huge amount of legal negotiation goes into a B2B SaaS deal between parties of this size up front, including liability caps, SLAs, the MSA, etc. It's even one of the bigger obstacles in the sales cycle (getting legal alignment).


I'm not familiar with how CrowdStrike updates typically roll out; are they not phased?


They are for software updates, but this was just an update of the definitions file, which couldn't be parsed properly by the kernel module. These are rolled out immediately to provide the best protection against the latest threats. As I understand it, they tested the parser and the definitions file separately, but not this particular combination, which IMHO was the core of the failure here.
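
If that's right, the missing test was simply running the real parser over the real candidate file before shipping it. A toy sketch with a made-up format (nothing here is CrowdStrike's actual channel-file layout):

    import struct

    def parse_definitions(blob: bytes) -> list[int]:
        """Toy parser: a 4-byte record count followed by that many 4-byte records."""
        (count,) = struct.unpack_from("<I", blob, 0)
        records, offset = [], 4
        for _ in range(count):
            if offset + 4 > len(blob):
                raise ValueError("definitions file truncated: header count exceeds data")
            (record,) = struct.unpack_from("<I", blob, offset)
            records.append(record)
            offset += 4
        return records

    def release_gate(candidate: bytes) -> bool:
        """Run the actual parser over the actual candidate file before any rollout."""
        try:
            parse_definitions(candidate)
            return True
        except (ValueError, struct.error):
            return False

    # A file whose header claims more records than it contains -- roughly the
    # class of mismatch reported here -- is caught before it ever ships.
    bad = struct.pack("<I", 10) + struct.pack("<I", 1)
    assert release_gate(bad) is False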


I sense legal escalation coming.


I feel the opposite. I think Delta is being so loud because they think they can win in the court of public opinion but can't win in a legal court.


Tbf, this is like a goldmine for the lawyers right now, on both sides! It's not clear, it's ambiguous, it needs to be litigated and decided, contracts are being challenged, damages occurred all round, etc. Hundreds of millions in fees are going to get floated, and years of litigation.



