Why didn't the CrowdStrike bug get caught during testing?
resolved Jul 24
Resolved
NO
They shipped the wrong bits.
Resolved
NO
A race condition between threads that the test environment didn't hit.
Resolved
NO
Deterministic (not ASLR related) random number generation.
Resolved
NO
The SW got picked up before the trap was full.
Resolved
NO
A failure in the update mechanism itself
Resolved
NO
We won't know for more than a year (question expires)
Resolved
NO
The test environment has ASLR disabled (root cause)
Resolved
NO
They didn't do any testing of the patch.
Resolved
NO
Flaky tests got retried until they passed
Resolved
NO
An intern did it


Well

https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/

> On July 19, 2024, two additional IPC Template Instances were deployed. Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data.

So it sounds like the testing came down to a buggy static validator, and the channel files were not directly tested on a Windows machine.

So I don't think any of the proposed theories are correct.

bought Ṁ500 NO

can't be bothered to add an answer but I'm placing an informal bet on "parser differential"
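
For anyone unfamiliar with the term: a parser differential is when two pieces of code read the same bytes and disagree about what they mean, so a file can pass one and break the other. A minimal sketch in Python, assuming a made-up comma-separated record format. The names `validator_accepts`, `consumer_read`, and `EXPECTED_FIELDS` are hypothetical, and nothing here reflects CrowdStrike's actual channel file format or validator.

```python
# Hypothetical record format: comma-separated fields.
# This is an illustration only, not CrowdStrike's real format or code.
EXPECTED_FIELDS = 3  # assumed field count for this sketch


def validator_accepts(record: str) -> bool:
    """Lenient validator: only checks that every field present is non-empty."""
    fields = record.split(",")
    return all(field.strip() for field in fields)


def consumer_read(record: str) -> str:
    """Strict consumer: reads the last expected field without checking the count."""
    fields = record.split(",")
    return fields[EXPECTED_FIELDS - 1]  # raises IndexError on a short record


good = "a,b,c"
short = "a,b"  # passes the validator, but the consumer cannot read it
```

Here `validator_accepts(short)` is true while `consumer_read(short)` raises `IndexError`, which is the general shape of "passed validation, crashed in production".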

Update:
https://www.crowdstrike.com/blog/technical-details-on-todays-outage/

Does not answer the question "Why did it not get caught during testing?"

Yeah, my hunch is that this is going to be a combination of multiple bugs (bad failures often are). They've called out a logic bug that caused the crash itself, but it does not at all explain how a malformed channel file managed to get deployed.

bought Ṁ50 NO

Not a root cause. If interns can evade testing and push to world, that's a testing & deployment failure, not an intern failure.

Oh, most definitely. I would also hope that CrowdStrike wouldn't throw an intern under the bus (especially since they should understand that blaming an intern is counterproductive to convincing people you know what you are doing). However, this is a multiple-correct-answer market, so if they do throw someone under the bus, I guess this would need to be selected as a partially correct answer. Though if their process is "the intern decides to ship the product", it will be surprising. However, if it is simply "an intern wrote the code", that isn't sufficient to get this selected; CrowdStrike would have to imply that the only reason this happened was because of the intern. This would, however, almost certainly also imply that they didn't test the patch or that the intern had the ability to ignore test failures.

@retr0id To distinguish this from "they shipped the wrong bits", I mean that the failure is on the client side somewhere (i.e. the right bits were shipped over the network, but they didn't make it onto disk intact).

I'm beginning to suspect that resolving this question is going to involve a lot of headaches.

@ChrisGreene This means that the files that were shipped to customers were not the ones that went through the testing process, i.e. the deployment process either copied the wrong files, or they became corrupted as part of the deployment process.

It means, fundamentally, that the issue is not a code bug, but rather an issue with the deployment process itself.
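
The "wrong bits" scenario is the kind of thing a digest comparison between the tested artifact and the shipped artifact would catch. A rough sketch, assuming SHA-256 and hypothetical helper names (`digest`, `same_bits`); this is not a description of any real deployment pipeline.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def same_bits(tested: bytes, shipped: bytes) -> bool:
    """True only if the shipped artifact is byte-identical to the tested one."""
    return digest(tested) == digest(shipped)
```

A check like `same_bits(tested_file, shipped_file)` fails if the deployment step copied a different file or corrupted it along the way, but it says nothing about whether the tested file itself was good.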

bought Ṁ50 YES

I tried to create this with "Anyone can add answers later", but I'm not seeing the "Other" category. Gah, Manifold.

Okay! This is a multi-resolve possible thing (multiple root causes), so we're good!
