This post was submitted to the Technology community on 19 Nov 2025.
If you want a technical breakdown that isn't "lol AI bad":
https://blog.cloudflare.com/18-november-2025-outage/
Basically, a permission change caused an automated query to return more data than was planned for. The query produced a configuration file with a large number of duplicate entries, and that file was pushed to production. The file's size exceeded the preallocated memory limit of a downstream system, which then hit an unhandled error state. That caused a thread panic, which surfaced as the 5xx errors.
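To make the failure chain above concrete, here is a minimal Python sketch under stated assumptions. It is not Cloudflare's code: the feature count, the `MAX_FEATURES` limit, the database names, and the helpers `query_rows` and `build_feature_file` are all hypothetical. It assumes the permission change made an unfiltered metadata query return every row once per visible database, so the generated file doubles in size unless the generator deduplicates.

```python
# Minimal sketch of the failure chain (hypothetical names and numbers,
# not Cloudflare's code). A Python list stands in for query results.

MAX_FEATURES = 200  # hypothetical preallocated limit in the downstream consumer
FEATURES = [f"feature_{i}" for i in range(120)]  # hypothetical feature set


def query_rows(visible_databases):
    """Simulate a metadata query that does not filter by database.

    Assumption: every visible database reports the same feature names,
    so granting access to one more database duplicates every row.
    """
    return [(db, name) for db in visible_databases for name in FEATURES]


def build_feature_file(rows, dedupe):
    """Turn query rows into the list of feature names written to the file."""
    names = [name for _db, name in rows]
    if dedupe:
        # dict.fromkeys keeps the first occurrence of each name, in order.
        names = list(dict.fromkeys(names))
    return names


before = build_feature_file(query_rows(["db_a"]), dedupe=False)
after = build_feature_file(query_rows(["db_a", "db_b"]), dedupe=False)
fixed = build_feature_file(query_rows(["db_a", "db_b"]), dedupe=True)

print(len(before), len(after), len(fixed))  # 120 240 120
print(len(after) > MAX_FEATURES)            # True: the file no longer fits
```

Deduplicating (or filtering the query so it only sees the intended database) keeps the output stable when permissions change; checking the output size against the consumer's limit before publishing would be a second line of defense.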
It seems CrowdStrike is no longer alone in the 'A bad config file nearly kills the Internet' club.
So the actual outage comes down to preallocating memory without the error handling needed to fail gracefully if that limit is, or will be, exceeded... Bad day for whoever shows up in the git blame for that function.
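The graceful-failure path this comment describes might look something like the minimal Python sketch below. `FeatureTable`, `ConfigTooLarge`, `apply_config`, and the capacity of 200 are hypothetical, and an uncaught Python exception stands in for the thread panic. The two ideas it shows are checking the limit before writing into the preallocated buffer and, on failure, rejecting the new file while continuing to serve with the last known-good one.

```python
class ConfigTooLarge(Exception):
    """Raised when a config file has more entries than the table can hold."""


class FeatureTable:
    """Hypothetical consumer with a fixed, preallocated number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # allocated once, up front
        self.count = 0

    def load(self, features):
        # Validate before touching the buffer, so a rejected file
        # leaves the previous contents fully intact.
        if len(features) > self.capacity:
            raise ConfigTooLarge(
                f"{len(features)} features > capacity {self.capacity}"
            )
        for i, name in enumerate(features):
            self.slots[i] = name
        # Clear slots left over from a larger previous config.
        for i in range(len(features), self.count):
            self.slots[i] = None
        self.count = len(features)


def apply_config(table, features):
    """Graceful path: reject a bad file and keep the last known-good one."""
    try:
        table.load(features)
        return True
    except ConfigTooLarge as err:
        print(f"rejected new config, keeping last known-good: {err}")
        return False


table = FeatureTable(capacity=200)
print(apply_config(table, [f"f{i}" for i in range(120)]))  # True
print(apply_config(table, [f"f{i}" for i in range(240)]))  # logs the rejection, then False
print(table.count)                                         # 120
```

Calling `table.load(...)` directly with the oversized list and not catching `ConfigTooLarge` is the equivalent of the unhandled error in the outage: the exception propagates and takes the caller down instead of leaving the old config in service.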
This is the wrong take. Git blame only shows who wrote the line. What about the people who reviewed the code?
If you have reasonable practices, git blame will point you to the commit, and from there to the original ticket, a link to the code review, and relevant information about the change.
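`git blame` itself only maps each line to a commit; getting from that commit to a ticket or review depends on team conventions for commit messages. As a hedged illustration, this Python sketch parses `git blame --porcelain` output and pulls a ticket ID out of each commit's summary line. The `PROJ-123`-style ticket pattern, the hand-written sample, and the function name `tickets_by_line` are hypothetical; only the porcelain layout (a header line per blamed line with the commit hash and line numbers, a `summary` line the first time a commit appears, and the source line prefixed with a tab) comes from Git.

```python
import re

# Header line: <40-hex sha> <original line> <final line> [<lines in group>]
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$")
# Hypothetical team convention: ticket IDs like PROJ-101 in the summary.
TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def tickets_by_line(porcelain):
    """Map each final line number to (short sha, ticket ID or None)."""
    summaries = {}  # sha -> summary; porcelain only prints it once per commit
    result = {}
    sha = None
    final_line = None
    for raw in porcelain.splitlines():
        header = HEADER.match(raw)
        if header:
            sha, final_line = header.group(1), int(header.group(3))
        elif raw.startswith("summary "):
            summaries[sha] = raw[len("summary "):]
        elif raw.startswith("\t"):
            # A tab-prefixed line is the blamed source line itself.
            ticket = TICKET.search(summaries.get(sha, ""))
            result[final_line] = (sha[:8], ticket.group(0) if ticket else None)
    return result


# Hand-written sample in the porcelain layout (hypothetical commits).
SHA_A = "a" * 40
SHA_B = "b" * 40
SAMPLE = (
    f"{SHA_A} 10 10 2\n"
    "author Alice\n"
    "summary PROJ-101: preallocate feature buffer\n"
    "filename features.py\n"
    "\tbuf = [None] * MAX_FEATURES\n"
    f"{SHA_A} 11 11\n"
    "\tcount = 0\n"
    f"{SHA_B} 7 12 1\n"
    "author Bob\n"
    "summary Tidy up imports\n"
    "filename features.py\n"
    "\tload(buf, features)\n"
)

for line, (short_sha, ticket) in sorted(tickets_by_line(SAMPLE).items()):
    print(line, short_sha, ticket)
# 10 aaaaaaaa PROJ-101
# 11 aaaaaaaa PROJ-101
# 12 bbbbbbbb None
```

Line 12 shows the failure mode the comment is warning about: a commit whose message carries no ticket or review reference leaves blame pointing at a person and nothing else.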
Plus the guys who are hired to ensure that systems don't fail even with inexperienced or malicious employees, the management who designs and enforces the whole system, etc. "One guy fucked up and needs to be fired" is just a toxic mentality that doesn't address the chain of conditions that led to the situation.
That should also come up in reviews. Not trying to imply one guy should get fired as a scapegoat, just speaking from experience about how much it sucks to know your code caused major issues.