True. They aren't building city-sized data centers and offering people nine-figure salaries for no reason. They are trying to front-load the cost of paying for labour for the rest of time.
BigMuffN69
Remember last week when that study on AI's impact on development speed dropped?
A lot of peeps' takeaway from this little graphic was "see, the impact of AI on SW development is a net negative!" I think the real takeaway is that METR, the AI safety group running the study, is a motley collection of deeply unserious clowns pretending to do science, and their experimental setup is garbage.
https://substack.com/home/post/p-168077291
"First, I don’t like calling this study an “RCT.” There is no control group! There are 16 people and they receive both treatments. We’re supposed to believe that the “treated units” here are the coding assignments. We’ll see in a second that this characterization isn’t so simple."
(I am once again shilling Ben Recht's substack.)
Wake up babe, new alignment technique just dropped: Reinforcement Learning Elon Feedback
Yeah, METR was the group that made the infamous AI IS DOUBLING EVERY 4-7 MONTHS GRAPH, where the measurement was the human completion time of SWE tasks at which the model succeeds 50% of the time. Extremely arbitrary success threshold, very suspicious imo. They are fanatics trying to pinpoint when the robo-god recursive self-improvement loop starts.
One more comment, idk if y'all remember that forecast that came out in April (? iirc ?) where the thesis was "the time an AI can operate autonomously is doubling every 4-7 months." The AI-2027 authors were like "this is the smoking gun, it shows why our model is correct!!"
They used some really sketchy metric: they asked SWEs to do a task, measured the time it took, then had the models attempt the same tasks, and defined the model's "time horizon" as the human task duration at which it succeeded on 50% of the tasks (wtf?), and then they drew an exponential curve through it. My gut feeling is that the reason they chose 50% is because other values totally ruin the exponential curve, but I digress.
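For anyone who wants to see how little is behind the headline number, here's a minimal sketch of what that "50% time horizon" boils down to. The toy data and the logistic success model are my assumptions, not METR's actual pipeline:

```python
# Rough sketch of the "50% time horizon" metric as I understand it.
# NOT METR's code: toy data, and the logistic fit is my own assumption.
import numpy as np
from scipy.optimize import curve_fit

# Toy data: human completion time (minutes) per task, and whether the model solved it.
human_minutes = np.array([2, 5, 8, 15, 30, 60, 120, 240, 480])
model_solved  = np.array([1, 1, 1, 1,  0,  1,  0,   0,   0])

def p_success(log_t, a, b):
    # Logistic curve: success probability falls off with log task length.
    return 1.0 / (1.0 + np.exp(-(a + b * log_t)))

(a, b), _ = curve_fit(p_success, np.log(human_minutes), model_solved, p0=[1.0, -1.0])

# "Time horizon" = human task length where the fitted success rate crosses 50%.
# Note the curve crosses ANY threshold somewhere; 50% is one arbitrary pick among
# many, and the headline doubling trend comes from fitting an exponential through
# these horizons across model release dates.
horizon_minutes = np.exp(-a / b)
print(f"50% time horizon ~ {horizon_minutes:.0f} human-minutes")
```

Pick 80% instead of 50% and you get a different horizon from the same fit, which is exactly why the threshold choice matters for the shape of the trend line.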
Anyways, they just did the metrics for Claude 4, the first FrOnTiEr model that came out since they made their chart and... drum roll... no improvement... in fact it performed worse than o3, which was first announced last December (note: instead of using the date o3 was announced in 2024, they used the date it was released months later, so on their chart it makes 'line go up'. A valid choice I guess, but a choice nonetheless.)
This world is a circus tent, and there still ain't enough room for all these fucking clowns.
https://www.wired.com/story/openworm-worm-simulator-biology-code/
Really interesting piece about how difficult it actually is to simulate "simple" biological structures in silicon.
It's kind of telling that it's only been a couple months since that fan fic was published and there is already so much defensive posturing from the LW/EA community. I swear the people who were sharing it when it dropped, tacitly endorsing it as the vision of the future from certified prophet Daniel K, are now like, "oh, it's directionally correct, but too aggressive." Note that we are over halfway through 2025 and the earliest prediction, agents entering the workforce, is already fucked. So if you are a 'super forecaster' (guru) you can do some sleight of hand now and come out against the model, knowing the first goal post was already missed and the tower of conditional probabilities resting on it is already breaking.
Funniest part is that even one of the authors seems to be panicking too, as even they can tell they are losing the crowd, and is falling back on "it's not that it's likely, it's just the most probable single outcome." A truly meaningless statement if your goal is to guide policy, since an outcome with arbitrarily low probability can still be the "most probable" one given enough different outcomes.
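To make that concrete with made-up numbers: give your pet scenario 2% and split the rest across 98 alternatives at 1% each, and congrats, it's the "most probable" future while losing 98% of the time:

```python
# Toy illustration of why "the most probable outcome" can still be very unlikely:
# with enough distinct outcomes, the mode can carry almost no probability mass.
# All numbers here are invented for illustration.
outcomes = {"AI-2027 scenario": 0.02}
outcomes.update({f"other future {i}": 0.01 for i in range(98)})

mode = max(outcomes, key=outcomes.get)
print(mode, outcomes[mode])    # AI-2027 scenario 0.02
print(sum(outcomes.values()))  # 1.0 -- it's a valid distribution
# The "most probable" future still fails to happen 98% of the time.
# Not exactly a load-bearing foundation for policy.
```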
Also, there's literally mass brain uploading in AI-2027. This strikes me as physically impossible in any meaningful sense: the compute needed to model all the molecular interactions in a brain would take a really, really, really big computer. But I understand: if your religious beliefs and cultural convictions necessitate big snake 🐍 to upload you, I will refrain from passing judgement.
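Back of the envelope, since "really big" undersells it. Every number below is a hand-wavy order-of-magnitude guess of mine, not a sourced estimate:

```python
# Hand-wavy back-of-envelope on molecular-level brain simulation.
# All figures are rough order-of-magnitude assumptions, not sourced estimates.
molecules_in_brain = 1e25      # ~1.4 kg, mostly water -> Avogadro-scale count
steps_per_sim_second = 1e15    # molecular dynamics wants ~femtosecond timesteps
flops_per_molecule_step = 1e2  # force evaluation + integration (generous)

flops_per_sim_second = molecules_in_brain * steps_per_sim_second * flops_per_molecule_step
exaflop_machine = 1e18         # ballpark of today's largest supercomputers

print(f"{flops_per_sim_second:.0e} FLOPs per simulated brain-second")
print(f"~{flops_per_sim_second / exaflop_machine:.0e} seconds on an exaflop machine")
# ~1e42 FLOPs per simulated second -> ~1e24 wall-clock seconds (~3e16 years)
# per brain-second on a frontier supercomputer. "Really, really big" checks out.
```

Even if coarse-graining buys you ten orders of magnitude, you're still absurdly far from "mass" uploading.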
Bummer, I wasn't on the invite list to the hottest SF wedding of 2025.
Update your mental models of Claude lads.
Because if the wife stuff isn't true, what else could Claude be lying about? The vending machine business?? The blackmail??? Being bad at Pokemon????
Bruh, there's a part where he laments that he had a hard time getting into meditation because he was paranoid that it was a form of wireheading. Beyond parody. The whole profile is 🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩
To be clear, I strongly disagree with the claim. I haven't seen any evidence that "reasoning" models actually address any of the core blocking issues: reliably working within a given set of constraints, being dependable enough to execute symbolic algorithms, or any serious solution to confabulations. I'm just not going to waste my time with curve pointers who want to die on the hill of NeW sCaLiNG pArAdIgM. They are just too deep in the kool-aid at this point.
My hot take has always been that current Boolean SAT/MIP solvers are probably pretty close to theoretical optimality for the problems that are interesting to humans, and that AI, no matter how "intelligent," will struggle to meaningfully improve them. Ofc I doubt that Mr. Hollywood (or Yud, for that matter) has actually spent enough time with classical optimization lore to understand this. Computer go FOOM ofc.