kromem@lemmy.world 1 points 2 months ago* (last edited 2 months ago)

The attention mechanism working this way was at odds with the common wisdom among frontier researchers.

Yes, the final step of the network is producing the next token.
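
For concreteness, here's a minimal sketch of what that final step amounts to: the last hidden state is projected to vocabulary logits and a single token is sampled. The shapes and names (`hidden_state`, `unembedding`) are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def next_token(hidden_state, unembedding, temperature=1.0, rng=None):
    """Final step of a decoder LM: project the last hidden state to
    vocabulary logits, softmax them, and sample one next token."""
    rng = rng or np.random.default_rng()
    logits = hidden_state @ unembedding       # shape: (vocab_size,)
    z = logits / temperature
    z -= z.max()                              # numerical stability
    probs = np.exp(z)
    probs /= probs.sum()                      # softmax
    return rng.choice(len(probs), p=probs)    # exactly one token per step

# Toy sizes, for illustration only.
rng = np.random.default_rng(0)
d_model, vocab = 8, 50
hidden = rng.normal(size=d_model)
W_unembed = rng.normal(size=(d_model, vocab))
print(next_token(hidden, W_unembed, rng=rng))
```

One token comes out per step no matter what; the interesting question is what the layers before that step are doing.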

But the fact that intermediate steps have now been shown to be planning toward specific future outputs is a much bigger deal than you seem to be appreciating.

If I ask you to play chess and you plan only one move ahead vs. planning n moves ahead, you are going to be playing very different games, even if in both cases you are only making one immediate next move at a time.
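
To make the chess analogy concrete, here's a toy sketch contrasting a 1-ply greedy player with a depth-n lookahead (negamax) player. The game interface (`legal_moves`, `apply`, `evaluate`) is hypothetical, and `evaluate` is assumed to score a position from the side to move's perspective.

```python
def greedy_move(state, legal_moves, apply, evaluate):
    """Pick whichever single move looks best right now (1 ply)."""
    return max(legal_moves(state), key=lambda m: -evaluate(apply(state, m)))

def negamax(state, depth, legal_moves, apply, evaluate):
    """Score a position by searching `depth` plies ahead."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)
    return max(-negamax(apply(state, m), depth - 1,
                        legal_moves, apply, evaluate)
               for m in moves)

def planning_move(state, depth, legal_moves, apply, evaluate):
    """Still returns exactly one move, but chooses it by looking
    `depth` moves ahead instead of one."""
    return max(legal_moves(state),
               key=lambda m: -negamax(apply(state, m), depth - 1,
                                      legal_moves, apply, evaluate))
```

Both players emit a single move per call, just as the model emits a single token per step; the difference is entirely in how far ahead the choice was computed.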