this post was submitted on 10 Sep 2025
912 points (99.2% liked)

Fuck AI

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

[–] ech@lemmy.ca 40 points 3 weeks ago* (last edited 3 weeks ago) (3 children)

Took a look because, as frustrating as it'd be, it'd still be a step in the right direction. But no, they're still adamant that it's just a "quirk".

Conclusions

We hope that the statistical lens in our paper clarifies the nature of hallucinations and pushes back on common misconceptions:

Claim: Hallucinations will be eliminated by improving accuracy because a 100% accurate model never hallucinates.
Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.

Claim: Hallucinations are inevitable.
Finding: They are not, because language models can abstain when uncertain.

Claim: Avoiding hallucinations requires a degree of intelligence which is exclusively achievable with larger models.
Finding: It can be easier for a small model to know its limits. For example, when asked to answer a Māori question, a small model which knows no Māori can simply say “I don’t know” whereas a model that knows some Māori has to determine its confidence. As discussed in the paper, being “calibrated” requires much less computation than being accurate.

Claim: Hallucinations are a mysterious glitch in modern language models.
Finding: We understand the statistical mechanisms through which hallucinations arise and are rewarded in evaluations.

Claim: To measure hallucinations, we just need a good hallucination eval.
Finding: Hallucination evals have been published. However, a good hallucination eval has little effect against hundreds of traditional accuracy-based evals that penalize humility and reward guessing. Instead, all of the primary eval metrics need to be reworked to reward expressions of uncertainty.

Infuriating.
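
To put numbers on that last quoted point about evals, here's a toy sketch (not from the paper; the benchmark size, guess accuracy, and scoring rules are all made up for illustration): under a plain accuracy metric, a model that guesses when unsure always outscores one that abstains, while a metric that gives zero for "I don't know" and a penalty for a wrong answer flips the incentive.

```python
# Toy numbers, invented for illustration -- not from the paper.
QUESTIONS = 100        # hypothetical benchmark size
KNOWN = 60             # questions the model actually knows the answer to
GUESS_ACCURACY = 0.25  # chance a blind guess on an unknown question is right

def accuracy_only(abstains: bool) -> float:
    """Traditional eval: 1 point per correct answer, 0 for anything else."""
    unknown = QUESTIONS - KNOWN
    lucky_guesses = 0 if abstains else unknown * GUESS_ACCURACY
    return (KNOWN + lucky_guesses) / QUESTIONS

def abstention_aware(abstains: bool) -> float:
    """Reworked eval: +1 correct, 0 for 'I don't know', -1 for a wrong answer."""
    unknown = QUESTIONS - KNOWN
    if abstains:
        right, wrong = KNOWN, 0
    else:
        right = KNOWN + unknown * GUESS_ACCURACY
        wrong = unknown * (1 - GUESS_ACCURACY)
    return (right - wrong) / QUESTIONS

print("accuracy-only:    guesser", accuracy_only(False), " abstainer", accuracy_only(True))
print("abstention-aware: guesser", abstention_aware(False), " abstainer", abstention_aware(True))
# accuracy-only:    guesser 0.7  abstainer 0.6   -> guessing wins
# abstention-aware: guesser 0.4  abstainer 0.6   -> saying "I don't know" wins
```

The exact numbers don't matter: as long as abstaining scores zero under the accuracy-only metric, guessing dominates, which is the quoted point about evals that penalize humility and reward guessing.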

[–] wewbull@feddit.uk 7 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.

Translation: PEBKAC (problem exists between keyboard and chair). You asked the wrong question.

[–] jherazob@fedia.io 6 points 3 weeks ago (1 children)

Basically "You must be prompting it wrong!"

[–] fading_person@lemmy.zip 1 points 3 weeks ago

We don't have mathematical proof that this technology will produce adequate results. You must have faith in the technology.

[–] Voroxpete@sh.itjust.works 2 points 3 weeks ago

Got a link or a title I can google to find the full paper? I'd be really interested in reading it.

[–] lets_get_off_lemmy@reddthat.com 0 points 3 weeks ago* (last edited 3 weeks ago)

This further points to the solution being smaller models that know less and are trained for narrower tasks, instead of gargantuan models that require an insane amount of resources to answer easy questions. Route each query to a smaller, more specialized model based on what's being asked. This was the motivation behind MoE models, but I think there are other architectures and paradigms to explore.
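
A rough sketch of the query-routing idea (nothing here is from the comment or any real system; the keyword router, specialist names, and "I don't know" fallback are invented for illustration, and note that actual MoE models use a learned gating network over experts inside one model rather than dispatching between separate models):

```python
from typing import Callable

# Hypothetical specialist "models" -- stand-ins for small, task-specific systems.
def math_model(query: str) -> str:
    return f"[math specialist] answering: {query}"

def code_model(query: str) -> str:
    return f"[code specialist] answering: {query}"

ROUTES: dict[str, Callable[[str], str]] = {
    "integral": math_model,
    "equation": math_model,
    "function": code_model,
    "bug": code_model,
}

def route(query: str) -> str:
    """Naive query-level router: pick a specialist by keyword, abstain otherwise.
    A learned gating network (as in MoE) would replace this keyword lookup."""
    lowered = query.lower()
    for keyword, model in ROUTES.items():
        if keyword in lowered:
            return model(query)
    # No specialist covers this: a model that knows its limits just says so.
    return "I don't know."

print(route("Solve the integral of x**2"))
print(route("Why does this function throw a TypeError?"))
print(route("Translate this sentence into Māori"))  # -> "I don't know."
```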