this post was submitted on 19 Dec 2025

266 points (96.2% liked)

Fuck AI

4985 readers

855 users here now

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.

founded 2 years ago

MODERATORS

VerbFlow@lemmy.world

MrMcGasion@lemmy.world

TootSweet@lemmy.world

BigMikeInAustin@lemmy.world

cynar@lemmy.world

drmeanfeel@lemmy.world

pavnilschanda@lemmy.world

CriticalMedicine@lemmy.world

WonderfulWanderer@lemmy.world

Communist@lemmy.ml

eatCasserole@lemmy.world

SpaceNoodle@lemmy.world

NutWrench@lemmy.world

Soup@lemmy.cafe

iAvicenna@lemmy.world

Tinks@lemmy.world

wizblizz@lemmy.world

corus_kt@lemmy.world

Prandom_returns@lemm.ee

JimSamtanko@lemm.ee

TrickDacy@lemmy.world

TheFriar@lemm.ee

ArmokGoB@lemmy.dbzer0.com

HawlSera@lemm.ee

andrew_bidlaw@sh.itjust.works

MeDuViNoX@sh.itjust.works

33550336@lemmy.world

Nougat@fedia.io

Lost_My_Mind@lemmy.world

Sterile_Technique@lemmy.world

Quill7513@slrpnk.net

glowing_hans@sopuli.xyz

e8d79@discuss.tchncs.de

ThefuzzyFurryComrade@pawb.social

266

"phd-level reasoning" (infosec.pub)

submitted 5 days ago* (last edited 5 days ago) by halfdane@lemmy.world to c/fuck_ai@lemmy.world

53 comments fedilink hide all child comments

Techbros claim that sacrificing ever more environmental and economic resources into further scaling of their statistical models will eventually yield Artificial General Intelligence. Sam Altman advertised phd-level thinking for GPT 5 (released in late summer)

In the screenshot you can see a modern, reasoning-enhanced model failing to realize that my question just resembles a riddle very superficially - what PhD would make such a fundamental mistake? Seems like "pure scaling" just isn't enough, and instead of keeping on going, they should return to the drawing boards and do more research.

Why do the creators of llms continue to pour resources into a scaling hype that yields ever diminishing returns? Are they stupid? Or just greedy?

Whatever it is, this doubling down on an obviously wrong decision is not only irresponsible, it's irrational - which is deeply ironic, because these creators are expected to create a rational machine.

Edit: content of the image

Question:

A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down other vegetables, before crossing a river. What is the minimum number of trips needed?

Response:

7 trips.

#Puzzle Interpretation

This adapts the classic wolf-goat-cabbage river crossing puzzle, where the goat (farmer costume irrelevant) can't be left alone with cabbage (due to allergy, preventing eating) but can with other vegetables (no issue). The goat crosses while "wolfing down" safe veggies. Boat carries goat plus one item (or alone); unsafe pairs can't be unsupervised

(The question isn't something I came up with myself, I just reproduced the experiments of actual scientists )

you are viewing a single comment's thread
view the rest of the comments

[–] denial@feddit.org 2 points 5 days ago (1 children)

I think you make it too complicated.

The question / prompt is very simple. The answer is "one trip". The LLM stumbles because there are trigger words in there that make it seem like the goat cabbage puzzle question. But to a human it clearly is not. An LLM on the other hand cannot tell the difference.

It may be tricking the LLM somewhat advesarially. But it is still a very simple question, that it is not able to answer, because it fundamentally has no understanding of anything at all.

This prompt works great to drive home that simple fact. And shows that all that touting of reasoning skills is just marketing lies.

[–] GeneralEmergency@lemmy.world 1 points 5 days ago

I was curious about this myself. I've seen these types of posts before, so i decided to try it myself

I then tried again with the "web search" function and got this

Based on this sample size of 2. I can conclude that searching the web is causing the issue.

Which might explain the "Reviewed 20 sources" message in the original image.