this post was submitted on 07 Jul 2025

915 points (98.1% liked)

AI agents wrong ~70% of time: Carnegie Mellon study (www.theregister.com)

submitted 1 week ago by eli001@lemmy.world to c/technology@lemmy.world

[–] floofloof@lemmy.ca 18 points 1 week ago* (last edited 1 week ago)

"Gartner estimates only about 130 of the thousands of agentic AI vendors are real."

This whole industry is so full of hype and scams, the bubble surely has to burst at some point soon.

[–] ApeNo1@lemmy.world 16 points 1 week ago (1 children)

They've done studies, you know. 30% of the time, it works every time.

[–] MangoCats@feddit.it 7 points 1 week ago

I ask AI to write simple little programs. One time in three they actually compile without errors. To the credit of the AI, I can feed it the error and about half the time it will fix it. Then, when it compiles and runs without crashing, about one time in three it will actually do what I wanted. To the credit of AI, I can give it revised instructions and about half the time it can fix the program to work as intended.

So, yeah, a lot like interns.
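
As a rough back-of-the-envelope check on the odds in the comment above, here is a minimal sketch of the arithmetic. It assumes exactly one round of feedback per stage and that the two stages (compiling, then doing what was wanted) are independent. Neither assumption comes from the comment, and the helper name `success_with_one_retry` is made up for illustration.

```python
from fractions import Fraction


def success_with_one_retry(first_try: Fraction, fix_rate: Fraction) -> Fraction:
    """Chance a stage passes either at once or after one round of feedback.

    Assumption (not from the thread): only a single retry is attempted.
    """
    return first_try + (1 - first_try) * fix_rate


# Figures quoted in the comment: one in three on the first try,
# about half of the failures fixed after feedback.
compiles = success_with_one_retry(Fraction(1, 3), Fraction(1, 2))
correct = success_with_one_retry(Fraction(1, 3), Fraction(1, 2))

# Assumption: the stages are independent, so the probabilities multiply.
overall = compiles * correct

print(compiles, correct, overall)  # 2/3 2/3 4/9
```

Under those assumptions, a program ends up compiling and behaving as intended a little under half the time (4/9), even with one correction allowed at each stage.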

[–] lepinkainen@lemmy.world 10 points 1 week ago (7 children)

Wrong 70% of the time doing what?

I’ve used LLMs as a Stack Overflow / MSDN replacement for over a year and if they fucked up 7/10 questions I’d stop.

Same with code: any free model can easily generate simple scripts and utilities with maybe a 10% error rate, definitely not 70%.

[–] FenderStratocaster@lemmy.world 9 points 1 week ago

I tried to order food at a Taco Bell drive-through the other day and they had an AI thing taking your order. I was so frustrated that I couldn't order something that was on the menu that I just drove to the window instead. The guy who worked there was more interested in lecturing me on how I need to order. I just said forget it and drove off.

If you want to use AI, I'm not going to use your services or products unless I'm forced to. Looking at you, Xfinity.

[–] fossilesque@mander.xyz 8 points 1 week ago (1 children)

For some reason, agents work better when you tell them in the prompt that the accuracy of the work is a matter of life or death. I've made a little script that gives me BibTeX for a folder of PDFs, and this is how I got it to be usable.

[–] HertzDentalBar@lemmy.blahaj.zone 3 points 1 week ago (1 children)

Did you make it? Or did you prompt it? They ain't quite the same.

[–] kinsnik@lemmy.world 7 points 1 week ago

I haven't used AI agents yet, but my job is kinda pushing for them. I have used the Google one that creates audio podcasts, though, just to play around, since my coworkers were using it to "learn" new things. I fed it some of my own writing and created the podcast. It was fun; it was an audio overview of what I wrote. About 80% was cool analysis, but 20% was straight-out-of-nowhere bullshit (which I know because I wrote the original texts that the audio was talking about). I can't believe that people are using this for subjects they have no knowledge of. It is a fun toy for a few minutes (which is not worth the cost to the environment anyway).

[–] brown567@sh.itjust.works 5 points 1 week ago

70% seems pretty optimistic based on my experience...

[–] Affidavit@lemmy.world 5 points 1 week ago (1 children)

"...for multi-step tasks"

[–] loonsun@sh.itjust.works 5 points 1 week ago (1 children)

It's about agents, which implies multi-step, since those are meant to execute a series of tasks, as opposed to studies looking at base LLM performance.
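
To illustrate why multi-step matters, here is a small sketch of how per-step reliability compounds over a chain of tasks. It assumes every step succeeds independently with the same probability. The 90% per-step figure is a hypothetical chosen for illustration, not a number from the study or the article.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumption: steps are independent and equally reliable.
    """
    return per_step ** steps


# Hypothetical 90% per-step success rate (illustrative only).
for steps in (1, 5, 10):
    print(steps, round(chain_success(0.9, steps), 3))
```

Under that assumption, a ten-step task finishes cleanly only about a third of the time, so a high failure rate on agentic tasks can coexist with a much lower error rate on single questions.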

[–] burgerpocalyse@lemmy.world 3 points 1 week ago

I don't know why, but I'm reminded of this clip about an eggless omelette: https://youtu.be/9Ah4tW-k8Ao
