Technology


This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
Check for duplicates before posting; duplicates may be removed.
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws


AI agents wrong ~70% of time: Carnegie Mellon study (www.theregister.com)

submitted 5 months ago by eli001@lemmy.world to c/technology@lemmy.world

192 comments

(page 2) 50 comments


[–] kinsnik@lemmy.world 7 points 5 months ago

I haven't used AI agents yet, but my job is kinda pushing for them. I have, however, used the Google one that creates audio podcasts, just to play around, since my coworkers were using it to "learn" new things. I fed it some of my own writing and generated the podcast. It was fun: an audio overview of what I wrote. About 80% was cool analysis, but 20% was straight-out-of-nowhere bullshit (which I know because I wrote the original texts the audio was talking about). I can't believe people are using this for subjects they have no knowledge of. It is a fun toy for a few minutes (which is not worth the cost to the environment anyway).

[–] Frenezul0_o@lemmy.world 7 points 5 months ago

I notice that the research didn't include DeepSeek. It would have been nice to see how it compares.

[–] gargle@lemmy.world 5 points 5 months ago

I asked Claude 3.5 Haiku to write me a quine in COBOL in the BS2000 dialect. Claude does know that creating a perfect quine in COBOL is challenging, due to the need to represent the self-referential nature of the code. After a few suggestions, Claude restated its first draft: without proper BS2000 incantations, without a PERFORM statement, and without any self-referential REDEFINES. It's a lot of work. I stopped caring and moved on.

For those who wonder: https://sourceforge.net/p/gnucobol/discussion/lounge/thread/495d8008/ has an example.

Colour me unimpressed. I dread the day when they force the use of 'AI' on us at work.

[–] brown567@sh.itjust.works 5 points 5 months ago

70% seems pretty optimistic based on my experience...

[–] vane@lemmy.world 5 points 5 months ago

Reading this with a CEO mindset: 3 out of 10 employees can be fired.

[–] Affidavit@lemmy.world 5 points 5 months ago (1 children)

"...for multi-step tasks"

[–] loonsun@sh.itjust.works 5 points 5 months ago (1 children)

It's about agents, which implies multi-step tasks, since agents are meant to execute a series of tasks, as opposed to studies looking at base LLM performance.


[–] iopq@lemmy.world 3 points 5 months ago

Now I'm curious, what's the average score for humans?
