this post was submitted on 07 Jul 2025
917 points (98.1% liked)

Technology

(page 2) 50 comments
[–] kinsnik@lemmy.world 7 points 5 months ago

I haven't used AI agents yet, but my job is kinda pushing for them. I have used the Google one that creates audio podcasts, though, just to play around, since my coworkers were using it to "learn" new things. I fed it some of my own writing and created the podcast. It was fun: an audio overview of what I wrote. About 80% was cool analysis, but 20% was straight out-of-nowhere bullshit (which I know because I wrote the original texts that the audio was talking about). I can't believe that people are using this for subjects they have no knowledge of. It is a fun toy for a few minutes (which is not worth the cost to the environment anyway).

[–] Frenezul0_o@lemmy.world 7 points 5 months ago

I notice that the research didn't include DeepSeek. It would have been nice to see how it compares.

[–] gargle@lemmy.world 5 points 5 months ago

I asked Claude 3.5 Haiku to write me a quine in COBOL in the BS2000 dialect. Claude does know that creating a perfect quine in COBOL is challenging due to the need to represent the self-referential nature of the code. After a few suggestions, Claude restated its first draft, without proper BS2000 incantations, without a PERFORM statement, and without any self-referential REDEFINES. It's a lot of work. I stopped caring and moved on.

For those who wonder: https://sourceforge.net/p/gnucobol/discussion/lounge/thread/495d8008/ has an example.
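
For anyone unfamiliar with the term: a quine is a program whose output is exactly its own source code. Below is a minimal sketch in Python, not COBOL, and unrelated to the linked GnuCOBOL example. The two-line quine is the well-known `%r` formatting trick; the helper names `run_and_capture` and `is_quine` are made up here purely to check it.

```python
import contextlib
import io

# The quine itself is the two lines inside this raw string.
# It is stored as data so the check below can compare source and output.
QUINE = r"""s = 's = %r\nprint(s %% s)'
print(s % s)
"""


def run_and_capture(source: str) -> str:
    """Execute Python source and return everything it prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # fresh namespace so nothing leaks in or out
    return buffer.getvalue()


def is_quine(source: str) -> bool:
    """A program is a quine if its output equals its own source text."""
    return run_and_capture(source) == source


if __name__ == "__main__":
    print(is_quine(QUINE))
```

The trick is that `%r` inserts the string's own `repr` back into itself, which is the self-reference a COBOL version would have to imitate by other means.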

Colour me unimpressed. I dread the day when they force the use of 'AI' on us at work.

[–] brown567@sh.itjust.works 5 points 5 months ago

70% seems pretty optimistic based on my experience...

[–] vane@lemmy.world 5 points 5 months ago

Reading with CEO mindset. 3 out of 10 employees can be fired.

[–] Affidavit@lemmy.world 5 points 5 months ago (1 children)
[–] loonsun@sh.itjust.works 5 points 5 months ago (1 children)

It's about agents, which implies multi-step work: those are meant to execute a series of tasks, as opposed to studies that look at base LLM performance.

[–] iopq@lemmy.world 3 points 5 months ago

Now I'm curious, what's the average score for humans?
