TehPers@beehaw.org 9 points 12 hours ago

> Why not?

Are you asking the author or people in general? If the author didn't answer "why not" for you, then I can.

Yes, I've used Claude. Let's skip that part.

If you don't know how to write or identify defensive code, you can't tell whether the LLM generated defensive code. So for an LLM to be trusted to generate defensive code, it needs to do so 100% of the time, or very close to it.
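
To make "defensive code" concrete, here's a minimal sketch (my own illustration in Python, not anything from the post or generated by Claude): the same function with and without guards.

```python
# Non-defensive: trusts its inputs and fails in confusing ways.
def average(values):
    # ZeroDivisionError on [], TypeError on None
    return sum(values) / len(values)


# Defensive: validates inputs up front and fails loudly with clear messages.
def average_defensive(values):
    if values is None:
        raise ValueError("values must not be None")
    if not values:
        raise ValueError("values must contain at least one number")
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("values must contain only numbers")
    return sum(values) / len(values)
```

If you can't tell these two apart on sight, you also can't tell which one an LLM handed you.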

You seem to be under the impression that Claude does, but that's presumably because you can tell when code is written with sufficient guards and tests, and you know to ask the LLM to evaluate and revise its code. Someone without that experience won't know to ask.

Speaking now from my own experience: after using Claude at work to write tests, I came out of that project with no additional experience writing tests. I had to do a personal project afterward just to learn the testing library we'd used. Had the work project given me enough time to actually do the work myself, I'd have learned the library then; unfortunately, that wasn't the case.

The tests Claude generated were too rigid. They didn't cover the important functionality of the software. They asserted exact inputs and outputs using localized output values, meaning a localization change alone could break them. They covered cases that didn't need testing, like whether certain dependency calls happened in a specific order (those calls ran in parallel anyway). It wrote some good tests, but also a lot of unnecessary ones, and it skipped some that were needed.
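
To illustrate the localization brittleness (a hypothetical sketch in Python with pytest, not the actual project code or its testing library):

```python
import pytest


def format_total(amount: float, locale: str = "en_US") -> str:
    """Toy function under test: renders a cart total for display."""
    if locale == "de_DE":
        return f"Gesamt: {amount:.2f} €".replace(".", ",")
    return f"Total: ${amount:.2f}"


# Brittle: pins the exact localized string, so changing the default
# locale (or even punctuation in a translation) breaks the test.
def test_total_exact_string():
    assert format_total(9.5) == "Total: $9.50"


# More robust: tests the behavior that matters (the amount appears,
# formatted to two decimal places) regardless of locale.
@pytest.mark.parametrize("locale", ["en_US", "de_DE"])
def test_total_contains_amount(locale):
    result = format_total(9.5, locale)
    assert "9.50" in result or "9,50" in result
```

The call-order problem is the same species of brittleness: asserting that mocked dependencies fire in a fixed sequence breaks the moment those calls are (correctly) made in parallel.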

As a tool to help someone who already knows what they're doing, it can be useful. It's not a good tool for people who don't know what they're doing.