Programming

539

AI still doesn't work very well, businesses are faking it, and a reckoning is coming (www.theregister.com)

submitted 18 hours ago by brianpeiris@lemmy.ca to c/programming@programming.dev

Excerpt:

"Even within the coding, it's not working well," said Smiley. "I'll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven't engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence."

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.

"We don't know what those are yet," he said.
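For reference, the established delivery metrics Smiley lists can be computed from plain deployment records. The following is a minimal sketch, not anything from the article: the `Deployment` record, its field names, and the `delivery_metrics` function are all hypothetical, and a real team would pull this data from its own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause an incident or rollback?
    restored_at: Optional[datetime] = None  # when service recovered, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Deployment frequency, lead time, change failure rate, mean time to restore."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deploys if d.failed]
    lead = [hours(d.deployed_at - d.committed_at) for d in deploys]
    restore = [hours(d.restored_at - d.deployed_at)
               for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "mean_lead_time_hours": sum(lead) / len(lead),
        "change_failure_rate": len(failures) / len(deploys),
        # 0.0 when nothing failed (or nothing has been restored yet)
        "mean_time_to_restore_hours": sum(restore) / len(restore) if restore else 0.0,
    }


# Usage with two made-up deployments over a two-day window
t0 = datetime(2024, 1, 1)
example = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               restored_at=t0 + timedelta(hours=5)),
]
metrics = delivery_metrics(example, period_days=2)
```

None of these numbers mention lines of code or pull request counts, which is the point being made in the excerpt.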

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.
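As a rough illustration of that metric, here is a minimal sketch of "tokens burned per approved pull request". The `PullRequest` record and its fields are hypothetical, and the choice to count tokens spent on rejected pull requests against the approved ones is an assumption, not something the article specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical record: one AI-assisted pull request and its token cost."""
    pr_id: str
    tokens_used: int
    approved: bool


def tokens_per_approved_pr(prs: list[PullRequest]) -> Optional[float]:
    """Total tokens spent on all PRs divided by the number that were approved.

    Tokens spent on rejected PRs are included on purpose (an assumption):
    they were burned too, and they produced nothing that shipped.
    Returns None when no PR was approved, since the ratio is undefined.
    """
    approved = sum(1 for pr in prs if pr.approved)
    if approved == 0:
        return None
    total_tokens = sum(pr.tokens_used for pr in prs)
    return total_tokens / approved


# Usage with made-up numbers: 6000 tokens total, 2 approved PRs
sample = [
    PullRequest("a", 1000, approved=True),
    PullRequest("b", 3000, approved=False),
    PullRequest("c", 2000, approved=True),
]
cost = tokens_per_approved_pr(sample)
```

Counting only the tokens of approved pull requests would flatter the tool, because every abandoned attempt would disappear from the metric.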

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

"It passed all the unit tests, the shape of the code looks right," he said. "It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."
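A toy example of the same failure mode, unrelated to the SQLite rewrite itself: two functions that pass the same unit test, where only a benchmark shows that one of them scales badly. The function names and input size are made up for illustration, and the measured ratio will vary by machine.

```python
import timeit


def dedupe_slow(items):
    """Order-preserving dedupe that scans a list on every lookup: O(n^2)."""
    seen = []
    for x in items:
        if x not in seen:      # linear scan of everything kept so far
            seen.append(x)
    return seen


def dedupe_fast(items):
    """Same result, but membership checks go through a set: O(n)."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:      # constant-time lookup on average
            seen.add(x)
            out.append(x)
    return out


def slowdown(n=2000, repeats=3):
    """How many times slower dedupe_slow is than dedupe_fast on n unique items."""
    data = list(range(n))
    slow = min(timeit.repeat(lambda: dedupe_slow(data), number=1, repeat=repeats))
    fast = min(timeit.repeat(lambda: dedupe_fast(data), number=1, repeat=repeats))
    return slow / fast


# The "unit test": both implementations pass it, so it cannot tell them apart.
assert dedupe_slow([3, 1, 3, 2, 1]) == dedupe_fast([3, 1, 3, 2, 1]) == [3, 1, 2]

# The benchmark: only this exposes the difference.
print(f"dedupe_slow is about {slowdown():.0f}x slower on 2000 items")
```

The gap grows with input size, so a test suite built on small fixtures will never catch it.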

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

"Coding works if you measure lines of code and pull requests," he said. "Coding does not work if you measure quality and team performance. There's no evidence to suggest that that's moving in a positive direction."

[–] Thorry@feddit.org 89 points 17 hours ago (2 children)

Yeah these newer systems are crazy. The agent spawns a dozen subagents that all do some figuring out on the code base and the user request. Then those results get collated, then passed along to a new set of subagents that make the actual changes. Then there are agents that check stuff and tell the subagents to redo stuff or make changes. And then it gets a final check like unit tests, compilation etc. And then it's marked as done for the user. The amount of tokens this burns is crazy, but it gets them better results in the benchmarks, so it gets marketed as an improvement. In reality it's still fucking up all the damned time.

Coding with AI is like coding with a junior dev, who didn't pay attention in school, is high right now, doesn't learn and only listens half of the time. It fools people into thinking it's better, because it shits out code super fast. But the cognitive load is actually higher, because checking the code is much harder than coming up with it yourself. It's slower by far. If you are actually going faster, the quality is lacking.

[–] chunkystyles@sopuli.xyz 12 points 11 hours ago (1 children)

This is very different from my experience, but I've purposely lagged behind in adoption and I often do things the slow way because I like programming and I don't want to get too lazy and dependent.

I just recently started using Claude Code CLI. With how I use it (asking it specific questions and often telling it exactly what files and lines to analyze), it feels more like talking to an extremely knowledgeable programmer who has very narrow context and often makes short-sighted decisions.

I find it super helpful in troubleshooting. But it also feels like a trap, because I can feel it gaining my trust and I know better than to trust it.

[–] TehPers@beehaw.org 2 points 6 hours ago

I've mentioned the long-term effects I see at work in several places, but all I can say is be very careful how you use it. The parts of our codebase that are almost entirely AI written are unreadable garbage and a complete clusterfuck of coding paradigms. It's bad enough that I've said straight to my manager's face that I'd be embarrassed to ship this to production (and yes I await my pink slip).

As a tool, it can help explain code, it can help find places where things are being done, and it can even suggest ways to clean up code. However, those are all things you'll also learn over time as you gather more and more experience, and it acts more as a crutch here because you spend less time learning the code you're working with as a result.

I recommend maintaining exceptional skepticism with all code it generates. Claude is very good at producing pretty code. That code is often deceptive, and I've seen even Opus hallucinate fields, generate useless tests, and misuse language/library features to solve a task.

[–] Flames5123@sh.itjust.works 21 points 16 hours ago

I code with AI a good bit for a side project since I need to use my work AI and get my stats up to show management that I’m using it. The “impressive” thing is learning new software and how to use it quickly in your environment. When setting up my homelab with automatic git pull, it quickly gave me some commands and showed me what to add in my docker container.

Correcting issues is exactly like coding with a high junior dev though. The code bloat is real and I’m going to attempt to use agentic AI to consolidate it in the future. I don’t believe you can really “vibe code” unless you already know how to code though. Stating the exact structures and organization and whatnot is vital for agentic AI programming semi-complex systems.