AI still doesn't work very well, businesses are faking it, and a reckoning is coming (www.theregister.com)

submitted 18 hours ago by brianpeiris@lemmy.ca to c/programming@programming.dev

Excerpt:

"Even within the coding, it's not working well," said Smiley. "I'll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven't engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence."

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.

"We don't know what those are yet," he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

"It passed all the unit tests, the shape of the code looks right," he said. "It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."
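A minimal sketch of the gap described here, where code passes its unit tests and still performs far worse than an alternative. The two functions, the test cases, and the step counter are illustrative assumptions and have nothing to do with the actual SQLite rewrite. Both implementations pass the same small correctness checks; only a benchmark on a larger input shows the difference.

```python
import time


def has_duplicates_quadratic(items, steps):
    """Correct, but compares every pair: about n*(n-1)/2 steps in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            steps[0] += 1
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items, steps):
    """Correct, and visits each item once using a set of already-seen values."""
    seen = set()
    for item in items:
        steps[0] += 1
        if item in seen:
            return True
        seen.add(item)
    return False


def unit_tests_pass(fn):
    """Small correctness checks, the kind a typical unit test suite contains."""
    cases = [([], False), ([1], False), ([1, 2, 3], False), ([1, 2, 1], True)]
    return all(fn(list(items), [0]) == expected for items, expected in cases)


def benchmark(fn, n):
    """Return (steps, seconds) for a worst-case input of n distinct items."""
    steps = [0]
    data = list(range(n))
    start = time.perf_counter()
    fn(data, steps)
    return steps[0], time.perf_counter() - start


if __name__ == "__main__":
    for fn in (has_duplicates_quadratic, has_duplicates_linear):
        steps, seconds = benchmark(fn, 2000)
        print(fn.__name__, "tests pass:", unit_tests_pass(fn),
              "steps:", steps, f"seconds: {seconds:.4f}")
```

The step counter is there so the comparison is deterministic; wall-clock timings vary by machine, which is why real benchmarks are usually run several times.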

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

"Coding works if you measure lines of code and pull requests," he said. "Coding does not work if you measure quality and team performance. There's no evidence to suggest that that's moving in a positive direction."
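For anyone who wants to try measuring the things named in the excerpt, here is a rough sketch of deployment frequency, lead time, change failure rate, mean time to restore, and tokens per approved pull request. Everything about the record layout is an assumption made for illustration (the `Deployment` fields, the `(tokens_used, approved)` pairs, and the choice to count tokens spent on rejected pull requests too). Incident severity is left out because it needs a severity scale the excerpt doesn't define.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record layout; real data would come from CI/CD and incident tools.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def delivery_metrics(deployments, window_days):
    """Summarize a list of Deployment records over a window of window_days days."""
    n = len(deployments)
    if n == 0 or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [d.restored_at - d.deployed_at
                for d in failures if d.restored_at is not None]

    def mean_hours(deltas):
        return sum(deltas, timedelta()).total_seconds() / 3600 / len(deltas)

    return {
        "deploys_per_day": n / window_days,
        "mean_lead_time_hours": mean_hours(lead_times),
        "change_failure_rate": len(failures) / n,
        # None when no failed deployment has been restored yet.
        "mean_time_to_restore_hours": mean_hours(restores) if restores else None,
    }


def tokens_per_approved_pr(prs):
    """prs: iterable of (tokens_used, approved) pairs.

    Assumption: tokens burned on rejected pull requests still count as cost.
    """
    prs = list(prs)
    approved = sum(1 for _, ok in prs if ok)
    if approved == 0:
        return None
    return sum(tokens for tokens, _ in prs) / approved
```

Counting rejected pull requests in the numerator is a design choice: it makes wasted attempts show up in the metric instead of disappearing.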

[–] garbage_world@lemmy.world 6 points 4 hours ago (1 children)

I find this hard to believe, unless it's talking about 100% vibecoding

[–] 87Six@lemmy.zip 6 points 3 hours ago (1 children)

recent attempt to rewrite SQLite in Rust using AI

I think it is talking about 100% vibe coding. And yeah, it's pretty useful if you don't abuse it.

[–] rumba@lemmy.zip 2 points 1 hour ago

Yeah, it's really good at short bursts of complicated things. "Give me a curl statement to post this file as a snippet into Slack." "Give me a connector bot from Ollama to and from Meshtastic." It'll give you serviceable, but not perfect, code.

When you get to bigger, more complicated things, it needs a lot of instruction, guardrails and architecture. You're not going to just say "Give me SQLite but in Rust, GO" and have a good time.

I've seen some people architect some crazy shit. You do this big, long, drawn-out project, tell it to use a small control orchestrator, set up many agents and have each agent do part of the work, have it create full unit tests, be demanding about best practice, post security checks, ouroboros it and let it go.

But it's expensive, and we're still getting venture capital tokens for less than cost, and you'll still have hard-to-find edge cases. Someone may eventually work out a fairly generic way to set it up to do medium-scale projects cleanly, but that's not the case now, and there are definite limits to what it can handle. And as always, you'll never be able to trust that it's making a safe app.
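A very rough sketch of the "small control orchestrator" shape described above: split the work into tasks, hand each task to an agent, and only accept output that passes every check. The `agent` here is any callable taking a task string and returning text; it stands in for a real model call, and all names are hypothetical. It also doesn't solve the trust problem, since the checks are only as good as whoever wrote them.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]   # stand-in for a call to a model; hypothetical
Check = Callable[[str], bool]  # e.g. "tests pass", "linter clean"; hypothetical


def orchestrate(
    tasks: List[str],
    agent: Agent,
    checks: List[Check],
    max_attempts: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Run each task through the agent until all checks pass or attempts run out.

    Returns (accepted outputs keyed by task, tasks that never passed).
    """
    accepted: Dict[str, str] = {}
    rejected: List[str] = []
    for task in tasks:
        for _ in range(max_attempts):
            output = agent(task)
            if all(check(output) for check in checks):
                accepted[task] = output
                break
        else:
            # No attempt passed every check, so the task goes back to a human.
            rejected.append(task)
    return accepted, rejected
```

The attempt cap matters for the cost point: every retry is another batch of tokens, so an uncapped loop can burn money on a task the agent will never get right.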