I find this hard to believe, unless it's talking about 100% vibecoding
recent attempt to rewrite SQLite in Rust using AI
I think it is talking about 100% vibe coding. And yeah, it's pretty useful if you don't abuse it.
Recently had to call out a coworker for vibecoding all her unit tests. How did I know they were vibe coded? None of the tests had an assertion, so they literally couldn't fail.
Vibe coding guy wrote unit tests for our embedded project. Of course, the hardware peripherals aren’t available for unit tests on the dev machine/build server, so you sometimes have to write mock versions (like an “adc” function that just returns predetermined values in the format of the real analog-digital converter).
Claude wrote the tests and mock hardware so well that it forgot to include any actual code from the project. The test cases were just testing the mock hardware.
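A minimal sketch of what that looks like (all names hypothetical, not the actual project): the test constructs the mock ADC and asserts on the mock's own canned values, without ever importing or exercising any firmware code.

```python
# Hypothetical mock of an ADC peripheral for host-side unit tests.
class MockAdc:
    def __init__(self, samples):
        # Predetermined readings in the format of the real ADC.
        self._samples = list(samples)

    def read(self):
        return self._samples.pop(0)


def test_adc_mock_only():
    # This only verifies the mock itself -- no project code is imported,
    # so nothing about the firmware is actually being tested.
    adc = MockAdc([512, 1023])
    assert adc.read() == 512
    assert adc.read() == 1023
```

The point of a mock is to feed it into real project code, e.g. asserting on something like a firmware averaging routine given the mock readings; the generated tests skipped that step entirely.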
Not realizing that should be an instant firing. The dev didn't even glance at the unit tests...
Hahaha 🤣
if you reject her pull requests, does she fix it? is there a way for management to see when an employee is pushing bad commits more frequently than usual?
Yes, it does not work right! Also, there are no new discoveries made by AI; we only see chatbots, self-driving cars, and workplace automation, yet no discoveries. At some point I thought AI would help us cure cancer or find a way to travel in space, yet billionaires think only of money.
Call me negative, call me an idiot, but the only thing I see is people profiting now while they can, and later on nothing will come of it.
Yes, it does not work right!
I agree.
Also, there are no new discoveries made by AI; we only see chatbots, self-driving cars, and workplace automation, yet no discoveries. At some point I thought AI would help us cure cancer or find a way to travel in space, yet billionaires think only of money.
We aren't there yet. AI and research around it started, or rather really took off, around 2018 (at least relating to what we mean by AI today; rule-based approaches existed much longer). It is very much a new field, considering most other fields have existed for over 30 years at this point. And, to be pedantic, large language models aren't really AI because there is no intelligence. They just generate the output that is the most probable continuation of the input and context provided. So yeah, "AI" cannot really research or make new discoveries yet. There may very well be a time where AI helps us solve cancer. It definitely isn't today or tomorrow.
I also don't think billionaires are making money with AI. I mean, if you look at OpenAI: they are actually burning money, at a fast rate measured in billions. They are projected to turn a profit in 2030. Without others investing in them, they would be long gone already. The people with money believe that OpenAI and other AI companies will someday make the world-changing discovery, which could very well lead to AI making discoveries on its own. Until then, they are obviously willing to burn a tremendous amount of money, and that is what's keeping OpenAI in particular alive at the moment. Only time will tell what happens next. I'll keep my popcorn ready for when the bubble bursts :D
"Codestrap founders"
Let me guess they will spearhead the correct way to use AI?
Guy selling an AI coding platform says other AI coding platforms suck.
This just reads like a sales pitch rather than journalism. It doesn't cite any studies, just some anecdotes about what he hears "in the industry".
Half of it is:
You're measuring the wrong metrics for productivity, you should be using these new metrics that my AI coding platform does better on.
I know the AI hate is strong here but just because a company isn't pushing AI in the typical way doesn't mean they aren't trying to hype whatever they're selling up beyond reason. Nearly any tech CEO cannot be trusted, including this guy, because they're always trying to act like they can predict and make the future when they probably can't.
My take exactly. Especially the bits about unit tests. If you cannot rely on your unit tests as a first assessment of your code quality, your unit tests are trash.
And not every company runs GitHub. The metrics he's talking about are DevOps metrics, not development metrics. For example, at my work nobody gives a fuck about mean time to production. We have a planning schedule and we need the OK from our customers before we can update our product.
I once saw someone send ChatGPT and Gemini Pro into a constant loop by asking "Is the seahorse emoji real?" The responses just went around in circles. I have heard that the "Mandela Effect" theory in this case is not true; they say the emoji existed on Microsoft's MSN Messenger and in early versions of Skype. I don't know how much of that is true, but it was fun seeing artificial intelligence being bamboozled by real intelligence. The guy was proving that AI is just a tool, not a permanent replacement for actual people.
Ask it which is heavier: 20 pounds of gold or 20 feathers.
They could be dinosaur feathers, weighing a pound each 🤷‍♂️
That was working the same way in GPT a month ago. Do it, it's incredibly fun to see for yourself.
it sure works well for slop marketers taking A/B testing to a new level of pointlessness.
these types of articles aren't analyzing the usefulness of the tool in good faith. they're not meant to do a lot of the things that are often implied. the coding tools are best used by coders who can understand code and make decisions about what to do with the code that comes out of the tool. you don't need ai to help you be a shitty programmer
they are analyzing the way the tools are being used based on marketing. yes they're useful for senior programmers who need to automate boilerplate, but they're sold as complete solutions.
Exactly. This reads like people are prompting for something then just using that code.
The way we use it is as a scaffolding tool. Write a prompt, then use that boilerplate to actually solve the problem you're trying to solve.
You could say the same for people using Stackoverflow, you don’t just blindly copy and paste.
I love this bit especially
Insurers, he said, are already lobbying state-level insurance regulators to win a carve-out in business insurance liability policies so they are not obligated to cover AI-related workflows. "That kills the whole system," Deeks said. Smiley added: "The question here is if it's all so great, why are the insurance underwriters going to great lengths to prohibit coverage for these things? They're generally pretty good at risk profiling."
We never figured out good software productivity metrics, and now we're supposed to come up with AI effectiveness metrics? Good luck with that.
Sure we did.
"Lines Of Code" is a good one, more code = more work so it must be good.
I recently had a run-in with another good one: PRs/Dev/Month.
Not only is that one good for overall productivity, it's a way to weed out those unproductive devs who check in less often.
This one was so good, management decided to add it to the company-wide catchup slides, in a section espousing how the new AI-driven systems brought this number up enough to be above other companies.
That means other companies are using it as well, so it must be good.
Why is it always the dumbest people who become managers?
The others are busy working, they don't have time to waste drinking coffee with execs
This feels like an exercise in Goodhart's Law: Any measure that becomes a target ceases to be a useful measure.
Yeah these newer systems are crazy. The agent spawns a dozen subagents that all do some figuring out on the code base and the user request. Then those results get collated, then passed along to a new set of subagents that make the actual changes. Then there are agents that check stuff and tell the subagents to redo stuff or make changes. And then it gets a final check like unit tests, compilation etc. And then it's marked as done for the user. The amount of tokens this burns is crazy, but it gets them better results in the benchmarks, so it gets marketed as an improvement. In reality it's still fucking up all the damned time.
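The orchestration pattern described above can be sketched as a toy loop (every function name here is a placeholder standing in for an LLM call, not any real agent framework's API): fan out to analyzers, collate, edit, then iterate through review until a final check passes.

```python
# Toy sketch of the fan-out / collate / review loop described above.
def run_agent_pipeline(request, analyze, edit, review, final_check,
                       fanout=3, max_retries=3):
    # Fan out: several "subagents" analyze the request independently.
    findings = [analyze(request) for _ in range(fanout)]
    plan = " | ".join(findings)           # collate results into one plan
    changes = edit(plan)                  # a new set of subagents makes changes
    for _ in range(max_retries):          # reviewer agents can demand rework
        ok, feedback = review(changes)
        if ok:
            break
        changes = edit(feedback)          # redo based on reviewer feedback
    assert final_check(changes)           # final gate: unit tests, compilation
    return changes
```

Each arrow in that flow is a separate model invocation, which is exactly where the token burn comes from: one user request becomes `fanout` analysis calls plus up to `max_retries` edit/review round trips.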
Coding with AI is like coding with a junior dev, who didn't pay attention in school, is high right now, doesn't learn and only listens half of the time. It fools people into thinking it's better, because it shits out code super fast. But the cognitive load is actually higher, because checking the code is much harder than coming up with it yourself. It's slower by far. If you are actually going faster, the quality is lacking.
This is very different from my experience, but I've purposely lagged behind in adoption and I often do things the slow way because I like programming and I don't want to get too lazy and dependent.
I just recently started using Claude Code CLI. With how I use it, asking it specific questions and often telling it exactly which files and lines to analyze, it feels more like talking to an extremely knowledgeable programmer who has very narrow context and often makes short-sighted decisions.
I find it super helpful in troubleshooting. But it also feels like a trap, because I can feel it gaining my trust and I know better than to trust it.
I've mentioned the long-term effects I see at work in several places, but all I can say is be very careful how you use it. The parts of our codebase that are almost entirely AI-written are unreadable garbage and a complete clusterfuck of coding paradigms. It's bad enough that I've said straight to my manager's face that I'd be embarrassed to ship this to production (and yes, I await my pink slip).
As a tool, it can help explain code, it can help find places where things are being done, and it can even suggest ways to clean up code. However, those are all things you'll also learn over time as you gather more and more experience, and it acts more as a crutch here because you spend less time learning the code you're working with as a result.
I recommend maintaining exceptional skepticism with all code it generates. Claude is very good at producing pretty code. That code is often deceptive, and I've seen even Opus hallucinate fields, generate useless tests, and misuse language/library features to solve a task.
AI is a solution in search of a problem. Why else would there be consultants to "help shepherd organizations towards an AI strategy"? Companies are looking to use AI out of fear of missing out, not because they need it.
The problem is that code is hard to write, and AI just doesn't solve that. This is the opposite of crypto, where the product is sort of good at what it does (not Bitcoin, though), but we don't actually need to do that.
Exactly. I’ve heard the phrase “falling behind” from many in upper management.
This is all fine and dandy but the whole article is based on an interview with "Dorian Smiley, co-founder and CTO of AI advisory service Codestrap". Codestrap is a Palantir service provider, and as you'd expect Smiley is a Palantir shill.
The article hits different considering it's more or less a world devourer zealot taking a jab at competing world devourers. The reporter is an unsuspecting proxy at best.
Generative models, which many people call "AI", have a much higher catastrophic failure rate than we have been led to believe. They cannot actually be used to replace humans, just as an inanimate object can't replace a parent.
Jobs aren't threatened by generative models. Jobs are threatened by a credit crunch due to high interest rates and a lack of lenders being able to adapt.
"AI" is a ruse, a useful excuse that helps make people want to invest, investors & economists OK with record job loss, and the general public more susceptible to data harvesting and surveillance.