this post was submitted on 25 Feb 2026
895 points (99.2% liked)

Technology

[–] J92@lemmy.world 19 points 1 day ago (3 children)

The only useful thing I've found for AI is its ability to read text from an image. That's good for taking serial numbers from a photo, and for copying from an app that otherwise doesn't allow copying on a phone. That's it. A tool.

[–] bridgeenjoyer@sh.itjust.works 39 points 21 hours ago (4 children)

OCR has done that for 20 years.

Nothing these slop generators do is novel or new.

[–] BastingChemina@slrpnk.net 15 points 18 hours ago

I remember using Google Translate 15 years ago, and it was doing that live on the phone camera and translating the text at the same time.

[–] dual_sport_dork@lemmy.world 7 points 16 hours ago* (last edited 15 hours ago) (1 children)

Random aside to rant about consumer OCR.

Recently for my work I had to do some OCR stuff to get some numbers out of a document that the vendor in their infinite wisdom refused to provide in an editable/selectable form. I.e. they just slapped a .jpeg onto a page and saved it as a .pdf. (This is a separate thing that infuriates me.)

Anyway, what I'm actually here to complain about is the baffling phenomenon that every single piece of OCR software I tried, ranging from open source to trials of commercial programs to the thingy that came with one of our all-in-one printer/scanners and everything in between, is somehow still exactly as crap as the lousy OCR programs we were all struggling with in the late '90s.

I have absolutely no idea how this facet of technology in particular has utterly and categorically failed to make any forward progress whatsoever in literal decades. I've personally worked on machine vision driven pick-and-place machines capable of accurately determining the orientation of densely printed cosmetics tubes, among other items, and placing them all face up in a box several times per second. Yet somehow the latest and greatest OCR transcription algorithms still can't tell a 5 from a 6 or ye gods forbid an S, or an L from a J, or an M from a collection of back and forward slashes, all despite being handed crisp high contrast seriffed text that's at least 60 pixels high.

Given how low the bar for performance is here, since apparently every single programmer involved just walked away circa 2001, I can't imagine that the current slop generation machines fare any better...

[–] teuniac_@lemmy.world 4 points 16 hours ago

I tried some of the popular LLMs a few months back when I had to digitise an old policy document of which only an old scan still existed. I had trouble reading it myself.

The results varied wildly. OpenAI was really poor at it, while Gemini got it completely right. I was quite impressed. ABBYY FineReader is supposed to be the best non-LLM software for OCR, but it doesn't come near the performance of Gemini.

[–] lolola@lemmy.blahaj.zone 5 points 19 hours ago

How else do people think we were translating all that hentai before the slop generators took off

[–] Jakeroxs@sh.itjust.works 0 points 19 hours ago (1 children)
[–] brianary@lemmy.zip 1 points 16 hours ago (1 children)

Always worked well enough for me.

[–] Jakeroxs@sh.itjust.works 2 points 16 hours ago

I remember trying to use some pre-LLM OCRs, and they often handled handwriting really poorly. LLM-backed OCR seems to perform generally better. Typed-text OCR, though, was usually pretty good.

[–] mrgoosmoos@lemmy.ca 12 points 1 day ago (1 children)

that function is just reskinned OCR, though

which I guess you could consider as AI and that it is a similar training data structure? not my area lol

I do also think that AI has some use as a search engine. I haven't used it much for this purpose at all, but a while back there was a specific type of engineering analysis I needed to do, and I couldn't remember the exact terms or topics to look up. ChatGPT got me into the right area so I could look at the appropriate resources. in that specific scenario, it was better than a standard search engine

Of course once I found the materials I was looking for, I stopped using the chatbot and, you know, used those materials

[–] ricecake@sh.itjust.works 9 points 19 hours ago (1 children)

Yeah, OCR is a type of AI. The big advantage of modern techniques is that they can factor in context a bit better. It's the same principle, but a different mechanism, as how you know a red hexagon with S__P on it says stop, even if the sign is dented, a letter has fully fallen off, and it's raining and dark.

It also means it's sometimes wildly inaccurate, in cases where it's just so much more likely that the text said something else. Like how on a bright sunny day, with perfect clarity and a crisp new sign with extra good visuals, you'll hit the brakes for a red hexagon that says §¥¢¶. It's just very unlikely that that would coincidentally be on a red hexagon near the road, so it's more likely you saw wrong and it was actually the normal thing.
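
For anyone who wants the sign example in numbers, here is a rough toy sketch of that "context" effect as a simple Bayesian (noisy-channel) calculation. It is not how any particular OCR engine or LLM works internally. The function names and every probability below are made up purely for illustration.

```python
# Toy illustration: the reading = what the pixels suggest, weighted by
# how likely each text is to appear on a sign in the first place.
# All probabilities here are invented for the example.

def posterior(likelihood, prior):
    """Return P(text | image), proportional to P(image | text) * P(text)."""
    unnormalized = {text: likelihood[text] * prior[text] for text in likelihood}
    total = sum(unnormalized.values())
    return {text: p / total for text, p in unnormalized.items()}


def best_guess(likelihood, prior):
    """Pick the text with the highest posterior probability."""
    post = posterior(likelihood, prior)
    return max(post, key=post.get)


# Hypothetical prior: what a red sign by the road usually says.
prior = {"STOP": 0.98, "SHOP": 0.019, "§¥¢¶": 0.001}

# Damaged sign showing S__P: the pixels cannot separate STOP from SHOP.
damaged = {"STOP": 0.45, "SHOP": 0.45, "§¥¢¶": 0.10}

# Crisp sign that really shows §¥¢¶: the pixels strongly favor the symbols.
crisp_symbols = {"STOP": 0.02, "SHOP": 0.02, "§¥¢¶": 0.96}

print(best_guess(damaged, prior))        # the prior breaks the tie
print(best_guess(crisp_symbols, prior))  # the prior overrides clear pixels
```

With these made-up numbers both cases come out as STOP. In the first case the prior rescues an ambiguous image, and in the second the same prior overrules perfectly clear evidence, which is the failure mode described above.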

[–] Hule@lemmy.world 1 points 11 hours ago

Ackshually.. Stop signs are octagons!

I also find LLMs decent for translating text between languages, though for serious use it still requires human review