this post was submitted on 31 Aug 2023
557 points (97.9% liked)

Technology


I'm rather curious to see how the EU's privacy laws are going to handle this.

(Original article is from Fortune, but Yahoo Finance doesn't have a paywall)

[–] SomethingBurger@jlai.lu 5 points 2 years ago (15 children)

Can't they remove the data from the training set and start over?

[–] mo_ztt@lemmy.world 4 points 2 years ago (1 children)

Yes, but that's not easy. I can't remember exactly, but I think I saw an estimate that the compute time to train just one of the GPT models cost around $66 million. IDK whether that's the total cost from scratch or the incremental cost of getting to that model from an earlier one that was already built. I do know that GPT is still using that September 2021 cutoff to this day, which to me kind of implies that they're building progressively on top of already-assembled models and datasets (which makes sense, because starting from scratch when you don't need to would be insane).

You could, technically, start from scratch and spend 2 more years and however many million dollars retraining a new model that doesn't have the private data you're trying to excise, but I think the point the article is making is that that's a pretty difficult approach, and right now it seems like the only one.

[–] skulblaka@kbin.social 5 points 2 years ago

Un-robbing a bank also isn't easy, but that doesn't mean I'm able to just say "it too hard :c" and then walk off into the sunset with my looted gains.

[–] Zeth0s@lemmy.world 2 points 2 years ago* (last edited 2 years ago)

Information leaking is a thing. Some information is spread across multiple sources without actually being in any of those. If you remove something, the model can still infer the information.

If Macron asks for his name to be deleted, you can still retrieve his political opinions simply by knowing the history of other people's interactions with the French government. I just need to tell the model that the person it has no direct information about is named Macron, and it can profile him.

Same with a search engine. The only difference is that there the inference of the missing information is done by human brains. The model can substitute for them.

[–] Hildegarde@lemmy.world 2 points 2 years ago (1 children)

Yes. They can also reload a backup from before the data in question was added to the training data and retrain from that point. This is also what will need to be done if AI companies lose their copyright lawsuits.

None of this is impossible. It's just expensive. And these are expenses that AI companies could have avoided if they had picked their datasets more carefully.
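
As a rough illustration of the rollback idea above, here is a minimal sketch. It assumes each saved checkpoint records which data batches had been trained on up to that point; the `Checkpoint` class, the batch IDs, and both helper functions are hypothetical and not how any real AI company is known to store its models.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    step: int            # training step at which the snapshot was taken
    seen_batches: list   # IDs of every data batch trained on up to this step (assumed bookkeeping)


def latest_clean_checkpoint(checkpoints, tainted_batch):
    """Return the newest checkpoint that never saw the tainted batch, or None."""
    clean = [c for c in checkpoints if tainted_batch not in c.seen_batches]
    return max(clean, key=lambda c: c.step, default=None)


def batches_to_replay(all_batches, checkpoint, tainted_batch):
    """Batches to retrain on after rollback: everything not yet seen, minus the tainted one."""
    seen = set(checkpoint.seen_batches) if checkpoint else set()
    return [b for b in all_batches if b not in seen and b != tainted_batch]


# Hypothetical training history: batch "c" later turns out to contain the data to excise.
checkpoints = [
    Checkpoint(step=100, seen_batches=["a", "b"]),
    Checkpoint(step=200, seen_batches=["a", "b", "c"]),
    Checkpoint(step=300, seen_batches=["a", "b", "c", "d"]),
]
all_batches = ["a", "b", "c", "d", "e"]

rollback = latest_clean_checkpoint(checkpoints, tainted_batch="c")
replay = batches_to_replay(all_batches, rollback, tainted_batch="c")
```

The earlier the tainted batch entered training, the further back the rollback goes and the more batches have to be replayed, which is where the expense comes from. If no checkpoint predates the tainted batch, `latest_clean_checkpoint` returns `None` and everything has to be retrained from scratch.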

[–] assassin_aragorn@lemmy.world 2 points 2 years ago (1 children)

It's crazy that they aren't taking at least daily captures of the model or having it record what information it processes.

[–] Hildegarde@lemmy.world 3 points 2 years ago

I would be shocked if they don't. It's pretty critical for any software development, AI or not, to retain the ability to roll back changes in the case any change breaks something.

[–] cloudless@feddit.uk 5 points 2 years ago

It is not impossible, it is just expensive.

[–] asunaspersonalasst@lemmy.world 5 points 2 years ago

Then why did they put it in in the first place, no? 👁👄👁

[–] over_clox@lemmy.world 3 points 2 years ago

Have you tried..

format Earth
