Infosec.Pub


To support infosec.pub, please consider donating through one of the following services:

Paypal: jerry@infosec.exchange

Ko-Fi: https://ko-fi.com/infosecexchange

Patreon: https://www.patreon.com/infosecexchange

Easter diagram (infosec.pub)
submitted 1 hour ago* (last edited 1 hour ago) by Track_Shovel@slrpnk.net to c/lemmyshitpost@lemmy.world
He is Risen (infosec.pub)
submitted 1 hour ago by cm0002@lemy.lol to c/memes@sopuli.xyz

“We sent them guns to the protesters, a lot of them,” Trump said in a telephone interview [with Fox News]. “We sent them through the Kurds. They kept them, so we sent more to the protesters, a lot of them.”

submitted 48 minutes ago* (last edited 31 minutes ago) by a4ng3l@lemmy.world to c/cooking@lemmy.world

Simple but effective :)

Red wine and shallot sauce based on the beef fond I portioned last week.

I’m trying to go for a more western arc in my cooking adventures.

He's a pretty awesome companion for our adventures!


cross-posted from: https://ibbit.at/post/219495

From Fark.com RSS via this RSS feed. Fark comments are available here.

---

Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model’s weights during training, and whether those memorized data can be extracted in the model’s outputs.

While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models... We investigate this question using a two-phase procedure...

We evaluate our procedure on four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, and we measure extraction success with a score computed from a block-based approximation of longest common substring...

Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs...

...we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984...

Source: https://arxiv.org/pdf/2601.02671
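For intuition, here is a minimal sketch of what a block-based approximation of longest common substring (LCS) might look like. The block size, word-level splitting, and normalization below are illustrative assumptions, not the paper's exact scoring method.

```python
# Hypothetical sketch of a block-based LCS approximation between a
# source text (e.g. a book) and a model's output. Parameters here are
# assumptions for illustration; the paper's metric may differ.

def block_lcs_score(source: str, output: str, block_size: int = 50) -> float:
    """Score extraction success as the longest run of consecutive
    source blocks found verbatim in the output, normalized by source
    length (1.0 ~ near-verbatim reproduction)."""
    words = source.split()
    if not words:
        return 0.0

    # Split the source into fixed-size word blocks.
    blocks = [" ".join(words[i:i + block_size])
              for i in range(0, len(words), block_size)]

    # Mark which blocks appear verbatim in the output.
    matched = [block in output for block in blocks]

    # The longest run of consecutive matched blocks approximates the
    # longest common substring, up to block-boundary effects.
    best = run = 0
    for hit in matched:
        run = run + 1 if hit else 0
        best = max(best, run)

    return min(1.0, (best * block_size) / len(words))
```

On a fully memorized text this returns roughly 1.0 and on unrelated text 0.0. Checking fixed blocks avoids the quadratic cost of an exact LCS over book-length strings, which is one plausible motivation for using a block-based approximation.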
