suy

joined 2 years ago
[–] suy@programming.dev -3 points 8 months ago (3 children)

But then it does go on to quote materials verbatim, which shows it’s not “just” ‘extracting patterns’.

It is just extracting patterns. It is making statistical samples of which token ("word", informally speaking) is likely to follow, given the previous stream.

It can only reproduce passages of things it has seen many, many times. It cannot reproduce the whole work. Those two quotes can be seen elsewhere on the internet plenty of times. And it's fair use there, so it would be fair use with a chat bot as well.

There have been papers published where researchers were able to regenerate an image that was present in the training set of Stable Diffusion. But they were only able to find that image (and others) in particular, because they were present in the training set multiple times, and the caption was the same (it was the portrait picture of some executive at a company).

when given the book and pages — quote copyrighted works

Yeah, you are not gonna be able to do that with an LLM. They will be able to quote only some passages, and only of popular books that have been quoted often enough.

Even if they started to use my service to literally copy entire books?

You cannot do that with an LLM.

Why are you defending massive corporations who could just pay up? Isn’t the whole “corporations putting profits over anything” thing a bit… seen already?

I hate that some corporations are burning money, resources and energy on this, and the solution is not to restrict fair use even further. Machine Learning is complex, but if I had to summarize it in some way, it is "just" gathering statistics of which word comes next (in the case of a text model). This is no different than getting a large corpus of text and sampling it for word frequency, letter frequency, N-gram frequency, etc. It is well known that this is fair use. You only store the copyrighted works to run the software and produce a very transformative work that is a summary many orders of magnitude smaller than the copyrighted work. This is fair use, and it should still be. Changing that is gonna harm the public, small companies and independent researchers way more than big tech companies.
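
To make the "gathering statistics" point concrete, here is a minimal sketch of the corpus-sampling analogy: a toy bigram counter that records which word follows which, and then samples a next word in proportion to those counts. This is only an illustration of N-gram frequency counting, not how an actual LLM is trained (those learn neural network weights, not a lookup table). The corpus string and the names `bigram_counts` and `sample_next` are made up for the example.

```python
from collections import Counter, defaultdict
import random


def bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def sample_next(counts, word, rng=random):
    """Sample a follower of `word` in proportion to the observed counts."""
    followers = counts.get(word)
    if not followers:
        return None  # the word was never seen with a follower
    candidates = list(followers)
    weights = [followers[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical toy corpus, made up for this example.
corpus = "the cat sat on the mat and the cat ate the fish"
counts = bigram_counts(corpus)

print(counts["the"].most_common())  # followers of "the", most frequent first
print(sample_next(counts, "the"))   # one randomly sampled follower of "the"
```

What gets kept is the table of counts, not the text itself. A follower that shows up many times (here "cat" after "the") is the one most likely to be sampled, which is the same reason only passages repeated very often tend to come back out verbatim.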

As I said in another comment, I would very much welcome a way to force big corpos to release their models. Make a model bigger than N parameters? You needed too much fair use in one gulp: your model has to be public, and in the public domain. I would fucking welcome that! But going in the opposite direction is just risky.

I don't understand why small individuals think that copyright is their friend, and will protect them from big tech companies. Copyright will always harm the weak and protect the powerful as a net result. It's already a miracle that we can enjoy free software and culture by licenses that leverage copyright in our favor.

[–] suy@programming.dev 8 points 8 months ago

"Theft" is never a technically accurate word when dealing with so-called "intellectual property", because copying digital content without authorization is legal in tons of cases, and because, come on, property is very explicitly exclusive. I cannot copy my house or my car, but I can make copies of my works for virtually 0 cost.

Using data for training ML models is even explicitly allowed in some jurisdictions (e.g. Japan), and is likely to be fair use everywhere else. LLMs are very transformative, and while they often can produce verbatim copies of fragments of copyrighted works, they don't store the whole works or significant pieces of them.

Don't get me wrong, I don't like big companies making big money. I would not mind a law that would force models to be open sourced. But restricting their ability to train their models on public data by restricting fair use would harm them very little (they could pay something if they are making some profit), while small researchers or companies would never be able to compete, because they could not afford the upfront costs, nor do they have the financial engineering to disguise profits and pay less.

[–] suy@programming.dev 4 points 11 months ago

Are you aware of the work of Douglas Crockford?

[–] suy@programming.dev 2 points 11 months ago (1 children)

FWIW, cppfront would be the same, IMHO. It allows C++ syntax, and it just passes it through verbatim. It only transforms "syntax 2" into today's C++. And Herb Sutter very much says that what it does is based on the papers that he's presented for standardization, and that he'd like this approach (new syntax) to land in today's C++ compilers and the standard.

cppfront is the only one that I thought had a chance till recently. The presentations from Sean Baxter seem to finally make the community see it in a positive light (I've seen posts on Reddit being removed on the premise of not being C++, which I think is a bit unfair), so that's good.

[–] suy@programming.dev 2 points 1 year ago

I have to admit that I never understood the need for bashrc and bash_profile. I hated that with a passion when I started to set up my bash configuration. I never saw the need to have so many files and so much complication to have a consistent shell whenever I logged in on the console or spawned a konsole in KDE.

The paths shown on that diagram are 7 for bash, and 4 for zsh, so it's surely an improvement. However, now that I have set it all up in a git repository, I don't see it as a big deal. I have a profile that sources bashrc, and then I do it all in bashrc. I've checked /etc/skel and it seems the distro does roughly the same (and I've never switched away from Debian or Debian-based distros in 20 years). I'm not sure if it's such a big deal. But I'm still curious about trying zsh some day. :)

Thanks for the blog post. I'll check it out.

[–] suy@programming.dev 1 points 1 year ago (5 children)

Also preemptively deciding that me disagreeing with you automatically makes you right because you predicted your explanation wouldn’t satisfy me is just A-tier bullshit.

I predicted that I would waste my time by replying to you, and I predicted right.

I wanted to give it a chance, though, because Lemmy is a place that is friendly enough and that I want to thrive, despite how little I contribute. I tried to be constructive and explain things the best I could, and assume the best possible faith, etc. When you just say that I sound like an asshole, and act in complete bad faith about what Russian roulette is supposed to mean in the context of someone who says "you can beat me at any game", now I feel the urge to try the block feature in Lemmy, sorry.

[–] suy@programming.dev 6 points 1 year ago (13 children)

Two people go on a date. The date is going well, there is chemistry between the two people. One says "if you beat me at any game we can have sex". The two people will typically play a board or card game, and will flirt with the opportunity of sex during the game play, which is gonna be fun and exciting. Seems a good plot idea for your average romantic comedy movie or teenager's series.

Now the joke is that the choice of game is stupid because you end up killing your date. Just with that you could make a meme/joke. Now the post is doubling down on the stupidity, insanity, etc., by making it morbid and showing that the guy still had sex with the corpse.

Here it is. My take on the issue, which is unlikely to be the only possible explanation that is not "incel shit". I've wasted 10 minutes of my time, and you'll likely still not agree with me, which will prove my first comment valid.

Cheers.

[–] suy@programming.dev 13 points 1 year ago (18 children)

Has it occurred to you that pressing the downvote button is just much easier than having to bother explaining something that should be obvious?

If it is not obvious to you that it's not incel shit, maybe even after an explanation you still won't agree because you have different views (which I'm not saying are not respectable, but are still different, so an agreement can't be reached), so whoever replies to you would have wasted their time.

So of course people downvote without replying.

[–] suy@programming.dev 18 points 1 year ago (2 children)

Yes. There is already an answer with many votes saying so, but I'll add myself to the list.

I don't have to like all of the language, and not even all of the standard library. I learnt C++ with the Qt library, and I still do 99% of my development using Qt because it's the kind of software that I like to write the most. I can choose the parts that I like the most about the full C++ ecosystem, like most people do (you would have to see how different game development is, for example).

I'm also learning Rust, and I see nothing wrong with it. It's just that I see C++ as better for the kind of stuff that I need to write (at this time at least).

[–] suy@programming.dev 6 points 1 year ago

Correct. Backwards compatibility is both its biggest asset and its biggest problem.

In syntax alone, you can check what Herb Sutter is doing with cppfront. Specifically, the wiki page on the postfix operators is quite enlightening. It shows some interesting examples of how, by making everything a postfix operator, you drop the need for `->` and the duality of pre/post increment and decrement operators.

[–] suy@programming.dev 6 points 1 year ago (5 children)

Excuse me, what?

[–] suy@programming.dev 3 points 1 year ago (1 children)

Klipper was an entirely different program, process, etc. that was using the system tray. Nowadays it seems to be a plasmoid in the system tray. How can that be less in line with the UNIX philosophy than the Windows alternative? Because it's developed by the same community that makes the shell? That doesn't make sense to me.
