kogasa

joined 2 years ago
[–] kogasa@programming.dev 2 points 2 weeks ago

That example recording is awesome

[–] kogasa@programming.dev 18 points 2 weeks ago

Yippee I missed these

[–] kogasa@programming.dev 1 point 3 weeks ago (2 children)

I know, I'm just saying it's not theoretically impossible to have a phone number as a token. It's just probably not what happened here.

"the choice of the next token is really random"

It's not random in the sense of a uniform distribution, which is what "generate a random [phone] number" implies.

[–] kogasa@programming.dev 2 points 3 weeks ago (4 children)

A full phone number could be in the tokenizer vocabulary, but any given one probably isn't in there

[–] kogasa@programming.dev 9 points 3 weeks ago* (last edited 3 weeks ago) (6 children)

I mean the latter statement is not true at all. I'm not sure why you think this. A basic GPT model reads a sequence of tokens and predicts the next one. Any sequence of tokens is possible, and each digit 0-9 is likely its own token, as is the case in the GPT2 tokenizer.

An LLM can't generate random numbers in the sense of a proper PRNG simulating draws from a uniform distribution; the output will probably have some kind of statistical bias. But it doesn't have to produce sequences contained in the training data.
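A minimal stdlib sketch of that distinction, with hand-picked weights standing in for a model's next-token probabilities over the ten digit tokens (the weights are invented for illustration; no real model is involved):

```python
import random
from collections import Counter

# Hypothetical next-token distribution over the digit tokens.
# A real model's probabilities would be similarly non-uniform,
# just learned rather than hand-picked.
digits = list("0123456789")
weights = [5, 25, 10, 8, 7, 9, 6, 12, 8, 10]

random.seed(0)
samples = random.choices(digits, weights=weights, k=10_000)
counts = Counter(samples)

# Sampling is "random", but the frequencies track the weights,
# not the flat 10% per digit a uniform generator would give.
print(counts.most_common(3))
```

The digits still come out in an unpredictable order, but "1" shows up roughly five times as often as "0", which is exactly the kind of statistical bias a uniform random number generator wouldn't have.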

[–] kogasa@programming.dev 6 points 3 weeks ago

It's a number, and complexity refers to functions. The natural inclusion of numbers into functions maps pi to the constant function x -> pi, which is O(1).

If you want the time complexity of an algorithm that produces the nth digit of pi, the best known ones are something like O(n log n), with O(1) being impossible.
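For concreteness, here's a naive fixed-point sketch using Machin's formula. It computes pi to n decimal places in roughly O(n^2) bit operations, nowhere near the asymptotically fast methods mentioned above, but it makes the "algorithm with a complexity" framing concrete:

```python
def pi_digits(n):
    """Pi truncated to n decimal places, as a digit string "314159...",
    via Machin's formula pi = 16*atan(1/5) - 4*atan(1/239)
    evaluated in fixed-point integer arithmetic."""
    def arctan_inv(x, one):
        # atan(1/x) = sum_k (-1)^k / ((2k+1) * x^(2k+1)), scaled by `one`
        total = term = one // x
        x2 = x * x
        k = 1
        while term:
            term //= x2
            total += (-term if k % 2 else term) // (2 * k + 1)
            k += 1
        return total

    guard = 10  # extra digits to absorb truncation error
    one = 10 ** (n + guard)
    pi = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)
    return str(pi // 10 ** guard)
```

Each series term costs a division on an (n + guard)-digit integer, and the number of terms grows linearly in n, which is where the quadratic cost comes from; the fast algorithms (Chudnovsky with binary splitting, or BBP for an isolated hex digit) avoid exactly that.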

[–] kogasa@programming.dev 2 points 3 weeks ago (1 children)

The direct connection is cool, I just wonder if a P2P connection is actually any better than going through a data center. There are gonna be intermediate servers, right?

Do you need to have Tailscale set up on any network you want to use this on? Because I'm a fan of being able to just throw my domain or IP into any TV and log in.

[–] kogasa@programming.dev 2 points 3 weeks ago (3 children)

I just use nginx on a tiny Hetzner VPS acting as a reverse proxy for my home server. I dunno what the point of Tailscale is here; maybe better latency and fewer network hops in some cases where a P2P connection is possible? But I've never had any bandwidth or latency issues doing this.
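A minimal sketch of that kind of setup, assuming the home server is reachable from the VPS (e.g. over a VPN tunnel or a routable address); the domain, upstream address, and certificate paths are all placeholders:

```nginx
# Reverse proxy on the VPS: terminates TLS, forwards to the home server
server {
    listen 443 ssl;
    server_name home.example.com;  # placeholder domain

    ssl_certificate     /etc/letsencrypt/live/home.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/home.example.com/privkey.pem;

    location / {
        proxy_pass http://10.0.0.2:8080;  # home server, e.g. via a tunnel
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

With this shape, any client that can reach the domain works, no client-side software needed, at the cost of all traffic taking the extra hop through the VPS.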

[–] kogasa@programming.dev 13 points 3 weeks ago (2 children)

It gets around port forwarding/firewall issues that most people don't know how to deal with. But putting it behind a paywall kinda kills any chance of it being a benevolent feature.

[–] kogasa@programming.dev 14 points 3 weeks ago (1 children)

Possible reasons include:

  • fun

  • inflicting needless suffering on fish [applies if you hate fish]

[–] kogasa@programming.dev 7 points 3 weeks ago

It's got a very high barrier to entry. You kinda have to suffer through it for a while before you get it. And then you unlock a totally different kind of suffering.

[–] kogasa@programming.dev 28 points 3 weeks ago (1 children)

The last time I had fun with LLMs was back when GPT2 was cutting-edge: I fine-tuned GPT2-Medium on Twitch chat logs, and it alternates between emote spam, complete incoherence, blatantly unhinged comments, and suspiciously normal ones. The bot is still in use as a toy, specifically because it's deranged and unpredictable. It's like a kaleidoscope for the slice of internet subculture it was trained on, much more fun than a plain flawless mirror.
