Haha, thanks for the good chuckle. I read this before proceeding to the article.
You can tell it's not our intrepid card player's first rodeo, though! The dress-melting technique was adjusted to leave the perfect slot for a pendant. That had to be a tweak from a later refinement.
There's a lot going on with the shapes in that image. It seems like a good example of how the algorithms parse a scene.
As a start, you could take a look at Ollama, which seems to be available through many package managers if you use one. I've done some experimenting with mistral-nemo, but you should pick a model size appropriate to your hardware and use case. I believe there are GUIs and extensions for Ollama, but as someone with little interest in LLMs, I've only used the bare-bones features through my terminal, and I haven't used it for any projects or tasks.
You definitely shouldn't trust it to teach you anything (I've seen some highly concerning errors in my tests), but it might be useful to you if you can verify the outputs.
Also check out the PrivacyGuides page on LLMs.
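If you ever want to script it rather than type into the terminal, a minimal sketch might look like the one below. It assumes Ollama is already running locally on its default port (11434) and that you've pulled the model first (e.g. `ollama pull mistral-nemo`). The helper names `build_request` and `generate` are made up for illustration, and I haven't built anything real on top of this, so treat it as a starting point only.

```python
import json
import urllib.request

# Assumption: Ollama is serving its local HTTP API on the default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's generate endpoint."""
    payload = {
        "model": model,    # e.g. "mistral-nemo"; pick one that fits your hardware
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt, url), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


# Example call (needs a running Ollama server, so it is left commented out):
# print(generate("mistral-nemo", "Summarize what a context window is in one sentence."))
```

Everything stays on your own machine this way, but the earlier warning still applies: check whatever comes back before relying on it.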