pepperfree

joined 2 years ago
[–] pepperfree@sh.itjust.works 1 points 1 month ago (1 children)

Ah. Sorry, good thing I attached the related link.

[–] pepperfree@sh.itjust.works 2 points 1 month ago

Everybody has been speculating about R2, so releasing this instead is kind of unexpected.

 

Not what we expected...

[–] pepperfree@sh.itjust.works 1 points 2 months ago

So something like

Previously, the text talked about [last summary]
[The instruction prompt]...
[Current chunk/paragraphs]
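
A minimal sketch of how that rolling-summary prompt could be assembled; the function name and exact layout are my own, following the three-part structure in the comment above:

```python
def build_prompt(last_summary: str, instruction: str, chunk: str) -> str:
    """Assemble a rolling-summary prompt: prior summary first,
    then the instruction, then the current chunk to process."""
    parts = []
    if last_summary:
        parts.append(f"Previously, the text talked about: {last_summary}")
    parts.append(instruction)
    parts.append(chunk)
    return "\n\n".join(parts)

# Each new chunk's summary becomes last_summary for the next call,
# so context carries forward without re-feeding the whole document.
```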
[–] pepperfree@sh.itjust.works 1 points 2 months ago (1 children)

The RL is so good that Grok changed its personality when a small part of its system prompt was changed.

[–] pepperfree@sh.itjust.works 0 points 2 months ago

Llama 3.3 was good, though. For multimodality, Llama 4 also uses the Llama 3.2 approach, where image and text are handled by a single model instead of using CLIP or SigLIP.

[–] pepperfree@sh.itjust.works 3 points 2 months ago (2 children)

They've got the whole Twitter database. It's kind of the same with Gemini. But somehow Meta isn't catching up; maybe their Llama 4 architecture isn't that stable to train.

[–] pepperfree@sh.itjust.works 1 points 2 months ago

There is a new project where they share ModernBERT models fine-tuned on various tasks. Here is the org: https://huggingface.co/adaptive-classifier

[–] pepperfree@sh.itjust.works 1 points 2 months ago (4 children)

It changed after Grok 3

[–] pepperfree@sh.itjust.works 3 points 2 months ago

Lots of developers chose to write in CUDA, as ROCm support back then was a mess.

[–] pepperfree@sh.itjust.works 4 points 2 months ago* (last edited 2 months ago) (1 children)

No, you can run SD- and Flux-based models inside koboldcpp. You can try it out using the original koboldcpp in Google Colab. It loads GGUF models. Related discussion on Reddit: https://www.reddit.com/r/StableDiffusion/comments/1gsdygl/koboldcpp_now_supports_generating_images_locally/

Edit: Sorry, I kind of missed the point, maybe I was sleepy when writing that comment. Yeah, I agree that LLMs need a lot of memory to run, which is one of their downsides. I remember someone doing a comparison showing that an API with token-based pricing is cheaper than running locally. But running image generation locally is cheaper than an API with step+megapixel pricing.
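
To make that comparison concrete, here is a small sketch of the two pricing shapes being contrasted; all prices and parameters below are hypothetical placeholders, not real vendor rates:

```python
# Hypothetical pricing models; none of these numbers are real rates.

def llm_api_cost(tokens: int, price_per_1k_tokens: float) -> float:
    """Token-based pricing: pay per 1,000 tokens processed."""
    return tokens / 1000 * price_per_1k_tokens

def image_api_cost(steps: int, megapixels: float,
                   price_per_step_mp: float) -> float:
    """Step+megapixel pricing: each denoising step is billed per megapixel."""
    return steps * megapixels * price_per_step_mp

def local_electricity_cost(seconds: float, watts: float,
                           price_per_kwh: float) -> float:
    """Electricity-only cost of a local run (ignores hardware cost)."""
    return seconds / 3600 * watts / 1000 * price_per_kwh
```

The comment's point falls out of the shapes: image APIs multiply steps by resolution, so a 20-step, 1-megapixel generation racks up charges quickly, while a local run only pays for a minute or so of GPU power.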

[–] pepperfree@sh.itjust.works 1 points 2 months ago

Skywork downfall

 

Built on Qwen, these models incorporate our latest advances in post-training techniques. MindLink demonstrates strong performance across various common benchmarks and is widely applicable in diverse AI scenarios.

72B 32B
