LocalLLaMA

3817 readers
25 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

Rules:

Rule 1 - No harassment or personal character attacks of community members, i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming that the resource usage required to train a model is anything close to maintaining a blockchain/mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms as <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

founded 2 years ago
submitted 23 hours ago* (last edited 23 hours ago) by Xylight@lemdro.id to c/localllama@sh.itjust.works

Benchmarks look pretty good, even better than some of the text-only models, but make sure to take them with a grain of salt.

Benchmarks

Qwen3 VL 30b a3b (No Thinking)

Visual benchmarks for Qwen3 VL 235 A22B (Thinking)


Hello everyone,

I'm trying to set up a local "vibe coding" environment and use it on some projects I abandoned years ago. So far I'm using KoboldCPP to run the models and VS Code as the editor, with various extensions and models, with limited to no luck.

The models work properly (tested with the KoboldCPP web UI and curl), but in VS Code they do not generate files, edit existing files, etc. (it seems the tool-calling part is not working), and they usually get stuck in a loop.

I tried the following extensions:

  • RooCode
  • Cline

Models: (mostly from Unsloth)

  • Deepseek-coder:6.7
  • Phi4
  • Qwen3-Coder 30
  • Granite4 Small

Does anyone have any idea why it's not working or have a working setup that can share? (I'm open to change any of the tools/models)

For reference, I have 16GB of RAM and 8GB of VRAM.
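One way to narrow down where the tool calling breaks is to hit the server directly with an OpenAI-style request that advertises a tool and check whether the reply contains structured `tool_calls`. A minimal sketch, assuming KoboldCPP's OpenAI-compatible endpoint on its default port 5001 (adjust for your setup); the `write_file` tool is purely illustrative:

```python
import json

# Assumed endpoint: KoboldCPP's OpenAI-compatible API on its default port.
API_URL = "http://localhost:5001/v1/chat/completions"

def build_tool_request(prompt: str) -> dict:
    """Build an OpenAI-style chat request that advertises one tool."""
    return {
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "write_file",  # illustrative tool name
                "description": "Write content to a file",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "content": {"type": "string"},
                    },
                    "required": ["path", "content"],
                },
            },
        }],
    }

def model_emitted_tool_call(response: dict) -> bool:
    """True if the reply contains a structured tool call rather than plain text."""
    message = response["choices"][0]["message"]
    return bool(message.get("tool_calls"))

# To run the check, POST json.dumps(build_tool_request(...)) to API_URL
# (with curl or urllib) and pass the parsed JSON reply to
# model_emitted_tool_call.
```

If the model answers in plain text instead of emitting `tool_calls`, the looping you see is expected: Cline/RooCode keep re-prompting because they never get a structured call back. In that case a different model/chat-template combination is usually the fix, not the extension settings.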

Many thanks in advance for your help!


Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to agree with the flow on most things, but here are my takes that go against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks
  • Deepseek is still the open-weight SOTA. I've really tried Kimi, GLM, and Qwen3's larger variants, but asking Deepseek still feels like asking the adult in the room. Caveat: GLM codes better
  • (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.

The Apple M5 Pro chip is rumored to be announced later this week with improved GPU cores and faster inference. With consumer hardware getting better and better, and AI labs squeezing models down to fit in tiny amounts of VRAM, it's becoming increasingly feasible to have an assistant that has absorbed the entirety of the internet's knowledge built right into your PC or laptop, all running privately and securely offline. The future is exciting, everyone; we are closing the gap.


It is small but still really good


I found this project which is just a couple of small python scripts glueing various tools together: https://github.com/vndee/local-talking-llm

It's pretty basic, but I couldn't find anything more polished. I did a little "vibe coding" to use a faster Chatterbox fork, stream the output back so I don't have to wait for the entire LLM response before it starts "talking," start recording on voice detection instead of the Enter key, and allow interrupting the agent. But, like most vibe-coded stuff, it's buggy. I was curious whether something better already exists before I commit to actually fixing the problems and pushing a fork.
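For the streaming part, the core trick can be sketched as a small generator that buffers streamed tokens and yields complete sentences as soon as they close, so the TTS engine can start speaking before generation finishes. A minimal sketch (the sentence-boundary regex is deliberately naive):

```python
import re
from typing import Iterable, Iterator

# Yield complete sentences from a stream of LLM tokens as soon as a
# sentence-ending punctuation mark followed by whitespace appears.
_SENTENCE_END = re.compile(r"(.+?[.!?])\s", re.S)

def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every finished sentence currently sitting in the buffer.
        while (m := _SENTENCE_END.match(buffer)):
            yield m.group(1).strip()
            buffer = buffer[m.end():]
    if buffer.strip():  # flush whatever remains when the stream ends
        yield buffer.strip()
```

Each yielded sentence can be handed to the TTS engine immediately, and interruption then just means dropping the generator mid-iteration.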


Hey All 👋

I have been running open webui and ollama on one of my servers for the last year or so. Recently I added a new server, spun up ollama on the more performant machine and started using GPT-OSS:20B

I built a few tools including a tool that scrapes the Wikipedia library on my Kiwix server. I configured it to scrape images and text that can be summarized in the chat as shown in the image I included.

I’m very impressed with this model's ability to call tools effectively, as well as its context window. I had to iterate on the Kiwix tool a few times when the reply overflowed the window and led to a lot of hallucinations, but I'm really happy with how well it works now.

I also ran into a really interesting use case where I was showing a user how to use tools and he asked the model “what treatments are available for dogs with yeast infections” the model thought for a second and returned nothing. The thought said, “user is requesting medical advice. Do not answer the request.” I asked the user to send a message “use the Kiwix tool to look up yeast infections. Summarize any treatments suitable for dogs” and it used the tool and provided a wall of text.

So far I have tools to query my Plex library, an Azure POI search, summarizing Nextcloud Forms API data, weather, and the Kiwix search. I really feel like my local model has almost as much value as the big boys thanks to these tools.
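A Kiwix tool along those lines mostly boils down to building the right full-text search URL for kiwix-serve. A minimal sketch, assuming a recent kiwix-serve that exposes a `/search` endpoint taking `books.name` and `pattern` parameters (check your instance's version); the host and book name are placeholders:

```python
from urllib.parse import urlencode

# Placeholder host for a local kiwix-serve instance.
KIWIX_HOST = "http://kiwix.local:8080"

def kiwix_search_url(query: str, book: str = "wikipedia",
                     page_length: int = 5) -> str:
    """Build a full-text search URL for a kiwix-serve instance.

    The query-parameter names follow recent kiwix-serve releases; older
    versions may differ.
    """
    params = urlencode({
        "books.name": book,      # which ZIM file to search
        "pattern": query,        # the search terms
        "pageLength": page_length,
    })
    return f"{KIWIX_HOST}/search?{params}"
```

An Open WebUI tool would then fetch that URL, pull the result links out of the HTML, and return a trimmed summary — keeping the reply short is exactly what avoids the context-overflow hallucinations mentioned above.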

Anyone else messing around with tools?


Mine attempts to lie whenever it doesn't know something. I'll call it out and say that's a lie, and it will say "you are absolutely correct." tf.

I was reading about sleeper agents placed inside local LLMs, and this is increasing the chance I'll delete it forever. Which is a shame, because it has become the new search engine, seeing how they ruined the real ones.


Qwen image edit seems to be the best one available right now, but I can't run it. It would take too long on my system, especially at 40 steps.

For general image generation, Flux schnell works great and can generate on my system in about 60 seconds. I was hoping to see Flux kontext-schnell, but it doesn't exist, only kontext-dev.


Hi all, I have never touched any tools for local inference and barely know anything about the landscape. Additionally, the only hardware I have available is an 8C/16T Zen 3 CPU and 48GB of RAM. I have many years of experience running Linux as a daily driver and doing small-network sysadmin work.

I am well aware this is extreme challenge mode, but it's what I have to work with for now, and my main goal is more to do with learning the ecosystem than with getting highly usable results.

I decided for various reasons that my first project would be to get a model which I can feed an image, and have it output a caption.

If I have to quantize a model to make it fit into my available RAM then I am willing to learn that too.

I am looking for basic pointers of where to get started, such as "read this guide," "watch this video," "look into this software package."

I am not looking for solutions which involve using an API where inference happens on a machine which is not my own.
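As a rough sizing aid for the quantization question: weight memory is approximately parameters × bits-per-weight ÷ 8, plus runtime overhead for the KV cache and buffers. A back-of-envelope sketch (the 20% overhead factor is a guess and grows with context length):

```python
def estimated_ram_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough RAM estimate for running a quantized model.

    weights = params * bits / 8 bytes, scaled by an assumed ~20% overhead
    for KV cache and runtime buffers. Real usage varies with context size.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# e.g. a 13B model at 4-bit needs very roughly estimated_ram_gb(13, 4)
# ≈ 7.8 GB, which fits comfortably in 48GB of system RAM.
```

By this estimate, 48GB of RAM comfortably fits 4-bit quantizations of most models up to the ~30B class, so CPU-only captioning with a small vision-language model is feasible, just slow.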


It uses shadcn-svelte for nice looking components, and it should be faster than the old React-based UI.


For coding AI, it could make sense to specialize models on architecture, on functional/array style versus loopy solutions, or just to ask 4 separate small models and then use a judge model to pick the best parts of each.
