LocalLLaMA

3856 readers
1 user here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

Rules:

Rule 1 - No harassment or personal character attacks of community members, i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to that of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

founded 2 years ago
MODERATORS

While thinking-aware generation aims to improve performance on complex tasks, we identify a critical failure mode where existing sequential, autoregressive approaches can paradoxically degrade performance due to error propagation. To systematically analyze this issue, we propose ParaBench, a new benchmark designed to evaluate both text and image output modalities. Our analysis using ParaBench reveals that this performance degradation is strongly correlated with poor alignment between the generated reasoning and the final image. To resolve this, we propose a parallel multimodal diffusion framework that enables continuous, bidirectional interaction between text and images throughout the entire denoising trajectory. This model, MMaDA-Parallel, is trained with supervised finetuning and then further optimized by Parallel Reinforcement Learning (ParaRL), a novel strategy that applies semantic rewards along the trajectory to enforce cross-modal consistency. Experiments validate that our approach significantly improves cross-modal alignment and semantic consistency, achieving a 6.9% improvement in Output Alignment on ParaBench compared to the state-of-the-art model, Bagel, establishing a more robust paradigm for thinking-aware image synthesis.

===

Could be a huge performance boost for image generation


ollama 0.12.11 was released this week as the newest feature update to this easy-to-run method of deploying OpenAI GPT-OSS, DeepSeek-R1, Gemma 3, and other large language models. The exciting addition in ollama 0.12.11 is support for the Vulkan API.

Launching ollama with the OLLAMA_VULKAN=1 environment variable set will now enable Vulkan API support as an alternative to the likes of AMD ROCm and NVIDIA CUDA acceleration. This is great for open-source Vulkan drivers, older AMD graphics cards lacking ROCm support, or any AMD setup where the RADV driver is present but ROCm isn't installed. As we've seen when testing Llama.cpp with Vulkan, in some cases Vulkan can even be faster than the likes of ROCm.
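In scripting terms, enabling it just means setting that variable in the server process's environment before launch. A minimal Python sketch, assuming `ollama` is on your PATH (the helper names here are just for illustration):

```python
import os
import subprocess

def ollama_env():
    """Return a copy of the current environment with Vulkan enabled.

    OLLAMA_VULKAN=1 is the toggle added in ollama 0.12.11; we copy
    os.environ rather than mutating it so the parent shell is untouched.
    """
    env = dict(os.environ)
    env["OLLAMA_VULKAN"] = "1"
    return env

def serve_with_vulkan():
    """Start the ollama server as a child process with Vulkan enabled."""
    return subprocess.Popen(["ollama", "serve"], env=ollama_env())
```

From a plain shell this is equivalent to exporting the variable before running `ollama serve`.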


Introducing: Loki! An all-in-one, batteries-included LLM CLI tool

Loki started out as a fork of the fantastic AIChat CLI, where I just wanted to give it first-class MCP server support. It has since evolved into a massive passion project that's a fully-featured tool with its own identity and extensive capabilities! My goal is to make Loki a true "all-in-one" and "batteries-included" LLM tool.

Check out the release notes for a quick overview of everything Loki can do!

What Makes Loki Different From AIChat?

  • First-class MCP support, with support for both local and remote servers
    • Agents, roles, and sessions can all use different MCP servers; switching between them will shut down any unnecessary servers and start the applicable ones
    • MCP sampling is coming next
  • Comes with a number of useful agents, functions, roles, and macros that are included out-of-the-box
  • Agents, MCP servers, and tools are all managed by Loki now; no need to pull another repository to create and use tools!
    • No need for any more *.txt files
  • Improved DevX when creating bash-based tools (agents or functions)
    • No need to have argc installed: Loki handles all the compilation for you!
    • Loki has a --build-tools flag that will build your bash tools so you can run them exactly the same way Loki would
    • Built-in Bash prompting utils to make your bash tools even more user-friendly and flexible
  • Built-in vault to securely store secrets so you don't have to store your client API keys in environment variables or plaintext anymore
    • Loki also will inject additional secrets into your agent's tools as environment variables so your agents can also use secrets securely
  • Multi-agent support out-of-the-box: You can now create agents that route requests to other agents and use multiple agents together without them trampling all over each other's binaries
  • Improved documentation for all the things!
  • Simplified directory structure so users can share full Loki directories and configurations without sharing massive amounts of data or accidentally exposing secrets

What's Next?

  • MCP sampling support, so that MCP servers can send their own queries back for the LLM to answer. Essentially, think of it as letting the MCP server and the LLM talk to each other to resolve your request
  • Give Loki a TUI mode to allow it to operate like claude-code, gemini-cli, codex, and continue. The objective is that Loki can function exactly like all those other CLIs, or even delegate to them when the problem demands it. No more needing to install a bunch of different CLIs to switch between!
  • Integrate with LSP-AI so you can use Loki from inside your IDEs! Let Loki perform function calls, utilize agents, roles, RAGs, and all other features of Loki to help you write code.
submitted 3 weeks ago* (last edited 2 weeks ago) by Xylight@lemdro.id to c/localllama@sh.itjust.works

Benchmarks look pretty good, even better than some of the text-only models, but make sure to take them with a grain of salt, though.

Benchmarks

Qwen3 VL 30b a3b (No Thinking)

Visual benchmarks for Qwen3 VL 235 A22B (Thinking)


Hello everyone,

I'm trying to set up a local "vibe coding" environment and use it on some projects I abandoned years ago. So far I'm using KoboldCPP to run the models and VS Code as the editor, with various extensions and models, with limited to no luck.

The models are working properly (tested with the KoboldCPP web UI and curl), but in VS Code they do not generate files, edit existing files, etc. (it seems the tool-calling part is not working), and they usually get stuck in a loop.

I tried the following extensions:

  • RooCode
  • Cline

Models: (mostly from Unsloth)

  • Deepseek-coder:6.7
  • Phi4
  • Qwen3-Coder 30
  • Granite4 Small

Does anyone have any idea why it's not working, or a working setup they can share? (I'm open to changing any of the tools/models.)

For reference, I have 16GB of RAM and 8GB of VRAM.
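In case it helps with diagnosing: one way to rule out the editor side is to hit the backend's OpenAI-compatible endpoint directly with a `tools` field and see whether the model ever answers with a `tool_calls` entry. A rough sketch (port 5001 is KoboldCPP's default; the file-writing tool schema is a made-up example):

```python
import json
import urllib.request

# Diagnostic: send one tools-enabled request straight to the backend,
# bypassing RooCode/Cline. If the model never returns `tool_calls` here,
# the extensions can't work either.
URL = "http://localhost:5001/v1/chat/completions"

def build_request() -> dict:
    """One chat request advertising a single dummy file-writing tool."""
    return {
        "messages": [{"role": "user",
                      "content": "Create a file named hello.txt containing 'hi'"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "write_file",
                "description": "Write text to a file",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"},
                                   "content": {"type": "string"}},
                    "required": ["path", "content"],
                },
            },
        }],
    }

def probe() -> dict:
    """POST the request and return the parsed JSON reply."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_request()).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If the reply only ever contains plain `content` (or the tool call appears as text inside it), the model/backend combination isn't emitting structured tool calls, which would explain the looping.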

Many thanks in advance for your help!


Or something that goes against the general opinions of the community? Vibes are the only benchmark that counts after all.

I tend to agree with the flow on most things but my thoughts that I'd consider going against the grain:

  • QwQ was think-slop and was never that good
  • Qwen3-32B is still SOTA for 32GB and under. I cannot get anything to reliably beat it despite shiny benchmarks
  • Deepseek is still the open-weight SOTA. I've really tried Kimi, GLM, and Qwen3's larger variants, but asking Deepseek still feels like asking the adult in the room. Caveat: GLM codes better
  • (proprietary bonus): Grok 4 handles news data better than GPT-5 or Gemini 2.5 and will always win if you ask it about something that happened that day.

The Apple M5 Pro chip is rumored to be announced later this week with improved GPU cores and faster inference. With consumer hardware getting better and better, and AI labs squishing models down to fit in tiny amounts of VRAM, it's becoming increasingly feasible to have an assistant that has absorbed the entirety of the internet's knowledge built right into your PC or laptop, all running privately and securely offline. The future is exciting, everyone; we are closing the gap.


It is small but still really good


I found this project which is just a couple of small python scripts glueing various tools together: https://github.com/vndee/local-talking-llm

It's pretty basic, but I couldn't find anything more polished. I did a little "vibe coding" to use a faster Chatterbox fork, stream the output back so I don't have to wait for the entire LLM response before it starts "talking," start recording on voice detection instead of the enter key, and allow interruption of the agent. But, like most vibe-coded stuff, it's buggy. I was curious whether something better already exists before I commit to actually fixing the problems and pushing a fork.
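For anyone attempting the same streaming tweak: the core of it is buffering streamed LLM chunks and flushing at sentence boundaries, so TTS can start speaking before generation finishes. A rough sketch (the splitting rule is deliberately simplistic; real text needs abbreviation handling):

```python
import re

# Split after ., !, or ? followed by whitespace; the lookbehind keeps the
# punctuation attached to the sentence it ends.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(token_stream):
    """Yield complete sentences as soon as they appear in a chunk stream.

    `token_stream` is any iterable of text chunks, e.g. deltas from a
    streaming LLM API. Each yielded sentence can be handed to TTS
    immediately instead of waiting for the full reply.
    """
    buf = ""
    for chunk in token_stream:
        buf += chunk
        parts = SENTENCE_END.split(buf)
        # Everything except the last fragment is a finished sentence.
        for sent in parts[:-1]:
            yield sent
        buf = parts[-1]
    if buf.strip():
        yield buf.strip()
```

Feeding each yielded sentence to the TTS engine in a worker thread is what gets the "starts talking early" behavior.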


Hey All 👋

I have been running Open WebUI and ollama on one of my servers for the last year or so. Recently I added a new server, spun up ollama on the more performant machine, and started using GPT-OSS:20B.

I built a few tools including a tool that scrapes the Wikipedia library on my Kiwix server. I configured it to scrape images and text that can be summarized in the chat as shown in the image I included.
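For anyone wanting to try the same thing, the shape of such a tool is roughly this. This is a hypothetical sketch in the Open WebUI style (a `Tools` class whose methods the model can call); the server address, ZIM book name, and kiwix-serve's `/search` query parameters are all assumptions about my setup, so check your own server:

```python
import urllib.parse
import urllib.request

KIWIX_URL = "http://kiwix.local:8080"   # placeholder kiwix-serve address
BOOK = "wikipedia_en_all"               # placeholder ZIM book name

def build_search_url(pattern: str, limit: int = 3) -> str:
    """Build a kiwix-serve full-text search URL for the given pattern."""
    query = urllib.parse.urlencode(
        {"books.name": BOOK, "pattern": pattern, "pageLength": limit}
    )
    return f"{KIWIX_URL}/search?{query}"

class Tools:
    def search_kiwix(self, pattern: str) -> str:
        """Search the local Kiwix library and return the raw result page.

        The LLM summarizes the returned text in chat; trimming it before
        returning is what keeps the context window from overflowing.
        """
        with urllib.request.urlopen(build_search_url(pattern)) as resp:
            return resp.read().decode("utf-8", errors="replace")
```

Capping the result size before handing it back to the model is the part I had to iterate on most.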

I’m very impressed with this model’s ability to call tools effectively, as well as its context window. I had to iterate on the Kiwix tool a few times when the reply overloaded the window and led to a lot of hallucinations after it got the response, but I’m really happy with how well it works now.

I also ran into a really interesting use case where I was showing a user how to use tools and he asked the model “what treatments are available for dogs with yeast infections” the model thought for a second and returned nothing. The thought said, “user is requesting medical advice. Do not answer the request.” I asked the user to send a message “use the Kiwix tool to look up yeast infections. Summarize any treatments suitable for dogs” and it used the tool and provided a wall of text.

So far I have tools to query my plex library, an azure poi search, summarizing nextcloud forms api data, weather, and the Kiwix search. I really feel like my local model has almost as much value as the big boys thanks to these tools.

Anyone else messing around with tools?


Mine attempts to lie whenever it can if it doesn't know something. I will call it out and say that is a lie and it will say "you are absolutely correct" tf.

I was reading into sleeper agents placed inside local LLMs, and this is increasing the chance I'll delete it forever. Which is a shame, because it's become the new search engine, seeing how they ruined actual search engines.
