this post was submitted on 24 Jun 2025
10 points (72.7% liked)

Ollama - Local LLMs for everyone!

A place to discuss Ollama: from basic use, extensions and addons, and integrations, to using it in custom code to create agents.

Do you use it to help with schoolwork / work? Maybe to help you code projects, or to help teach you how to do something?

What are your preferred models and why?

all 24 comments
[–] LemmiChanga@programming.dev 8 points 1 week ago* (last edited 5 days ago) (2 children)

As a voice assistant server for my Home Assistant setup.

Edit: I learned it by watching a NetworkChuck video on YouTube. Basically a Whisper integration for speech-to-text, a Piper integration for text-to-speech, and an openWakeWord integration for the wake word.

[–] schlump@feddit.org 3 points 1 week ago

That sounds interesting! Can you describe what software you used for that? And how powerful does the hardware have to be?

[–] wise_pancake@lemmy.ca 1 points 1 week ago

Can you describe your setup?

I would love an offline voice assist while coding, but I’m paranoid about sharing my voice with AI providers given how easy voices are to clone these days.

[–] chaospatterns@lemmy.world 4 points 1 week ago

I've been experimenting with it for different use cases:

  • Standard chat style interface with open-webui. I use it to ask things that people would normally ask ChatGPT. Researching things, vacation plans, etc. I take it all with a grain of salt and also still use search engines
  • Parts of different software projects I have, using ollama-python. For example, I tried using it to auto-summarize transaction data (there's a quick sketch of that after this list)
  • Home Assistant voice assistants for my own voice activated smart home
  • Trying out code completion using TabbyML
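
For the ollama-python transaction item, it's roughly this; a minimal sketch, assuming the `ollama` package is installed and a model has been pulled. The model tag and the transactions below are placeholders, not my actual setup:

```python
# Minimal sketch of the ollama-python transaction summarizer idea.
# Assumes `pip install ollama` and a locally pulled model; the model tag
# and the transactions below are placeholders.
import ollama

transactions = [
    "2025-06-01  -42.10   GROCERY STORE",
    "2025-06-03  -9.99    STREAMING SERVICE",
    "2025-06-05  -120.00  ELECTRIC UTILITY",
]

prompt = (
    "Summarize these bank transactions in two sentences and call out "
    "the largest expense:\n" + "\n".join(transactions)
)

response = ollama.chat(
    model="llama3.1",  # placeholder model tag
    messages=[{"role": "user", "content": prompt}],
)
print(response["message"]["content"])
```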

I only have a GeForce 1080 Ti in it, so some projects are a bit slow and I don't have the biggest models, but what really matters is the self-satisfaction I get by not using somebody else's model, or that's what I try to tell myself while I'm waiting for responses.

[–] calmluck9349 4 points 1 week ago

I employ this technique to embellish my email communications, thereby enhancing their perceived authenticity and relatability. Admittedly, I am not particularly adept at articulating my thoughts in comprehensive, well-structured sentences. I tend to favor a more primal, straightforward cognitive style—what one might colloquially refer to as a "meat-and-potatoes" or "caveman" approach to thinking. Ha.

[–] wise_pancake@lemmy.ca 2 points 1 week ago

I’m in the process of trying them out again

Phi4 has been okay for me, and I use DeepSeek R1 32B quantized for some coding tasks. Both are a lot for my aging M1 MacBook Pro to handle.

Lately I've been trying DeepSeek 8B for document summaries and it's pretty fast, but janky.

What I'm working towards is setting up an RSS app and feeding that into a local model (FreshRSS, I think, lets you subscribe to a combined feed) to build a newspaper of my news subscriptions, but that's not viable until I get a computer to run as a server.
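
If I ever get that server, the plumbing would look something like this; just a sketch, assuming feedparser plus the ollama package. The feed URL and model tag are made-up placeholders:

```python
# Rough sketch of the "RSS newspaper" idea: pull a combined feed and have a
# local model write a digest. Feed URL and model tag are placeholders.
import feedparser
import ollama

FEED_URL = "https://freshrss.example.com/i/?a=rss&combined=1"  # hypothetical URL

feed = feedparser.parse(FEED_URL)
articles = "\n\n".join(
    f"{entry.title}\n{entry.get('summary', '')}" for entry in feed.entries[:20]
)

digest = ollama.chat(
    model="llama3.1",  # placeholder model tag
    messages=[{
        "role": "user",
        "content": "Write a short morning-newspaper digest of these articles:\n\n"
                   + articles,
    }],
)
print(digest["message"]["content"])
```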

Communication in general. I've found I don't communicate well, or rather what I type is taken in ways I would never have considered, and AI really helps with that. If I'm sending a high-visibility message I'll usually run it through a local LLM first to make sure there aren't readings I didn't intend.

For example, one I didn't put through I was saying something like "Yes I'm available to meet up, let's meet now because I have a commitment in 20 minutes". It was taken as "He has other commitments that are apparently a higher priority" and I was called out for saying it. Using AI usually catches those nuances I don't normally get.
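
The check itself is nothing fancy; a sketch of what I mean, with a placeholder model tag and prompt wording that's just one way to ask:

```python
# Sketch of the pre-send nuance check: ask a local model how a draft
# message could be misread. Model tag is a placeholder.
import ollama

draft = (
    "Yes I'm available to meet up, let's meet now because I have a "
    "commitment in 20 minutes."
)

result = ollama.generate(
    model="llama3.1",  # placeholder model tag
    prompt=(
        "List any ways a reader might take this message differently than "
        "intended, then suggest a clearer rewording:\n\n" + draft
    ),
)
print(result["response"])
```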

[–] danhab99@programming.dev 1 points 1 week ago

Analyze Jira tickets

Analyze meeting transcripts from every video call I've ever been on

Basically as a second brain strictly used for retrieving knowledge. I will never use it for reasoning.

[–] vhstape@lemmy.sdf.org 1 points 1 week ago (2 children)

I haven't been able to find a model that is both performant and useful on my machines (RTX 3060 12GB and M4 Mac mini), but I am open to suggestions! I know I want to use local LLMs more, but I feel that their utility is limited on consumer hardware

[–] borari@lemmy.dbzer0.com 2 points 1 week ago (1 children)

The Mac mini should support a slew of models because of the unified memory, right? I'm using the Gemma3 12b model while locally developing my work project on a laptop with a 4090M. The laptop/4090M kind of sucks tbh; my employer definitely wasted their money, but it wasn't up to me.

How much ram on the mini? Gemma3 27b is like 17GB, so that should all fit in the unified memory. The 12b version is only like 8GB so I’d think that would work on your 3060.

You could probably also find some much more slimmed-down models on Hugging Face that focus on a specific thing you care about. You don't need a model trained on all of Shakespeare's works if you want your local LLM to explain code you're reviewing.

[–] vhstape@lemmy.sdf.org 2 points 1 week ago

My Mac mini (32GB) can run 12B parameter models at around 13 tokens/sec, and my 3060 can achieve roughly double. However, both machines have a hard time keeping up with larger models. I'll have to look into some special-purpose models

[–] Evilschnuff@feddit.org 1 points 1 week ago

Did you check out Gemma 3 variants? They were quite good in my opinion.

[–] borari@lemmy.dbzer0.com 1 points 1 week ago (1 children)

I’m currently using it to generate initial contact emails, and generate contextual responses to received replies, for a phishing project at work.

[–] brendansimms@lemmy.world 3 points 1 week ago (2 children)

in order to prevent phishing, right? (cue anakin/padme meme)

[–] borari@lemmy.dbzer0.com 2 points 3 days ago (1 children)

lol. Uhhhhhhh not so much lol.

I work on an internal red team, so covert in-prod operations instead of limited-scope, one-off pen tests. We actively phish employees, but victim users aren't named in the report, and we provide follow-up training with them that's not shame-based and comes from the operators directly, not some mandatory annual online-class training bullshit.

It sucks, but this is a huge vector of initial compromise for APTs, and I work in an industry and for a company that are both extremely frequently targeted by APTs, so we have to do what we do. It lets us identify gaps in security and signature known TTPs so our defensive teams can ID those alerts when they pop.

[–] brendansimms@lemmy.world 1 points 3 days ago (1 children)

I like the addition of 'not shame-based' haha

[–] borari@lemmy.dbzer0.com 2 points 3 days ago

Bro it’s a huge problem, companies will blame the employee that opened the malicious document instead of blaming their fucking abysmal internal security controls, detection, and response.

[–] catty@lemmy.world 2 points 1 week ago (1 children)
[–] borari@lemmy.dbzer0.com 1 points 3 days ago

It is at work, I promise lol.

[–] seathru@lemmy.sdf.org 1 points 1 week ago (1 children)

I currently don't, but I am ollama-curious. I would like to feed it a bunch of technical manuals and then be able to ask it to recite specs or procedures (with optional links to its source info for sanity checking). Is this where I need to be looking/learning?

[–] brendansimms@lemmy.world 2 points 1 week ago (2 children)

you might want to look into RAG and 'long-term memory' concepts. I've been playing around with creating a self-hosted LLM that has long-term memory (using pre-trained models), which is essentially the same thing you're describing. Also, GPU matters: I'm using an RTX 4070 and it's noticeably slower than something like in-browser ChatGPT, but I know the 4070 is kinda pricey, so many home users might have earlier/slower GPUs.
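
The retrieval half of RAG is less magic than it sounds; a bare-bones sketch using Ollama's embeddings endpoint, where the model tags and the sample manual snippets are placeholders (a real setup would chunk whole documents and use a proper vector store):

```python
# Bare-bones sketch of the RAG idea: embed manual snippets, retrieve the
# closest one for a question, and answer while citing its source.
# Model tags and the sample snippets are placeholders, not a tested setup.
import ollama

manual_chunks = [
    {"source": "pump_manual.pdf#p12", "text": "Torque the impeller bolts to 25 Nm."},
    {"source": "pump_manual.pdf#p30", "text": "Replace the seal kit every 2000 hours."},
]

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

question = "What torque do the impeller bolts need?"
q_vec = embed(question)
best = max(manual_chunks, key=lambda chunk: cosine(q_vec, embed(chunk["text"])))

answer = ollama.chat(
    model="llama3.1",  # placeholder model tag
    messages=[{
        "role": "user",
        "content": (
            f"Using only this excerpt, answer the question.\n"
            f"Excerpt ({best['source']}): {best['text']}\n"
            f"Question: {question}"
        ),
    }],
)
print(answer["message"]["content"], f"\n[source: {best['source']}]")
```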

[–] Styxia@lemmy.world 1 points 1 week ago (1 children)

How have you been making those models? I have a 4070 and doing it locally has been a dependency hellscape, I’ve been tempted to rent cloud GPU time just to save the hassle.

[–] brendansimms@lemmy.world 2 points 1 week ago

I'm downloading pre-trained models. I had a bunch of dependency issues getting text-generation-webui to work, and honestly I probably installed some useless crap in the process, but I did get it to work. LM Studio is much simpler but has less customization (or I just don't know how to use it all in LM Studio). But yeah, I'm just downloading pre-trained models and running them in these UIs (right now I just loaded up 'deepseek-r1-distill-qwen-7b' in LM Studio). I also have the NVIDIA app installed and I make sure my GPU drivers are always up to date.