Hey there ThorrJo, welcome to our community.
I recommend kobold.cpp as your first inference engine since it's very easy to get running, especially on Linux. Since you have no GPU, you don't need to worry about CUDA or Vulkan for offloading.
https://github.com/LostRuins/koboldcpp/
Read the koboldcpp wiki section on vision models and mmproj projectors. For the image recognition model itself I recommend NVIDIA's Cosmos-Reason1 finetune of Qwen2.5-VL. Make sure to load the Qwen2.5-VL mmproj file that kobold links alongside the main model (there's a launch sketch after the links below).
https://github.com/LostRuins/koboldcpp/wiki#what-is-llava-and-mmproj
https://huggingface.co/koboldcpp/mmproj/tree/main
https://huggingface.co/mradermacher/Cosmos-Reason1-7B-i1-GGUF
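If you'd rather launch from a script than the GUI, something like this should work. It's a rough sketch: the GGUF filenames are placeholders for whatever you actually download, and it's worth double-checking the flag names against `--help` for your version.

```python
# Sketch: start koboldcpp with the model + vision projector loaded.
# Assumes you're running the koboldcpp.py script; the standalone Linux
# binary takes the same flags. Filenames below are guesses, not exact.
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "Cosmos-Reason1-7B.i1-Q6_K.gguf",           # whichever quant you grabbed
    "--mmproj", "mmproj-Qwen2.5-VL-7B-Instruct-f16.gguf",  # the matching projector file
    "--contextsize", "16384",  # as large as your RAM comfortably allows
    "--threads", "8",          # match your physical core count
])
```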
The GGUFs I linked are already pre-quantized; with 48GB of RAM you should easily be able to load the biggest quant available plus the f16 mmproj, with lots of room left over for context allocation. Allocate as much context as you can: larger, high-resolution images take more input context to process. Rough numbers below.
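As a back-of-envelope check (these are estimates based on typical quant sizes, not measured numbers):

```python
# Rough RAM estimate for a 7B model: a Q8_0 quant stores roughly
# 1 byte per weight plus a little overhead; smaller quants scale down.
model_params = 7e9          # Cosmos-Reason1-7B
bytes_per_weight = 1.07     # ~Q8_0; a Q4_K_M quant is closer to ~0.6
mmproj_gb = 1.4             # assumed f16 Qwen2.5-VL projector; check the real file size

model_gb = model_params * bytes_per_weight / 1e9
print(f"model ~{model_gb:.1f} GB + mmproj ~{mmproj_gb} GB")
# ~7.5 + 1.4 GB loaded, leaving 35+ GB of your 48 GB for context cache and the OS
```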
For troubleshooting: if its replies are wonky, try changing the chat template first (I forget if it's ChatML or something else for Qwen2.5-VL). You can try adjusting the sampler settings too.
Kobold.CPP runs a web interface you can connect to through the browser on multiple devices. It also exposes its backend through an OpenAI-compatible API, so you can write your own custom apps to send and receive, or use kobold with other frontend software that's compatible with OpenAI-style APIs, like tinychat, if you want to go further.
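For example, here's a minimal sketch of hitting that API from Python. It assumes koboldcpp's default port (5001) and that your build accepts OpenAI-style image_url content parts for vision; the image filename is just a placeholder. Check the wiki if the request format differs on your version.

```python
# Send an image + question to koboldcpp's OpenAI-compatible endpoint.
import base64
import requests  # pip install requests

with open("photo.jpg", "rb") as f:  # placeholder: any local image
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "model": "koboldcpp",  # koboldcpp serves whatever model it loaded
        "max_tokens": 300,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
            ],
        }],
    },
    timeout=600,  # CPU-only inference can be slow, give it time
)
print(resp.json()["choices"][0]["message"]["content"])
```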
If you have any specific questions or need help feel free to reach out :)