this post was submitted on 23 Dec 2025
119 points (100.0% liked)

Selfhosted


So yes, espeak exists and still sounds terrible, even worse than picoTTS (last update 4 years ago?). So what else is there? I looked at mimic3 (https://github.com/MycroftAI/mimic3), which says the project is dead and that one should go for Piper instead. Following the link to Piper (https://github.com/rhasspy/piper), I get: "This repository was archived by the owner on Oct 6, 2025. It is now read-only."

OK, so Coqui? https://github.com/coqui-ai/TTS has had no update in over 12 months. How bad can it be? https://coqui.ai/ ...great, it is a gambling page now.

so, what are you using? gTTS is not offline.

top 31 comments
[–] IcedRaktajino@startrek.website 22 points 2 weeks ago (2 children)

https://github.com/marytts/marytts

I've used MaryTTS semi-recently. It's older but works well enough for my cases. I have it running on a local server; my endpoints make a call to it and play back the returned audio file.
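For anyone wanting to try the "call the server, get an audio file back" pattern, here is a minimal sketch. It assumes a MaryTTS server on its default port 59125 and uses the `/process` endpoint with the query parameters I remember from the MaryTTS HTTP interface (`INPUT_TEXT`, `INPUT_TYPE`, `OUTPUT_TYPE`, `AUDIO`, `LOCALE`). The function names `marytts_url` and `fetch_wav` are made up for illustration, and the host, port and locale are placeholders to adjust for your own setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def marytts_url(text, host="localhost", port=59125, locale="en_US"):
    """Build a MaryTTS /process URL (parameter names assumed from the MaryTTS HTTP interface)."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    # urlencode escapes spaces and special characters in the text
    return f"http://{host}:{port}/process?{urlencode(params)}"


def fetch_wav(text, out_path, **kwargs):
    """Ask the (assumed) local MaryTTS server for a WAV file and save it to out_path."""
    with urlopen(marytts_url(text, **kwargs), timeout=10) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path

# Example (needs a running server): fetch_wav("Hello world", "/tmp/hello.wav")
```

The saved file can then be played with whatever the endpoint has available, for example `aplay`.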

On Android, I use SherpaTTS which has good voices, but I'm not aware of a desktop/Linux option. It mentions using voices from Coqui which you linked, so I would guess that would be the way to go for desktop.

[–] comrade_twisty@feddit.org 11 points 2 weeks ago (2 children)

SherpaTTS is great on GrapheneOS with OSM for navigation.

[–] warmaster@lemmy.world 3 points 2 weeks ago

I'm using it with CoMaps, freaking great.

[–] FrostyPolicy@suppo.fi 1 points 2 weeks ago

+1 for this.

[–] otter@lemmy.ca 4 points 2 weeks ago* (last edited 2 weeks ago)

Sherpa links to this page, if anyone wants to preview what the voices sound like

https://huggingface.co/spaces/k2-fsa/text-to-speech

From the ones I've tried so far, csukuangfj/vits-piper-en_US-amy-medium|1 speaker sounded the most clear and natural for GPS / driving directions. If someone finds other good ones, I'd appreciate it :)

[–] ikidd@lemmy.world 13 points 2 weeks ago (1 children)

Piper just moved to https://github.com/OHF-Voice/piper1-gpl

It's fine, and it's probably the best TTS you're going to run locally.

[–] KarlLimbo@lemmy.world 3 points 1 week ago (1 children)
[–] ikidd@lemmy.world 3 points 1 week ago* (last edited 1 week ago)

Happy to help.

The docker setup is probably less maintenance than a straight install, but your use case might need the bare install. There's also Home Assistant, which can add Piper as an add-on (a docker container inside the HassOS docker container). The Hass install will also let you add faster-Whisper and openWakeWord for a full voice assistant that autoconnects via the Wyoming protocol.

[–] damnthefilibuster@lemmy.world 11 points 2 weeks ago (1 children)

Kokoro is your best bet right now. It works wonderfully even in a docker container with no GPU. There are others but I don’t have the list right now. Will throw another update on here when I do.

The rhasspy guy was very invested in Coqui. He built a lot of his own stuff, for his home automation and such. But Coqui was superior, so he started spending time on that.

Unfortunately, the coqui team (based out of Mozilla) was very distracted and didn’t ship a lot of stuff on time or at all. It doesn’t even have basic stuff like SSML support right now, if I recall correctly. So the rhasspy guy also lost steam.

Of course, with the OpenAI model of audio generation, you’re expected to not use SSML at all and just use the black box API to get “good enough” results. That really sucks.

Oh, I just remembered which other one I wanted to mention: someone has built an open source version of NotebookLM, complete with multi-voice support. But it requires a GPU, I believe. Do what you will with that. I’ll add a link if I find it.

I prefer kokoro because it’s really solid and works really well on CPU.

[–] markko@lemmy.world 2 points 2 weeks ago (1 children)

I'm guessing this is what you're talking about?

[–] damnthefilibuster@lemmy.world 3 points 2 weeks ago

I don't think so. This one doesn’t have any code or implementation details. The one I saw was fully installable but a PITA.

Check the README for piper. It moved to https://github.com/OHF-Voice/piper1-gpl

[–] CubitOom 8 points 2 weeks ago (1 children)
[–] damnthefilibuster@lemmy.world 4 points 2 weeks ago (1 children)

What’s your workflow on your phone?

[–] CubitOom 4 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

I have kaldi auto-updated with my preferred voice whenever there is a new release via Obtainium.

I set kaldi as my tts engine, and I disabled the google TTS.

Then anytime I use TTS on my phone it uses kaldi.

It's been really great for my preferred eBook reader (Librera), so I can do chores and read at the same time.

[–] damnthefilibuster@lemmy.world 3 points 2 weeks ago

That’s an excellent setup! I’ll try to replicate it when I get home!

[–] muzzle@lemmy.zip 8 points 2 weeks ago* (last edited 2 weeks ago)

for android: SherpaTTS and whisper for voice typing.

[–] handsoffmydata@lemmy.zip 7 points 2 weeks ago (2 children)

I highly recommend Mlx-audio for anyone doing tts on Apple Silicon. It offers great performance, leverages kokoro-82M, and plays well with streaming frontends like Open webui. The one shot voice cloning feature is also pretty cool.

[–] Rhaedas@fedia.io 5 points 2 weeks ago

Kokoro was the one I was going to mention. I played around with it a bit, was very impressed with the speed and quality. And then I realized I had been using it in CPU mode. GPU is incredible.

[–] damnthefilibuster@lemmy.world 2 points 2 weeks ago

That looks amazing! Will check it out. Love Kokoro and love how good Apple Silicon is!

[–] fhein@lemmy.world 6 points 2 weeks ago

https://github.com/resemble-ai/chatterbox is pretty good, and has both TTS and voice cloning. Main disadvantage for me was that even if the cloning gives a consistent voice, the generated samples can get random accents.

https://huggingface.co/zai-org/GLM-TTS also seemed pretty promising, but I haven't had time to test it yet.

[–] Konraddo@lemmy.world 5 points 2 weeks ago (1 children)

Try alltalk_tts v2. One of the features is you can provide an audio sample and the AI will imitate the voice. The overall quality is pretty good, if you choose a larger model and let it run.

[–] danielquinn@lemmy.ca 3 points 2 weeks ago

Ooh! Has anyone managed to do this with Majel Barrett's (the Enterprise computer) voice yet?

[–] RheumatoidArthritis@mander.xyz 4 points 2 weeks ago

Save that post for the next time when someone with too much time on their hands asks what project they should start/contribute to.

[–] early_riser@lemmy.world 4 points 2 weeks ago (1 children)

None of this may be relevant, but I'm curious what your use case is. I use TTS very extensively to consume media and have my preferences. None of them are open source, but as far as I know all operate locally, though they're baked into other programs like screen readers and ebook readers.

I prefer older more robotic voices because they remain intelligible at high speed. Eloquence is a favorite, as are the older Apple voices like Fred and Ralph. I think it has gone by other names but TruVoice (spacing and capitalization may vary) is also up there. It was semi popular during the surreal meme era. Another memetic synth that's a little before my time but I quite enjoy is DECTalk (AKA the Moonbase Alpha voice). I believe Vocalizer was responsible for the OG Siri voice Samantha and that one's a more human voice that's still serviceable at high speeds.

[–] KarlLimbo@lemmy.world 1 points 1 week ago (1 children)

My use case is that n8n SSHes into a low-spec remote machine with a connected speaker to read out information sent from n8n, so I can do stuff like:

pico2wave -l de-DE -w /tmp/warn.wav "Es ist {{ $json.Hour }} Uhr." && aplay /tmp/warn.wav

As you might have guessed by now, German language support would be appreciated. I'm not going to run any additional docker containers for voice generation or invoke remote services. Also, the speaker is as dirt cheap as the rest of the setup, so any output from espeak was basically killing my eardrums.

[–] early_riser@lemmy.world 1 points 1 week ago (1 children)

I'm fairly confident espeak is all you're going to get that's FOSS, local, and with any non-English support. Yes, every espeak language sounds like a British guy badly pronouncing that language, and this includes the American English voice as well.

[–] KarlLimbo@lemmy.world 3 points 1 week ago

@ikidd@lemmy.world pointed me to Piper TTS, and I ended up with a Python virtual environment, pip install piper-tts, and de_DE-thorsten-high.onnx (plus its json file), so I can run:

echo "{{ $json.state}}" | piper -m ./de_DE-thorsten-high.onnx -f voice.wav && aplay voice.wav

and indeed that sounds much better than pico and espeak
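Since piper-tts already lives in a Python virtual environment here, the same pipeline could also be wrapped in a few lines of Python. This is only a sketch: it uses the `-m` and `-f` flags from the command above and feeds the text on stdin, the way `echo ... |` does. The names `piper_argv` and `speak` are made up for illustration, and the model and output paths are just the ones from this thread.

```python
import subprocess


def piper_argv(model, wav_path):
    """Build the piper call used in the thread: piper -m MODEL -f WAV."""
    return ["piper", "-m", model, "-f", wav_path]


def speak(text, model="./de_DE-thorsten-high.onnx", wav_path="voice.wav"):
    """Synthesize text with piper, then play the result with aplay."""
    # The text is passed on stdin, so it is never parsed by a shell.
    subprocess.run(
        piper_argv(model, wav_path),
        input=(text + "\n").encode("utf-8"),
        check=True,  # raise if piper exits with an error
    )
    subprocess.run(["aplay", wav_path], check=True)

# Example (needs piper, the voice model and aplay): speak("Es ist 12 Uhr.")
```

The main reason to bother: with `echo "{{ $json.state}}"`, a value containing quotes or `$(...)` ends up being interpreted by the shell on the remote machine. Passing the text on stdin from a list-style `subprocess.run` call sidesteps that.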

[–] eodur@piefed.social 3 points 2 weeks ago

It really depends on what you want to do with it. I run wyoming-piper as part of my Home Assistant deployment and it's been rock solid. The Wyoming protocol is pretty well documented too, so you should be able to integrate with it pretty easily.
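To give an idea of what integrating with Wyoming involves, here is a rough sketch of how a request could be framed. It reflects my understanding of the protocol rather than an authoritative reference: each event is one line of JSON with a `type` and optional `data`, terminated by a newline, optionally followed by `payload_length` bytes of binary payload. The `synthesize` event type and its `text` field are assumptions to check against the Wyoming README, and `encode_event` is a made-up helper name.

```python
import json


def encode_event(event_type, data=None, payload=b""):
    """Frame one Wyoming-style event: a JSON header line, then optional raw payload bytes."""
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        # Tells the receiver how many binary bytes follow the header line
        header["payload_length"] = len(payload)
    line = json.dumps(header, ensure_ascii=False).encode("utf-8")
    return line + b"\n" + payload


# Assumed TTS request; a server would be expected to answer with audio events.
request = encode_event("synthesize", {"text": "Es ist 12 Uhr."})
```

Sending `request` over a TCP socket to the wyoming-piper port and reading back the audio events would be the next step; the port and the exact response events depend on your deployment, so check them there.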

[–] NarrativeBear@lemmy.world 3 points 2 weeks ago* (last edited 2 weeks ago)

Faster Whisper could be an option; there are various GUI options available as well.

https://github.com/SYSTRAN/faster-whisper

And if you are looking for something that you can "just install", I recommend Balabolka. The voices are natural, and you can use some of the built-in Windows voices to make it more natural.

https://www.cross-plus-a.com/balabolka.htm

Make Azure natural TTS voices accessible to any SAPI 5-compatible application.

https://github.com/gexgd0419/NaturalVoiceSAPIAdapter

[–] just_another_person@lemmy.world -1 points 2 weeks ago

Not sure what you're asking here, but are you talking about the voice part, the TTS part, or the interaction?