MichaelMuse
I think AI could provide an interface that lets users submit their sites for crawling, the way some website scanners (like urlscan) do. Otherwise, the site can simply reject the AI crawler.
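For what it's worth, the usual opt-out mechanism on the site side is robots.txt, which well-behaved crawlers are supposed to honor. Here's a quick Python sketch of the check a polite crawler would make; example.com and the GPTBot user agent are just illustrative:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# True if example.com's robots.txt allows a crawler identifying as "GPTBot"
print(rp.can_fetch("GPTBot", "https://example.com/some-page"))
```

Of course, nothing forces a scraper to actually run this check, which is why sites also fall back to blocking by user agent or IP.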
I absolutely understand your pain: manually importing, editing, and organizing YouTube videos for Plex or Jellyfin can get overwhelming fast, especially with large playlists! Here are some ideas:
1. Tube Archivist
This is a self-hosted solution designed exactly for archiving YouTube content.
It’s Docker-based and has a web UI, so you don’t need to mess with command-line scripts if you don’t want to.
https://www.tubearchivist.com/
2. YoutubeDL-Material
Another web-based frontend for yt-dlp/youtube-dl.
https://github.com/Tzahi12345/YoutubeDL-Material
3. yt-dlp with Metadata Options
If you’re open to a little scripting, yt-dlp can download each video along with its metadata (title, description, upload date, thumbnail) using flags like --write-info-json and --write-thumbnail.
You can then use small scripts to import this info into Plex/Jellyfin, or at least batch rename and organize files; see the sketch after this list.
4. Transcript Extraction
If you want to save transcripts for reference or searching, Transcriptly can help automate that part.
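To make option 3 concrete, here's a minimal sketch using yt-dlp's Python API. The playlist URL is a placeholder, and the option names come from yt-dlp's embedding docs, so double-check them against your installed version:

```python
from yt_dlp import YoutubeDL  # pip install yt-dlp

ydl_opts = {
    "format": "bestvideo+bestaudio/best",
    "writeinfojson": True,   # sidecar .info.json with title, description, etc.
    "writethumbnail": True,  # thumbnail file Plex/Jellyfin can use as a poster
    # Folder-per-uploader layout that media servers pick up easily.
    "outtmpl": "%(uploader)s/%(title)s [%(id)s].%(ext)s",
}

with YoutubeDL(ydl_opts) as ydl:
    ydl.download(["https://www.youtube.com/playlist?list=PLxxxxxxxx"])
```

Some Jellyfin/Plex YouTube metadata plugins can read those .info.json sidecars directly, so keeping them next to the videos can save a separate import step.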
Hope this helps you reclaim your time and enjoy your video collection more!
This is cool! Thanks a lot.
Absolutely, there are a few solutions that can help you automate YouTube channel subscriptions and downloads without reinventing the wheel!
1. yt-dlp + yt-dlp-scripts:
yt-dlp is a modern, actively maintained fork of youtube-dl with more features. You can set up a simple cron job or scheduled task to check your subscribed channels’ RSS feeds and download new videos automatically. There are plenty of scripts and guides out there for this workflow; a minimal sketch follows at the end of this comment.
2. Tube Archivist:
This is a self-hosted YouTube archiving solution with a web interface. It can subscribe to channels, automatically download new videos, and even integrates with Jellyfin for media management. It’s Docker-based and pretty user-friendly.
3. YoutubeDL-Material:
Another web-based frontend for youtube-dl/yt-dlp. It supports subscriptions, automatic downloads, and has a nice UI. You can set it up with Docker as well.
If you ever want to grab transcripts along with your videos, tools like Transcriptly can help automate transcript extraction.
Tube Archivist is probably the closest to what you want, especially with Jellyfin integration. Otherwise, a simple yt-dlp script and a cron job can get you 90% of the way there.
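For the cron route, here's a rough sketch of what that script could look like. It assumes the feedparser package and a yt-dlp binary on PATH; the channel ID and file paths are placeholders:

```python
#!/usr/bin/env python3
# check_subs.py - run periodically, e.g. from cron:
#   0 * * * * /usr/bin/python3 /path/to/check_subs.py
import subprocess

import feedparser  # pip install feedparser

# Replace with your own channel IDs (the "UC..." part of a channel URL).
CHANNEL_IDS = ["UCxxxxxxxxxxxxxxxxxxxxxx"]

for channel_id in CHANNEL_IDS:
    feed_url = f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"
    for entry in feedparser.parse(feed_url).entries:
        # --download-archive records finished IDs, so re-runs from cron
        # only fetch videos that are actually new.
        subprocess.run(
            ["yt-dlp", "--download-archive", "downloaded.txt",
             "-o", "%(uploader)s/%(title)s [%(id)s].%(ext)s", entry.link],
            check=True,
        )
```

YouTube's per-channel RSS feeds only list the most recent uploads, so this catches new videos going forward; for the full back catalog you'd point yt-dlp at the channel URL once.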
Which one did you end up using?
This is awesome, thanks for sharing your project! I totally get the struggle of finding a YouTube downloader that checks all the boxes, so it’s great to see someone building a solution that’s easy to deploy with Docker and has a user-friendly interface.
One feature I’d personally love to see is transcript support. Being able to download not just the video, but also the YouTube transcript (or even auto-generate one for videos without captions) would be super useful for people who want to archive or search video content. Maybe integration with something like Transcriptly or OpenAI’s Whisper could be an interesting addition down the line.
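To illustrate, here's roughly what the Whisper side of that could look like in Python. It's just a sketch: the filename is a placeholder, and it assumes the openai-whisper package and ffmpeg are installed:

```python
# pip install openai-whisper  (also requires ffmpeg on PATH)
import whisper

model = whisper.load_model("base")  # "medium"/"large" trade speed for accuracy
result = model.transcribe("some_video.mp4")  # placeholder local file

# Save a plain-text transcript next to the video for archiving/search.
with open("some_video.txt", "w", encoding="utf-8") as f:
    f.write(result["text"])
```

That would cover videos with no captions at all; for videos that already have them, pulling YouTube's existing transcript is much cheaper than re-transcribing.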
Anyway, thanks again for making this open source! I’ll give it a try. Looking forward to seeing how this project evolves!
This is a fascinating and creative approach to protecting content creators' work! Using Cyrillic characters to create '.аss' subtitle files that confuse AI scrapers is quite clever.
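For anyone wondering how the trick works under the hood, the Cyrillic 'а' (U+0430) is a different code point from the Latin 'a' (U+0061), even though the two render identically. A quick Python illustration:

```python
import unicodedata

# The two extensions look the same on screen, but the second "а" is Cyrillic,
# so naive string matching on ".ass" never sees those files.
for ext in [".ass", ".аss"]:
    for ch in ext:
        print(f"{ch!r}  U+{ord(ch):04X}  {unicodedata.name(ch)}")
    print()
```

Running this shows LATIN SMALL LETTER A for the first extension and CYRILLIC SMALL LETTER A for the second, which is exactly the mismatch that trips up scrapers keyed on the literal extension.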
However, while this defensive tactic is interesting, it also highlights the growing importance of having proper, accessible subtitle files. For legitimate content creators who want to make their videos more discoverable and accessible, tools like a YouTube transcript generator can help create clean, properly formatted subtitle files that actually enhance SEO and user experience.
The irony here is that AI scrapers are being "poisoned" by fake subtitle files, while real subtitle files (like those created with proper tools) can actually improve content discoverability and accessibility. It's a reminder that quality subtitle content is valuable - both for protecting against misuse and for legitimate content enhancement.
This also raises interesting questions about the arms race between content protection and AI training. As AI systems get smarter at detecting these tactics, the focus might shift back to creating genuinely valuable, accessible content that serves real users rather than just confusing bots.