It's A Digital Disease!

This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

26
 
 
The original post: /r/datahoarder by /u/True_Pirate on 2025-06-06 23:08:20.

I have ~20 TB of data currently, and it is growing. I don’t trust or use cloud storage, and I'm curious what you guys think about it. Here is what I do: my primary data hard drives are connected to my PC, and I keep a full offsite backup at a relative's house, scattered across some older drives.

As a redundancy for really important data, I have about 250 Blu-rays burned with irreplaceable or harder-to-replace stuff. These are not M-DISCs and may not be in great shape in a decade, but in a worst-case scenario they make me feel better.

To keep it all straight, I have a bunch of Excel spreadsheets that I can reference to see what is stored where. What do you guys think?

27
 
 
The original post: /r/datahoarder by /u/Friendly_Guard694 on 2025-06-06 22:21:13.

I keep buying these because they're cheap and easily portable with a laptop. I'm very minimalist, my life fits in a suitcase. Is there something bigger and better but also portable?

28
 
 
The original post: /r/datahoarder by /u/ryerhino on 2025-06-06 20:59:08.

Got tired of duplicate versions eating up space, so I made PlexDeDupe.

It scans your library, shows all duplicates, lets you pick which to keep (largest for quality or smallest for space), then removes the extras. Files go to the Recycle Bin rather than being permanently deleted. It freed up 0.5 TB on my first run.
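
As a rough illustration of the keep-largest or keep-smallest choice described above, here is a minimal Python sketch. It is not PlexDeDupe's actual code and does not talk to Plex at all: the function name `plan_removals` and the `(title, path, size_bytes)` tuples are assumptions made up for this example.

```python
from collections import defaultdict


def plan_removals(files, keep="largest"):
    """Return the paths that would be removed, keeping one version per title.

    `files` is an iterable of (title, path, size_bytes) tuples.
    This input format is hypothetical, not what PlexDeDupe uses.
    """
    if keep not in ("largest", "smallest"):
        raise ValueError("keep must be 'largest' or 'smallest'")

    # Group every version of the same title together.
    groups = defaultdict(list)
    for title, path, size_bytes in files:
        groups[title].append((size_bytes, path))

    removals = []
    for versions in groups.values():
        if len(versions) < 2:
            continue  # a single version is not a duplicate
        versions.sort()  # smallest first; ties are ordered by path
        keeper = versions[-1] if keep == "largest" else versions[0]
        removals.extend(path for entry in versions
                        if entry != keeper
                        for path in [entry[1]])
    return sorted(removals)
```

A real tool would then move those paths to the Recycle Bin instead of deleting them, as the post describes.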

Free & open source: https://github.com/SabrosoCuy/PlexDeDupe

Requirements: Python 3.6 or higher, Plex Media Server, a Plex authentication token (instructions provided in the GUI), and the PlexAPI Python library (pip install plexapi).

I have not tried this with remote drives, as mine are all local, but it should work.

PS: I used Claude Opus 4 to help write this.

https://preview.redd.it/r0bpxqb1cd5f1.png?width=890&format=png&auto=webp&s=e9790c9616684b4a6053042654b95d8d66b87c18

29
 
 
The original post: /r/datahoarder by /u/mikepm07 on 2025-06-06 20:33:39.

Hey, I recently started a new job at a company that has nearly 600 TB of video footage, with about 80% of it sitting on hard drives that are over 10 years old and not duplicated at an alternate location.

It sounds like some of these drives haven't been turned on and verified in three years.

My new boss just requested we come up with some proposals on how we could safely update our storage and protect from hard drive failure.

We have a DAM (Digital Asset Management Tool) that keeps a lot of the footage we need regularly accessible, but I know he won't want to delete any of the 600TB of footage.

What's our best option here?

My thought is just to buy new hard drives and make it a policy to verify each drive once a year. In addition to that, we need to clone the contents of each drive to a backup and keep it at a separate location as a safety precaution.

I think that will be cheaper than a server or NAS type system?
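
One way a yearly verification pass like the one proposed above could work is with a checksum manifest: hash every file once, store the hashes somewhere other than the drive, and re-hash later to spot missing or silently changed files. The following is a minimal sketch under those assumptions; the function names `build_manifest` and `verify_manifest` are made up for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large video files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the drive's current contents against a stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(name for name in manifest.keys() & current.keys()
                          if manifest[name] != current[name]),
        "new": sorted(set(current) - set(manifest)),
    }
```

The manifest is a plain dictionary, so it can be saved with `json.dump` and kept with the offsite copy. The same manifest can then confirm that a cloned drive matches its original.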

Would love any thoughts from people who operate in this field more than I.

Thank you

30
 
 
The original post: /r/datahoarder by /u/JamesRitchey on 2025-06-06 20:16:48.

This tutorial is for comparing the contents of 2 folders to confirm they contain the same files when the filenames or folder structure are different. This is accomplished by hashing the contents.

Steps:

  • Download Ritchey Hash Directory i2 v2. It's an open-source PHP function I made for hashing directories by treating all the files as part of the input to be hashed.
git clone https://github.com/jamesdanielmarrsritchey/ritchey_hash_directory_i2.git

  • Make a PHP script which uses this function to hash both directories' files, and compare the checksums. To do this, paste the following into "ritchey_hash_directory_i2/custom_script.php" (the file doesn't exist, so you'll need to create it).
<?php
$location = realpath(dirname(__FILE__));

$dir1 = "{$location}/temporary/Example 1"; // Change this!
$dir2 = "{$location}/temporary/Example 1"; // Change this!
$algo = 'sha3-256'; // Optionally, change this. Only select algorithms are supported by the hashing function. For most users 'sha3-256' or 'sha256' should be fine.

require_once $location . '/ritchey_hash_directory_i2_v2.php';
$checksum1 = ritchey_hash_directory_i2_v2($dir1, $algo, FALSE, NULL, TRUE);
$checksum2 = ritchey_hash_directory_i2_v2($dir2, $algo, FALSE, NULL, TRUE);
if (is_string($checksum1) === TRUE && is_string($checksum2) === TRUE){
    if ($checksum1 === $checksum2){
        echo "Checksums match." . PHP_EOL;
    } else {
        echo "Checksums differ." . PHP_EOL;
    }
} else {
    echo "ERROR" . PHP_EOL;
}
?>

(You might need to clean-up the formatting if it doesn't paste nicely)

  • Edit the custom PHP script to have your values for the directories to hash, and the algorithm to use. To do this, change the values of $dir1, $dir2, and $algo.

  • Make any other desired changes (if any) to your script. For example, maybe you want it to display the checksums?

  • Run the script.

cd ritchey_hash_directory_i2 && php custom_script.php && cd -

  • Examine the result. The output should be either "Checksums match." or "Checksums differ." (or "ERROR" if either checksum could not be computed).

Note:

  • The hashing function relies on checksums to decide the order in which files are fed into the hash, and that order affects the checksum produced. A collision between checksums could therefore disrupt the order and cause an incorrect result, so it's advisable to use a strong hashing algorithm to avoid collisions.
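
For comparison, the same idea can be sketched in Python: hash each file's contents, sort the per-file checksums so that names and folder layout stop mattering, then hash the sorted list. This is only an illustration of the general technique, not a port of the PHP function, and it will not produce the same checksums; the names `file_digest` and `directory_digest` are made up for this example. Note that Python's hashlib spells the algorithm `sha3_256` rather than `sha3-256`.

```python
import hashlib
from pathlib import Path


def file_digest(path, algo="sha3_256", chunk_size=1 << 20):
    """Return the hex checksum of one file, read in chunks."""
    digest = hashlib.new(algo)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def directory_digest(root, algo="sha3_256"):
    """Return one checksum for all file contents under root.

    Filenames and folder structure are ignored: only the sorted list of
    per-file checksums feeds the final hash. Empty folders are ignored too.
    """
    per_file = sorted(
        file_digest(path, algo)
        for path in Path(root).rglob("*")
        if path.is_file()
    )
    combined = hashlib.new(algo)
    for checksum in per_file:
        combined.update(checksum.encode("ascii"))
    return combined.hexdigest()
```

Two folders hold the same set of file contents when `directory_digest(dir1) == directory_digest(dir2)`.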

--

There are obviously other ways to do this sort of thing, so please share other programs, scripts you've made, etc. Help save the next person some work :)

EDIT: fixed post formatting

31
 
 
The original post: /r/datahoarder by /u/D3VEstator on 2025-06-06 19:07:28.

I have a bunch of DVDs, and I'm debating whether I should rip them, given their quality.

I rip my Blu-rays, but I'm not sure about DVDs in this day and age.

Thoughts?

[EDITED]: Thanks to everyone who commented; I will continue to look at these. I will continue ripping the TV shows and movies that I know I will watch many times over.

32
 
 
The original post: /r/datahoarder by /u/Arcueid-no-Mikoto on 2025-06-06 18:57:53.

Yesterday I tried to download their entire manga DB by simply using HTTrack with the "https://www.mangaupdates.com/series" URL, since all the manga are under it. Before I went to bed, it had scanned 70k+ links and the folder was 9 GB. It had a ton of the manga pages downloaded, but when I woke up, it said "Task Finished" and most files had been deleted from the folder; now it's 2 GB, with most folders empty.

Any idea why it would delete what it downloaded?

Also, I'm new to HTTrack and to downloading sites in general. Is there a reliable way to download their full manga DB? I'd love to be able to use their advanced search offline.

Conveniently, both the manga pages and the advanced search are under /series, so downloading this URL successfully should make it work, right?

This is the advanced search URL:

https://www.mangaupdates.com/series/advanced-search

And this any random manga:

https://www.mangaupdates.com/series/ygablqw/tsugumomo

How would you go about this? Should I keep using HTTrack, or is there a more suitable program? I'd love to know if there's any configuration option I'm missing and should add for this task.

Thanks!

33
 
 
The original post: /r/datahoarder by /u/TheRealFutaFutaTrump on 2025-06-06 18:17:50.

I have a couple of Sony Hi8 tapes (no camcorder) and some mini VHS tapes. I know they make those boxes you can plug into a VCR for the VHS tapes; then you plug into a capture device and pray it works.

Is there a solution that does all of it: both types of tape plus the capture? I have a pretty awesome computer, but nothing besides the tapes (no VCR, no camcorders at all). I could probably hire a service to do it for about as much, but I would prefer to screen the tapes myself.

34
 
 
The original post: /r/datahoarder by /u/sprfreek on 2025-06-06 18:00:00.
35
 
 
The original post: /r/datahoarder by /u/nando1969 on 2025-06-06 17:49:10.
36
 
 
The original post: /r/datahoarder by /u/Difficult-Stuff-9800 on 2025-06-06 17:35:15.

I’ve been using GoodSync to back up my files to the cloud, and I’ve enabled encryption for both file content and names. It’s great for security, but I’m worried about what happens if GoodSync terminates its service one day. How would I decrypt my files without their software?

I noticed that if I only encrypt the file content (not the names), I can decrypt those files using 7zip. I’m concerned that GoodSync could change their encryption method in the future, leaving my data inaccessible.

Does SyncBackPro provide all the features of GoodSync? I noticed that its documentation mentions that its files can be decrypted with zip.

37
 
 
The original post: /r/datahoarder by /u/wade-wei on 2025-06-06 17:17:10.

We have plans to decommission a Dell SCv2080 storage array with 8 TB SAS drives. I am thinking of putting a few of those drives in my own for-fun server, but I heard these Compellent drives may have different firmware. Can I use them directly in Dell R730/740 servers, or do I need to reformat them (from 528b to 512b?), or do I have to flash some sort of normal firmware onto them?

38
 
 
The original post: /r/datahoarder by /u/loglux on 2025-06-06 17:09:21.

Hey DataHoarders! 🗃️

I recently made an open-source tool to batch-download full video courses from Microsoft Learn (MS’s free cloud training platform). If you want to archive courses, watch on your smart TV at home, or just keep a backup for offline use, this might be useful!

🚀 Main features:

  • 🎯 Auto playlist detection: Just paste any two sample URLs and the tool figures out the sequence — no manual link collection needed.
  • 🖥️ GUI and CLI: Download with a user-friendly interface or from the terminal.
  • 💬 Subtitle selection: Choose only the subtitle languages you need (en-us, ru-ru, zh-cn, and more).
  • 📁 Configurable download folder: Organise your archive your way.
  • 📊 Progress tracking: Real-time logs and download status in the GUI.
  • 🆓 100% free and open source: No ads, no accounts, MIT license.

Note: Only works for public, free Microsoft Learn video series (all legit, no scraping of private/paid content).


🔗 GitHub: loglux/LearnVideoDownloader

README includes screenshots, quickstart, and usage examples.


Hope this helps someone else with their learning archive!

If you have suggestions or want to contribute, feel free to open issues or PRs.

Mods: please remove if not appropriate — just sharing a free, open-source resource for the community.

39
 
 
The original post: /r/datahoarder by /u/LordGAD on 2025-06-06 16:40:13.
40
 
 
The original post: /r/datahoarder by /u/Turbulent_Owner on 2025-06-06 10:27:15.

Beginner mistake, I know: don't use Storage Spaces in Windows, there are better options. I'd rather not restart everything if possible! There's nothing super fragile or worth my time on it; it would just be a hassle.

I have 10 hard drives that total around 40 TB. I just got a new 10 TB hard drive to add to the array, but when I put it in and add it to the storage space, Windows 10 displays the wrong amount of space: it says I have only 22 TB total and only 7 TB free. How can I correct this? I don't have errors on any of my drives. If I run the optimize operation to rebalance space between the drives, it runs briefly and then just quits without an error (not sure why).

41
 
 
The original post: /r/datahoarder by /u/AshleyAshes1984 on 2025-06-06 14:57:56.
42
 
 
The original post: /r/datahoarder by /u/Dandypleasure on 2025-06-06 15:29:12.

Hi,

I've read that data can be damaged by being stored on an external hard drive. But I don't want to lose ANYTHING.

Do you think that backing up on a CD, like in the old days, allows you to keep your data without loss?

What's the solution?

And a stupid question: do the data, images, videos, etc. stored on the internal hard drive of the PC that we use every day not degrade? Is it only a problem on external hard drives stored in the cupboard?

I would like someone to explain it all to me.

43
 
 
The original post: /r/datahoarder by /u/dirstythirsty on 2025-06-06 15:12:38.

Hey everyone! Just wanted to flag to all the data archivists out there that now would be a good time to start ripping and archiving government media from YouTube and other sources.

I’m a (sometimes) freelance journalist currently working on a story about the elimination of federal HIV prevention programs and have noticed a lot of data being scrubbed from federal agency websites. Strangely, it appears that whoever is in charge of deleting info from federal websites forgot that many of these agencies have YouTube channels with a ton of insightful discussions and interviews.

I know I can rip and archive these myself, but I wanted to flag this for the community and encourage anyone interested in preserving publicly funded research and communication to rip as much media as possible before it’s wiped from the internet.

If I’m being histrionic, feel free to let me know, but I don’t see the harm in tracking down and archiving the myriad digital media output intended to be accessed freely by the public.

44
 
 
The original post: /r/datahoarder by /u/juicysound on 2025-06-06 14:44:28.
45
 
 
The original post: /r/datahoarder by /u/sioux612 on 2025-06-06 13:59:20.

My buddy is currently looking for his first NAS for backing up pictures and other stuff at home. He obviously came to me with questions, which I'm grateful for, but I'm also not the best informed about modern NAS units.

I used to run a QNAP 6-bay system that I did like, until I had complete data loss from encryption scams two or three times. Each time it was because of some QNAP app that I didn't even know was installed on my device, so I immediately jumped to a larger server running Unraid.

So I'm kind of burned on QNAP, and because of the proprietary HDD stuff, Synology is mostly out for me as well.

Are there any other brands that I should definitely avoid or preferably use?

I don't want to put him on an Unraid system immediately, because he is closer to an Apple user than a Linux user and I want him to be somewhat autonomous with it.

We'll get him a 4-bay unit: no heavy lifting, just some file storage and a photo cloud. I'll fill it with 2 or 4 drives, probably in RAID 1, and then look into solutions for backing up encrypted full images of his NAS to my server.

46
 
 
The original post: /r/datahoarder by /u/JKAF3 on 2025-06-06 13:51:09.

OK, so I previously asked about using SAS drives in my PC. It turns out I need a SAS controller, but I don't have a PCIe slot available for one, so I thought about something else.

Is it possible to get something like a mini external enclosure, put 2x SAS SSDs in it in RAID 1, and then plug that enclosure into my PC to access the storage?

47
 
 
The original post: /r/datahoarder by /u/boo_sneaky on 2025-06-06 13:04:57.

I was using MediaHuman for a while because it can track playlists on YouTube, Spotify, SoundCloud, and others, but I sometimes run into parsing issues when I update my playlists, and the authentication sometimes won't work correctly. I'm sure there are other effective options; I'm just not sure where to start.

48
 
 
The original post: /r/datahoarder by /u/SlavWithBeard on 2025-06-06 12:41:16.

Hello,

This may not be the best subreddit for my question, but I believe there are people here who might know the answer.

I have a couple of questions about RAID 1 in the Terramaster DAS D2-320:

  1. In theory, RAID 1 is simply mirroring. If there is a hardware failure in the DAS, is it possible to remove a disk and connect it to a PC to access the files normally?
  2. Is it possible to start a RAID 1 configuration with one disk that already has data and then add a second disk later?

Thank you!

49
 
 
The original post: /r/datahoarder by /u/xoskrad on 2025-06-06 11:43:50.

Recently I had to transfer a lot of data from one NAS to another, and the transfer was cancelled partway through.

Are there any suggestions for a Windows app that I can use to compare the two folders, highlighting any subfolders that differ and which files are missing from one? Thanks.
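
If a script is acceptable instead of an app, a comparison like this can be sketched with Python's standard `filecmp` module, which also runs on Windows. This is a minimal sketch, not a polished tool: the function name `compare_trees` is made up for this example, and `filecmp.dircmp` compares file size and modification time before contents, so it is a quick check rather than a full checksum verification.

```python
import filecmp
from pathlib import Path


def compare_trees(source, destination):
    """Return (missing, differing) paths relative to the source folder.

    missing:   files or whole subfolders present in source but not in destination
    differing: files present in both places whose contents do not match
    """
    missing, differing = [], []

    def walk(comparison, prefix):
        for name in comparison.left_only:
            missing.append((prefix / name).as_posix())
        for name in comparison.diff_files:
            differing.append((prefix / name).as_posix())
        # Recurse into subfolders that exist on both sides.
        for name, sub_comparison in comparison.subdirs.items():
            walk(sub_comparison, prefix / name)

    walk(filecmp.dircmp(source, destination), Path("."))
    return sorted(missing), sorted(differing)
```

Calling `compare_trees` with the old NAS share as `source` and the new one as `destination` lists what still needs to be copied and which partially copied files should be transferred again.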

50
 
 
The original post: /r/datahoarder by /u/TheShatteringSpider on 2025-06-06 05:00:51.

I got a deal on a NAS for about $50: just the empty NAS with no storage, a really good deal. I want to get storage for it to hold all my games, VTuber assets, and recording assets, plus backups of photos from my mom's iPhone.

I just don't know 100% what type I should get. I plan on buying used and found some 8 TB HDDs for cheap on Marketplace, but the NAS can run SATA SSDs, so would that be better long term?

I plan to do a RAID 6 config, ideally buying 4x 4 TB drives. If I somehow find a used deal for more storage, I'd go for that.
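
For planning purposes, the usable space of a RAID 6 array of equal drives is the drive count minus two, times the drive size, so four 4 TB drives give about 8 TB. The small sketch below works that out next to RAID 5 and RAID 10 for comparison. The function name `usable_capacity_tb` is made up for this example, and the figures ignore filesystem overhead and the TB versus TiB difference.

```python
def usable_capacity_tb(drive_count, drive_size_tb, level):
    """Return approximate usable capacity in TB for equally sized drives."""
    if level == "raid5":
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drive_count - 1) * drive_size_tb  # one drive's worth of parity
    if level == "raid6":
        if drive_count < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drive_count - 2) * drive_size_tb  # two drives' worth of parity
    if level == "raid10":
        if drive_count < 4 or drive_count % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return (drive_count // 2) * drive_size_tb  # every drive is mirrored
    raise ValueError("unsupported RAID level: " + str(level))


for level in ("raid5", "raid6", "raid10"):
    print(level, usable_capacity_tb(4, 4, level), "TB")
# raid5 12 TB
# raid6 8 TB
# raid10 8 TB
```

With only four drives, RAID 6 and RAID 10 yield the same space. RAID 6 survives any two drive failures, while RAID 10 survives two only if they are in different mirror pairs.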
