It's A Digital Disease!


This is a sub that aims at bringing data hoarders together to share their passion with like minded people.

founded 2 years ago
1176
 
 
The original post: /r/datahoarder by /u/GuyWhoDiesIn2025_ on 2025-06-03 07:40:05.

I want to upload a video onto the Internet Archive automatically. I won't be able to do it myself because it's going to be my death video (I have cancer & I qualify for the California End of Life Option Act & I'll be legally taking pills prescribed to me to end my life BUT I WANT TO DO IT ON VIDEO & I WANT IT TO STAY ONLINE FOREVER!)

Is there a way I could automatically have it uploaded onto the Internet Archive? I'm going to be recording it with OBS & I already figured out how to get OBS to automatically stop recording after a certain amount of time, but I'm trying to figure out how to automatically have it uploaded onto the Internet Archive.

Can someone please tell me how to do that? Does anyone know of a tutorial or something?

1177
 
 
The original post: /r/datahoarder by /u/Azadi_23 on 2025-06-03 21:56:51.

Any advice welcome!

  1. I would like to get all my old files onto one external storage solution from several old hard drives - what’s a good brand/make/model for around 2TB - 4TB? Which ones to avoid? I bought cheap large USBs that worked briefly and then became corrupted so I don’t want to make the same mistake twice!
  2. My newer laptop has a faster processor and can move files very efficiently but cannot read/write from my old external hard drives. My old laptop can access the old drive but is very slow and may crash if I try to put too much on it to transfer from old external HD to new external HD. Any tips?
  3. How can I be sure old storage drives are empty of my data? Once I have transferred everything I will delete all files and would be happy to recycle parts if possible. Is there a recommended safety method to be sure my old files are unrecoverable? They’re mostly photos, videos, songs and work/uni text files/PDFs.
1178
 
 
The original post: /r/datahoarder by /u/Endawmyke on 2025-06-03 21:37:46.

I have some of these large thermal paper photos from Chuck E. Cheese’s from like 20+ years ago that I’m wanting to scan.

But I have a bad memory from childhood when I tried to scan a NASCAR ticket as a kid and it totally ruined the ticket. I’m guessing the heat of the scanner light was enough to black out the whole thing.

And seeing as the Chuck E. Cheese photos are also thermal paper I’m worried running it through the scanner will black it out in the same way.

Any advice?

I’m using an Epson FastFoto FF-680W btw, and it’s advertised to work with receipts (which I believe are also thermal paper?) but I just wanna make sure with anyone here experienced so I don’t accidentally kill these photos.

1179
 
 
The original post: /r/datahoarder by /u/CoffeeTrashed on 2025-06-03 21:17:20.

Hi Folks!

Hopefully I'm posting this in the right sub, apologies if not. Basically, I currently have a very very low tech Plex server running in my apartment (Dell 3240 Compact running Debian with 12TB of external dumb storage) and would like to expand this to be a little more all encompassing.

I'd like to have a database setup that contains my Plex Server stuff (How hard would it be to swap to Jellyfin?), all of my books, music, and a bunch of informational YouTube videos that I've downloaded (example: https://www.youtube.com/watch?v=Et5PPMYuOc8). My goal is to have it setup so that all of these things are accessible via any device on my local network, even if my internet is down.

Optionally, I'm also interested in a front end that maybe brings a lot of this together and makes it searchable and looking nicer? I know Plex can technically handle the music and audiobooks, but I don't love the way it handles it. I'm not opposed to just navigating a regular file system type thing for that stuff, but if you guys know of anything that would accomplish that I'm all ears! Thanks!

PC: Dell Precision 3240 i9 w/ 64GB DDR4 RAM

External Storage - https://www.amazon.com/dp/B01MRSRQLA

PS - Just had this thought, is it difficult to scan paper books into PDFs? Maybe that's overkill

1180
 
 
The original post: /r/datahoarder by /u/Vaemorn on 2025-06-03 21:14:30.
1181
 
 
The original post: /r/datahoarder by /u/ConfusedHomelabber on 2025-06-03 20:58:15.

Just wrapped up setting up my NAS. Had to work with a mix of different sized drives, so each one ended up being its own share. Not ideal, but it works for now.

I was planning on doing the usual layout—Documents, Photos, Music, etc.—but after seeing a few screenshots floating around here, I realized there’s a lot of different approaches people take to organizing their data.

So now I’m curious: what does your file structure look like? How do you handle multiple shares or drives with different capacities? Would love to hear what works for you and why

1182
 
 
The original post: /r/datahoarder by /u/Harry_Yudiputa on 2025-06-03 19:50:59.
1183
 
 
The original post: /r/datahoarder by /u/SummorumPontificum90 on 2025-06-03 17:32:10.

What do you think about exchanging disk space with a friend or a complete stranger as an offsite backup? Is this a thing?? Why or why not??

Obviously this backup should be encrypted. It would not be hard to find someone who is interested in such thing in a community like this one.

Let’s take a hypothetical example: I let you store a 4 TB encrypted backup on my NAS and you let me do the same thing (with the same disk space) on your NAS.

1184
 
 
The original post: /r/datahoarder by /u/SureElk6 on 2025-06-03 16:37:40.
1185
 
 
The original post: /r/datahoarder by /u/zombik327 on 2025-06-03 16:34:52.
1186
 
 
The original post: /r/datahoarder by /u/RoachedCoach on 2025-06-03 15:22:15.
1187
 
 
The original post: /r/datahoarder by /u/DiskBytes on 2025-06-03 14:48:54.

I've got an issue with mbuffer which has never happened to me before. Basically, the data out is going to tape quicker than it can go in, causing the tape to stop, wait for the buffer to fill, then start again.

But mbuffer is supposed to prevent this from happening, very strange as it has always worked well prior to today and I can't see what I'm doing differently.

As I always have, I'm using tar -b 2048 --directory="name" -cvf - ./ | mbuffer -m 6G -L -P 80 -f -o /dev/st0

Any ideas? Thanks.

1188
 
 
The original post: /r/datahoarder by /u/vff on 2025-06-03 14:18:52.

You may be familiar with the book A Million Random Digits with 100,000 Normal Deviates from the RAND corporation that was used throughout the 20th century as essentially the canonical source of random numbers.

I’m working towards putting together a similar collection, not of one million random decimal digits, but of at least one quadrillion random binary digits (so 128 terabytes). Truly random numbers, not pseudorandom ones. As an example, one source I’ve been using is video noise from an old USB webcam (a Raspberry Pi Zero with a Pi NoIR camera) in a black box, with every two bits fed into a Von Neumann extractor.
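For readers unfamiliar with the extractor mentioned above, here is a minimal sketch of the classic Von Neumann debiasing step. It assumes the raw source is already available as a sequence of 0/1 integers; the function name is hypothetical and this is not necessarily how the poster's pipeline is written.

```python
from typing import Iterable, Iterator


def von_neumann_extract(bits: Iterable[int]) -> Iterator[int]:
    """Yield debiased bits from non-overlapping pairs of input bits."""
    it = iter(bits)
    for first in it:
        second = next(it, None)
        if second is None:
            return  # odd trailing bit: nothing to pair it with
        if first != second:
            yield first  # pair 01 -> 0, pair 10 -> 1
        # pairs 00 and 11 carry no usable bit and are discarded
```

The extractor removes bias only if the input bits are independent of each other, and it throws away at least half of the input, so the raw source has to produce far more bits than end up stored.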

I want to save everything because randomness is by its very nature ephemeral. By storing randomness, this gives permanence to ephemerality.

What I’m wondering is how people sort, store, and organize random numbers.

Current organization

I’m trying to keep this all neatly organized rather than just having one big 128TB file. What I’ve been doing is saving them in 128KB chunks (1 million bits) and naming them “random-values/000/000/000.random” (in a zfs dataset “random-values”) and increasing that number each time I generate a new chunk (so each folder level has at most 1,000 files/subdirectories). I’ve found 1,000 is a decent limit that works across different filesystems; much larger and I’ve seen performance problems. I want this to be usable on a variety of platforms.
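The numbering scheme described above can be sketched as a small helper that maps a running chunk counter to its path. The names `chunk_path`, `FANOUT` and `LEVELS` are made up for illustration; only the "random-values/000/000/000.random" layout comes from the post.

```python
from pathlib import PurePosixPath

FANOUT = 1000  # at most 1,000 entries per directory level, as in the post
LEVELS = 3     # two directory levels plus the file name


def chunk_path(index: int, root: str = "random-values",
               ext: str = ".random") -> PurePosixPath:
    """Map a chunk counter to a path such as random-values/001/234/567.random."""
    if not 0 <= index < FANOUT ** LEVELS:
        raise ValueError(f"index {index} does not fit in {LEVELS} levels")
    parts = []
    for _ in range(LEVELS):
        index, rem = divmod(index, FANOUT)
        parts.append(f"{rem:03d}")  # zero-padded so names sort in numeric order
    parts.reverse()  # most significant group first
    return PurePosixPath(root, *parts[:-1], parts[-1] + ext)


# Chunks needed for 10**15 bits at 1,048,576 bits each (rounded up).
CHUNKS_FOR_QUADRILLION_BITS = -(-10**15 // 2**20)
```

Three levels of 1,000 give 10^9 paths. One quadrillion bits at 1,048,576 bits per chunk is about 953.7 million chunks, so it fits; a target of 2^50 bits would need 2^30 chunks and would overflow the scheme without a fourth level.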

Then, in a separate zfs dataset, “random-metadata,” I also store metadata as the same filename but with different extensions, such as “random-metadata/000/000/000.sha512” (and 000.gen-info.txt and so on). Yes, I know this could go in a database instead. But that makes sharing this all hugely more difficult. To share a SQL database properly requires the same software, replication, etc. So there’s a pragmatic aspect here. I can import the text data into a database at any time if I want to analyze things.
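As a rough illustration of the parallel metadata layout, the sketch below writes a SHA-512 sidecar for one chunk. The function name and the "digest, two spaces, file name" line format are assumptions; the post only says that a .sha512 file with the same base name exists.

```python
import hashlib
from pathlib import Path


def write_sha512_sidecar(chunk_file: Path, data_root: Path,
                         meta_root: Path) -> Path:
    """Write <meta_root>/<same relative path>.sha512 for a chunk file."""
    rel = chunk_file.relative_to(data_root)          # e.g. 000/000/000.random
    sidecar = (meta_root / rel).with_suffix(".sha512")
    sidecar.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha512(chunk_file.read_bytes()).hexdigest()
    # Assumed format: hex digest, two spaces, chunk file name.
    sidecar.write_text(f"{digest}  {chunk_file.name}\n")
    return sidecar
```

Because the sidecar path is derived purely from the chunk path, the two datasets can be shared or re-imported into a database independently.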

I am open to suggestions if anyone has any better ideas on this. There is an implied ordering to the blocks, by numbering them in this way, but since I’m storing them in generated order at least it should be random. (Emphasis on should.)

Other ideas I explored

Just as an example of another way to organize this, an idea I had but decided against was to randomly generate a numeric filename instead, using a large enough number of truly random bits to minimize the chances of collisions. In the end, I didn’t see any advantage to this over temporal ordering, since such random names could always be applied after-the-fact instead by taking any chunk as a master index and “renaming” the files based on the values in that chunk. Alternatively, if I wanted to select chunks at random, I could always choose one chunk as an “index”, take each N bits of that as a number, and look up whatever chunk has that index.
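The "index chunk" idea can be sketched as follows: read one chunk as a stream of N-bit numbers and use each as a chunk index. Skipping values that are out of range (rather than reducing them modulo the chunk count) is an assumption added here to avoid favouring low indices; the post does not say how out-of-range values would be handled.

```python
from typing import Iterator


def indices_from_chunk(chunk: bytes, n_bits: int,
                       total_chunks: int) -> Iterator[int]:
    """Yield chunk indices by reading `chunk` n_bits at a time (MSB first)."""
    acc = 0       # unread bits, held as an integer
    acc_bits = 0  # number of unread bits in acc
    for byte in chunk:
        acc = (acc << 8) | byte
        acc_bits += 8
        while acc_bits >= n_bits:
            acc_bits -= n_bits
            candidate = acc >> acc_bits       # take the top n_bits
            acc &= (1 << acc_bits) - 1        # keep the remaining low bits
            if candidate < total_chunks:      # assumed: reject, do not wrap
                yield candidate
```

For example, with roughly 950 million chunks, `n_bits=30` would cover every index, and a bit under 12% of the draws would be rejected.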

What I do want to do in the naming is avoid accidentally introducing bias in the organizational structure. As an example, breaking the random numbers into chunks, then sorting those chunks by the values of the chunks as binary numbers, would be a bad idea. So any kind of sorting is out, and to that end even naming files with their SHA-512 hash introduces an implied order, as they become “sorted” by the properties of the hash. We think of SHA-512 as being cryptographically secure, but it’s not truly “random.”

Validation

Now, as an aside, there is also the question of how to validate the randomness, although this is outside the scope of data hoarding. I’ve been validating the data, as it comes in, in those 128KB chunks. Basically, I take the last 1,048,576 bits as a 128KB binary string and use various functions from the TestU01 library to validate its randomness, always going once forwards and once backwards, as TestU01 is more sensitive to the lower bits in each 32-bit chunk. I then store the results as metadata for each chunk, 000.testu01.txt.

An earlier thought was to try compressing the data with zstd, and reject data that compressed, figuring that meant it wasn’t random. I realized that was naive since random data may in fact have a big string of 0’s or some repeating pattern occasionally, so I switched to TestU01.

Questions

I am not married to how I am doing any of this. It works, but I am pretty sure I’m not doing it optimally. Even 1,000 files in a folder is a lot, although it seems OK so far with zfs. But storing as one big 128TB file would make it far too hard to manage.

I’d love feedback. I am open to new ideas.

For those of you who store random numbers, how do you organize them? And, if you have more random numbers than you have space, how do you decide which random numbers to get rid of? Obviously, none of this can be compressed, so deletion is the only way, but the problem is that once these numbers are deleted, they really are gone forever. There is absolutely no way to ever get them back.

(I’m also open to thoughts on the other aspects of this outside of the data hoarding and organizational aspects, although those may not exactly be on-topic for this subreddit and would probably make more sense to be discussed elsewhere.)


TLDR

I’m generating and hoarding ~128TB of (hopefully) truly random bits. I chunk them into 128KB files and use hierarchical naming to keep things organized and portable. I store per-chunk metadata in a parallel ZFS dataset. I am open to critiques on my organizational structure, metadata handling, efficiency, validation, and strategies for deletion when space runs out.

1189
 
 
The original post: /r/datahoarder by /u/pratyathedon on 2025-06-03 13:16:44.

I’m in the market for two large-capacity internal drives (16TB–20TB) to use in my home server/Unraid setup.

I’ve been digging through specs and price lists, but I wanted to get some community input before pulling the trigger.

The thing is, I am not from the US but will be visiting PA in July, and I would like to place an order in the next 2 weeks. SPD seems to be the go-to place where y'all buy HDDs with fewer issues.

My main use case is storing media to use with Jellyfin. I found several recertified Seagate drives on SPD that are within my budget. Can someone help me with which drives are the safest bet, because I won't be able to test them until I get back home?

ST16000NM002C at 210$ FR

ST20000NM002C at 250$ FR

Or if you think there are better options please help me out.

1190
 
 
The original post: /r/datahoarder by /u/der_pudel on 2025-06-03 12:30:34.

After nearly losing a significant portion of my personal data in a PC upgrade that went wrong (thankfully I recovered everything), I finally decided to implement a proper-ish 3-2-1 backup strategy.

My goal is to have an inexpensive (in the sense that I'd like to pay for what I'm actually going to use), maintainable and upgradeable setup. The data I'm going to back up is mostly photos, videos and other heavy media content with nostalgic value, and personal projects that are not easy to manage in git (hobby CAD projects, photo/video editing, etc.).

Setup I came up with so far:

    1. On the PC side, backups are handled by Duplicati. Not sure how stable/reliable it is long term, but my first impression of it is very positive.
    2. Backups are pushed to an SFTP server hosted on a Raspberry Pi with a Radxa SATA Hat and 4x1TB SSDs in a RAID5 configuration (mdadm).
    3. On the Raspberry Pi, I made a service that watches for a special file pushed by a Duplicati post-operation script and syncs the contents of the SFTP directory to an AWS S3 bucket (S3 Standard-Infrequent Access tier).
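The third step could look roughly like the sketch below: poll for the marker file and, when it appears, call the AWS CLI. Everything named here (the function names, the marker name, the polling interval) is hypothetical, and it assumes the `aws` CLI is installed and configured on the Pi; the poster's actual service may work differently.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def sync_once(backup_dir: Path, marker_name: str, bucket_url: str,
              run: Callable[[Sequence[str]], None] = run_checked) -> bool:
    """Sync backup_dir to S3 if the marker file is present; return True if synced."""
    marker = backup_dir / marker_name
    if not marker.exists():
        return False
    # Remove the marker first, so a backup that finishes while the sync is
    # running leaves a fresh marker and triggers another pass.
    marker.unlink()
    run(["aws", "s3", "sync", str(backup_dir), bucket_url,
         "--storage-class", "STANDARD_IA"])
    return True


def watch(backup_dir: Path, marker_name: str, bucket_url: str,
          poll_seconds: float = 60.0) -> None:
    """Poll forever; intended to run under a service manager such as systemd."""
    while True:
        sync_once(backup_dir, marker_name, bucket_url)
        time.sleep(poll_seconds)
```

One trade-off in this sketch: because the marker is removed before the sync starts, a failed sync is not retried until the next backup pushes a new marker. Removing it only after a successful sync would retry failures but could miss a backup that completes mid-sync.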

Since this is the first time I'm building something like that, I'd like to sanity-check the setup before I fully commit to it. Any reasons why it may not work in the long term (5-10 years)? Any better ways to achieve similar functionality without corporate black-box solutions such as Synology?

1191
 
 
The original post: /r/datahoarder by /u/Ok_Crazy6440 on 2025-06-03 11:23:32.

Hey DataHoarders! Bit of an oddball situation: My uncle’s old iPhone is stuck behind the iCloud Activation Lock, and we can’t get in (email’s long gone, and no luck with password recovery). We’re not trying to bypass the lock to use the phone; we just want to see if there’s any chance of pulling photos or voicemails off it.

Most recovery software I’ve seen just quits entirely when it hits an Activation Lock, but I’m curious if anyone here has tried using Gbyte Recovery (or anything similar) in this situation? Does Gbyte actually try to dig into the locked data, or is that just marketing talk?

I know it’s a long shot, but figured if anyone knows how to get data off an Activation Locked iPhone, it’s someone in here. Appreciate any thoughts or real-world results!

1192
 
 
The original post: /r/datahoarder by /u/Illustrious_Crab_146 on 2025-06-03 10:47:25.
1193
 
 
The original post: /r/datahoarder by /u/umataro on 2025-06-03 09:47:54.

I usually delete my files from USB flash drives, SD cards and hard disks with shred -n 1 -u * if they can't be encrypted but this adds too much wear to flimsy media like SD cards. I would like to be able to just corrupt important headers and insert random data at reasonable intervals to simply make the files unusable before they get unlink-ed. Is there such a thing?
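A minimal sketch of the idea described above: overwrite the start of the file and one small block at regular intervals with random bytes, so only a fraction of the file is rewritten. The function name and the default sizes are arbitrary assumptions, not an existing tool.

```python
import os
from pathlib import Path


def scramble_file(path: Path, header_bytes: int = 4096,
                  stride: int = 1 << 20, block: int = 512) -> int:
    """Overwrite the header and one block every `stride` bytes in place.

    Returns the number of bytes written. The file size is not changed.
    """
    size = path.stat().st_size
    written = 0
    with open(path, "r+b") as f:
        # Destroy the header, where most formats keep their structure.
        n = min(header_bytes, size)
        f.seek(0)
        f.write(os.urandom(n))
        written += n
        # Punch random holes at regular intervals through the rest.
        offset = stride
        while offset < size:
            n = min(block, size - offset)
            f.seek(offset)
            f.write(os.urandom(n))
            written += n
            offset += stride
        f.flush()
        os.fsync(f.fileno())  # push the writes to the device before unlinking
    return written
```

With these defaults a 1 GiB file gets about 0.5 MiB of writes instead of a full pass. Two caveats: the untouched regions remain readable, so formats that can be parsed without their header may still be partly recoverable, and on flash media wear levelling can leave the old contents of overwritten blocks in remapped cells, which also applies to `shred`.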

1194
 
 
The original post: /r/datahoarder by /u/tzfld on 2025-06-03 07:47:54.
1195
 
 
The original post: /r/datahoarder by /u/onelonedatum on 2025-06-03 06:08:35.

Here are a few prompt-driven assistants to generate fully verified yt-dlp commands I recently created.

Paste your video/audio URL, answer a few quick prompts (video vs audio, MP4 vs MKV, subs external or embedded, custom output path), and get back a copy-paste CLI snippet validated against the latest yt-dlp docs (FFmpeg required for embedding metadata/subs).

Try them here:


happy to make tweaks as needed, share the underlying prompts, and/or help w/ usage -- just let me know! 🤖 🚀

1196
 
 
The original post: /r/datahoarder by /u/bobwin770 on 2025-06-03 05:53:15.

I have 15 years worth of photos, roughly 10TB of RAW photos. I’m thinking of uploading all RAWS to Amazon Photos as they offer unlimited storage. However Amazon Photos does not allow you to create folders, only albums and ideally I would like images grouped within folders such as Events, Commercial, Personal, etc. This is how I have all my images saved on my external hard drives.

Separately, I would like to be able to send work to clients as reference and quickly access images for Instagram posts. For this I was thinking of creating a lower-res JPEG version (about 2 MB per image) of each folder and uploading these to OneDrive, which has a proper folder system, making images easier to locate quickly. There is no need for every photo to be its full RAW size for sending to clients or posting on Instagram.

Does anyone have a better solution to this or currently do something similar? Any help would be greatly appreciated

1197
 
 
The original post: /r/datahoarder by /u/Such-Bench-3199 on 2025-06-03 04:29:01.

I am unsure how many others would take this news, but for those of us who archive everything, especially on Mac, get Podcast Archiver from the app store and get all of WTF now before it is gone.

1198
 
 
The original post: /r/datahoarder by /u/svper-user on 2025-06-03 03:46:39.

After discovering BTRFS, I was amazed by its capabilities. So I started using it on all my systems and backups. That was almost a year ago.

Today I was researching small "UPS" with 18650 batteries and I saw posts about BTRFS being very dangerous in terms of power outages.

How much should I worry about this? I'm afraid that a power outage will cause me to lose two of my backups on my server. The third backup is disconnected from the power, but only has the most important part.

EDIT: I was thinking about it before I went to sleep. I have one of those Chinese emulation handhelds and its first firmware version used some FAT or ext. It was very easy to corrupt the file system if it wasn't shut down properly. They implemented btrfs to solve this and now I can shut it down any way I want, directly from the power supply and it never corrupts the system. That made me feel more at ease.

1199
 
 
The original post: /r/datahoarder by /u/waby-saby on 2025-06-03 03:19:20.

I need to have a professional level file hosting service. Preferably something that is SOX and HIPAA compliant, but that's a nice to have.

What is required is limiting files to certain people or groups and the ability to track who downloads what.

A simple, branded interface is needed. I'd like a way to share a file simply with a link for occasional files.

This should not be based on per user as that will fluctuate greatly.

Any ideas?

1200
 
 
The original post: /r/datahoarder by /u/ThirdWaveK on 2025-06-03 02:31:46.

Looking for suggestions on ways to add other forms of media, preferably free or open source, that can be downloaded so it could be completely offline. Best way to maximize storage through different audio/video formats? The overall goal is to have a portable ecosystem that could theoretically run on any hardware from the past, say, 20 years or so.

I’m new here, but excited about the prospects. Thanks for any help and input guys!
