this post was submitted on 30 Nov 2023

So I run a video production company. We have 300TB of archived projects (and growing daily).

Many years ago, our old solution for archiving was simply to dump old projects off onto an external drive, duplicate that, and have one drive at the office, one offsite elsewhere. This was ok, but not ideal. Relatively expensive per TB, and just a shit ton of physical drives.

A few years ago, we had an unlimited Google Drive and 1000/1000 fibre internet. So we moved to a system where we would drop a project onto an external drive, keep that offsite, and have a duplicate of it uploaded to Google Drive. This worked ok until we reached a hidden file number limit on Google Drive. Then they removed the unlimited sizing of Google Drive accounts completely. So that was a dead end.

So then we moved that system to Dropbox a couple of years ago, as they were offering an unlimited account. This was the perfect situation. Dropbox was feature rich, fast, integrated beautifully into Finder/Explorer and just a great solution all round. It meant it was easy to give clients access to old data directly if they needed, etc. Anyway, as you all know, that gravy train has come to an end recently, and we now have 12 months' grace with our storage on there before we have to have this sorted and moved to another system.

Our options seem to be:

  • Go back to our old system of duplicated external drives, with one living offsite. We'd need ~$7500AUD worth of new drives to duplicate what we currently have.
  • Buy a couple of LTO-9 tape drives (2 offices in different cities) and keep one copy on an external drive and one copy on a tape archive. This would be ~$20000AUD of hardware upfront + media costs of ~$2000AUD (assuming we'd get maybe 30TB per tape on the 18TB raw LTO 9 tapes). So more expensive upfront but would maybe pay off eventually?
  • Build a Linus Tech Tips style beast of a NAS. Raw drive cost would be similar to the external drives, but would have the advantage of being accessible remotely. Would then need to spend $5000-10000AUD on the actual hardware on top of the drives. Also have the problem of ever growing storage needs. With this solution we could potentially skip duplicating the data to external drives and live with RAID as the only form of redundancy...
  • Another cloud storage service? Anything fast and decent enough that comes at a reasonable cost?
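
To put the three hardware options side by side, here is a rough sketch that only adds up the upfront figures quoted in the list. It assumes the NAS drives cost about the same $7,500 as the external drives (the post says "similar"), and it ignores growth, drive replacement and future tape media.

```python
ARCHIVE_TB = 300  # current archive size from the post

# Upfront cost ranges in AUD (low, high), using the figures quoted above.
# Assumption: NAS drives cost about the same $7,500 as the external drives.
OPTIONS_AUD = {
    "duplicate external drives": (7_500, 7_500),
    "LTO-9 drives + tape media": (20_000 + 2_000, 20_000 + 2_000),
    "NAS (drives + hardware)": (7_500 + 5_000, 7_500 + 10_000),
}


def cost_per_tb(total_aud, archive_tb=ARCHIVE_TB):
    """Upfront AUD per TB of archive."""
    return total_aud / archive_tb


for name, (low, high) in OPTIONS_AUD.items():
    print(f"{name}: ${low:,}-${high:,} AUD upfront, "
          f"${cost_per_tb(low):.0f}-${cost_per_tb(high):.0f} per TB")
```

The per-TB number only covers today's 300TB; what each extra TB costs later is a separate question, and that is where the options differ most.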

Any advice here would be appreciated!

[–] GrimThursday@alien.top 1 points 2 years ago

OP, look into AWS S3 Glacier. Not the normal S3.

In AUD, for a data centre in Sydney, with S3 Glacier Flexible you’re paying around $0.0045 per GB per month, and with Glacier Deep Archive it’s $0.002 per GB per month. This is your solution.
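
For scale, a quick sketch of what those rates mean for a 300TB archive. It assumes the quoted prices are per GB per month, uses decimal terabytes (1 TB = 1000 GB), and leaves out retrieval, request and early-deletion fees, which are billed separately.

```python
ARCHIVE_TB = 300
GB_PER_TB = 1000  # assumption: decimal terabytes

# Storage rates quoted in the comment above, assumed to be per GB per month.
RATES_PER_GB_MONTH = {
    "Glacier Flexible": 0.0045,
    "Glacier Deep Archive": 0.002,
}


def monthly_cost(archive_tb, rate_per_gb_month):
    """Storage-only monthly cost; retrieval and request fees not included."""
    return archive_tb * GB_PER_TB * rate_per_gb_month


for tier, rate in RATES_PER_GB_MONTH.items():
    monthly = monthly_cost(ARCHIVE_TB, rate)
    print(f"{tier}: ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
```

At these rates the storage alone comes to roughly $1,350 a month on Flexible and $600 a month on Deep Archive, and the bill grows with the archive.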

[–] physx_rt@alien.top 1 points 2 years ago

You need to think about how often you need to access the data. If it's once or twice a year, then the added overhead of having to find and load a tape wouldn't add up that quickly and IMO should be acceptable.

However, for projects you currently work on, you'd want hard drives and/or SSDs, preferably on a network, I suppose. Unless all your in-flight footage resides on the computers you edit it on (in which case I hope they have redundant storage).

Also, if any of your clients needed some archived data, would it be feasible to come back to the tapes, read, upload and share them? If you had a NAS and a fast enough internet connection, you may be able to host a site yourself, thus no need for reading the tape and uploading to a cloud.

Also, if it's video footage, then you shouldn't really count on LTO's compression ability. It's not particularly good for pictures and videos.
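
A small sketch of what that means for tape counts, comparing the 18TB native LTO-9 capacity with the 30TB-per-tape guess from the original post. The 300TB archive size is from the post; this counts one copy only.

```python
import math


def tapes_needed(archive_tb, tb_per_tape):
    """Whole tapes required for one copy of the archive."""
    return math.ceil(archive_tb / tb_per_tape)


print(tapes_needed(300, 18))  # native capacity, no compression gain -> 17
print(tapes_needed(300, 30))  # the post's 30TB-per-tape assumption -> 10
```

If video barely compresses, budgeting media at the native figure is the safer estimate.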

[–] rgraves22@alien.top 1 points 2 years ago

I still use Google Drive but I won't trust it. I'd had my Gmail/Google presence since Gmail was in closed beta, including the last 15+ years of photos: my wedding photos, the day my kids were born, everything. All gone.

I got a notification from Google one day on my "backup" account that my account had been suspended for breaking TOS, but it didn't say what I did.

It's my own fault for not duplicating my data and trusting that Google would keep it safe for me, considering I paid for the 2TB Drive plan.

Never again.

[–] Yugen42@alien.top 1 points 2 years ago

If you need fast and regular access to the archive, anything up to 1 PB can be handled with HDDs nowadays. If you don't need that, LTO tape will be much cheaper. For your offsite backup, encryption plus archival storage such as GCP Coldline or Archive storage is very cost effective and can be combined with either. Think about your data and organization. Perhaps you only need fast access to a part of the data, so combining the two might be the best solution. Consider whether you have an IT department or a data steward to set up a system for organizing that data.

[–] Molecule_Man@alien.top 1 points 2 years ago

Have you looked at any MAMs?

Most will combine on-premise and multiple cloud storages and then proxy a low res for previewing, with custom metadata modeling to find and retrieve everything.

Sounds like what you need.

[–] CanuckFire@alien.top 1 points 2 years ago

You are basically taking on customer data archiving as a part of your business. If you are doing this as a business, everything has a cost and that cost should be passed to the customers.

There is a reason that companies doing long term record retention charge absurd amounts for it.... Iron Mountain takes on a ton of liability and responsibility to keep your crap intact while they have it. I would never take that on willingly.

Some people I know offer to package and transfer the assets to the customer as a paid service when the project is done, making long term storage their problem. (Photo, design)

I also have friends who do the contract line that assets are only kept for a year.

Truthfully... As a business, why would you want to keep anything? If the customers lose their data they need to pay you to make the thing again, which is better for you.

[–] Brilliant_Eagle9795@alien.top 1 points 2 years ago

Definitely tape.

[–] CFH75@alien.top 1 points 2 years ago

I have 400TB stored in our Wasabi archive and we use a Pure Storage NAS for live data. In your case I would look at a Synology.

[–] Haz3rd@alien.top 1 points 2 years ago

I'm in the exact same boat as you. Cloud storage is completely unusable for people like us.

I've gone back and forth between what I'd want to set up and what would be most effective. The solution I've settled on is building a cheap NAS out of a normal PC, throwing as many drives as I possibly can in it, and storing that at someone's house. It sucks that that's the best solution, but for our budget and use case, it is. Granted, you have more data than us so it probably wouldn't all fit in a normal case, but you get the idea.

[–] johnklos@alien.top 1 points 2 years ago

Tape. You'll thank yourself in the long run.

[–] theiman69@alien.top 1 points 2 years ago

Seagate Exos X systems, a 2U12 with 20 TB drives will last you 5-7 years

[–] No_Sense3190@alien.top 1 points 2 years ago

I've worked for several production companies that have similar or larger archives (one was well into the Petabyte range). LTO is the way to go. It is the cheapest option for very large archives, and if the tapes are properly stored, they last a lot longer than hard drives sitting on a shelf.

The real way to do it is a tiered archive, where everything goes to LTO, you have more recent media (1-2 years old, depending on project length) on hard drives, and current media (still in use + past year or so) on a NAS. LTO is still your primary archive; everything else is for easy access to media you're more likely to need now or in the near future.
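
The tiering rule above could be sketched roughly like this. The function name and the exact cut-offs (365 days for the NAS, two years for shelf hard drives) are illustrative assumptions; the comment only gives approximate ranges.

```python
from datetime import date


def storage_tiers(last_used: date, in_use: bool, today: date) -> list[str]:
    """Return the tiers a project should live on.

    LTO is always included because it is the primary archive.
    The day thresholds are assumptions, not fixed rules.
    """
    age_days = (today - last_used).days
    tiers = ["LTO"]
    if in_use or age_days <= 365:
        tiers.append("NAS")   # current media: still in use or from the past year
    elif age_days <= 2 * 365:
        tiers.append("HDD")   # more recent media: roughly 1-2 years old
    return tiers


today = date(2023, 11, 30)
print(storage_tiers(date(2023, 10, 1), False, today))  # recent project
print(storage_tiers(date(2022, 6, 1), False, today))   # between 1 and 2 years old
print(storage_tiers(date(2019, 3, 1), False, today))   # old project, tape only
```

Running something like this over a project list on a schedule would tell you what can be cleared off the NAS or the shelf drives once the tape copy is verified.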

[–] Sk1tza@alien.top 1 points 2 years ago

Have you looked at Wasabi or Backblaze? Possibly cheaper. It’s always cheaper to do this via a NAS in the end. Big Synology or two smaller units at each site with expandability.

[–] pastelpalettegroove@alien.top 1 points 2 years ago

I hope you charge your clients for archiving. I work in a similar field to yours and there's no chance I'm archiving this much data if the clients aren't paying for it. I have a contract that stipulates assets are kept for 1 year, then they pay a yearly fee for archiving or they agree that we may delete it.

[–] Riftbreaker@alien.top 1 points 2 years ago

Please keep us posted on what you decide. I am facing almost the exact same problem.

[–] Spare-Appeal4422@alien.top 1 points 2 years ago

IMO it depends on how organized you are and how often you need to access archived video.

LTO-9 is cheaper per TB (haven’t run the numbers, but on the order of 100s of TB it’s almost definitely true) but relies on someone physically finding the right tape and putting it into the system (unless you shell out for a very expensive automated system). Not good for fast access, but cheaper for expanding.

If you need fast, automated access I’d recommend the NAS option, but keep in mind that it would be in one physical location. A fire or flood and you’re fucked.

Plus, since the cost per TB of tape is so much cheaper than HDD, expanding your archive is probably much cheaper with tape (keeping in mind the organization/automation aspect)

[–] AcanthocephalaTrue24@alien.top 1 points 2 years ago

Listen to me. Here is the pro solution. Get yourself something like a Fujitsu Eternus CS800 plus a Fujitsu LTO tape library. Contact the sales team and tell them how much data you are going to put there. The result: all the data is available quickly if it resides on the disk cache, or a little later if it needs to be pulled from tape. From your point of view the data is available from a mounted network share, and the technical magic behind it is transparent. Basically, imagine an infinite folder where an algorithm moves data to and from tapes, keeps them healthy, and refreshes and consolidates them when needed. 20 tapes at 12TB each plus dedup is like 0.5 PB of data. And you can always duplicate tapes and move them to an external location. Even if somebody stole everything from the first location, including the hardware, you could get the data back.
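
As a sanity check on the "20 tapes is like 0.5 PB" figure, this sketch works out the data-reduction ratio it implies. It assumes 0.5 PB means 500 TB; the function name is just for illustration.

```python
def implied_reduction_ratio(tapes, tb_per_tape, claimed_tb):
    """Return raw tape capacity and the dedup/compression ratio the claim needs."""
    raw_tb = tapes * tb_per_tape
    return raw_tb, claimed_tb / raw_tb


raw_tb, ratio = implied_reduction_ratio(20, 12, 500)
print(f"raw: {raw_tb} TB, implied reduction: {ratio:.2f}x")
```

So the claim needs a bit over 2x reduction on top of 240TB of raw tape. Whether a video archive gets anywhere near that is worth asking the sales team about.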

[–] MikeFromTheVineyard@alien.top 1 points 2 years ago

A lot of people have suggested charging clients for long-term storage. I agree with that sentiment. If you go this route, you may be able to use cloud storage a la Dropbox/Google Drive, which seems most convenient for you. Costs for consumer-facing cloud storage run roughly $10USD a month for 2TB. Expensive for hundreds of terabytes indefinitely, but if a single client needs access to (idk) 0.5 TB you could easily charge $30-50 a year to provide them a shared folder in Google Drive. Maybe more if you want redundancy against the cloud provider losing data.
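
A back-of-envelope sketch of that pricing. It assumes the roughly $10USD for 2TB is a monthly price and that cost scales linearly with size, which real consumer plans don't do (you pay for the whole plan), so treat it as a floor for what to charge.

```python
PLAN_PRICE_USD_PER_MONTH = 10  # assumed monthly price for the plan
PLAN_TB = 2


def yearly_cost_usd(tb):
    """Pro-rata yearly storage cost, assuming linear pricing."""
    return tb * PLAN_PRICE_USD_PER_MONTH / PLAN_TB * 12


print(yearly_cost_usd(0.5))  # one client's 0.5 TB share
print(yearly_cost_usd(300))  # the whole 300TB archive at the same rate
```

Under those assumptions a 0.5 TB client costs about $30 a year, which lines up with the $30-50 fee, while the full archive at that rate would be about $18,000 a year.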

For anything you need to actively use for work, a giant NAS is probably your best bet. Those YouTubers you’ve seen also use it as part of their team workflow, and maybe that’d apply to you as well. You should probably run a regular backup job of this to the other office or to AWS/Backblaze. Should be a manageable cost if you only need 10-20TB of data for active work.

For everything else… maybe tape if you really want to keep everything. A lot of big organizations seem to be moving away from tape towards networked spinning disk as the price drops. Seems mostly driven by tape being seen as a massive pain to use (not that I have personal experience with it) and by the expensive equipment. It’s really an organizational decision to directly quantify long term archival needs and value. Once you have a $/TB value to the business, see what fits your budget (could be nothing!). You could try Backblaze or AWS Glacier, but those get expensive and the cost is ongoing forever.

There are a whole bunch of niche and small-scale companies doing cloud data storage, but I don’t know how they’d get lower cost per byte stored over some big companies (lower margins? Slower speeds? Lower guarantees?). I’d be suspicious of them for mission-critical storage. It’s one thing for a home-user to use them to store their torrented movies, but it’s very different for a business. It could be worth it to just search around. Look at what’s supported as a target by whatever NAS software you use if that’s your route.

[–] Icy-Goose4703@alien.top 1 points 2 years ago

look at IDrive e2, no egress, $4,500 first year...screaming deal

https://www.idrive.com/object-storage-e2/pricing

[–] campster123@alien.top 1 points 2 years ago

I just want to say a massive thank you to everyone contributing advice and thoughts here. There’s a lot to get through and I’m taking it all in.

To those saying we should be charging for this, we hear you, you’re not the first to tell us. We’re looking into implementing that going forward and need to assess how we’ll tackle that for older clients.

I feel like this is a good point to assess our whole data infrastructure (live edits and archiving) and we’ll keep you all up to date once we decide on a direction. In the meantime keep the thoughts rolling in!
