this post was submitted on 30 Oct 2023

Data Hoarder


Hi!

I hope this is the right place to ask this. I've got about 30TB of videos from a variety of sources, and I know the collection contains numerous duplicates. Most of the duplicates aren't identical files, though: they either carry a different watermark, or one video is a part of the other, or they are simply different resolutions. I have been hunting for suitable software to scan for this but have had no luck in my own searches.

Surely something exists that scans videos for matching or similar frames?

I am willing to pay for software (within reason) if someone can confirm that it's likely to do what I need.

I've used things like dupeGuru and Czkawka consistently, and they've been good for images and identical videos, but they can't handle this type of issue with videos.
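
For anyone wondering what "scanning for matching or similar frames" could look like in practice, here is a rough sketch of one common approach, a difference hash (dHash) per frame compared by Hamming distance. It is not how dupeGuru or Czkawka work internally, just an illustration. Assumptions: frames have already been extracted and shrunk to 8 rows by 9 columns of grayscale values by some other tool (for example ffmpeg or OpenCV), the names `dhash`, `hamming` and `containment` are made up for this example, and the `max_dist=10` threshold is a guess that would need tuning.

```python
from typing import List, Sequence

# Grayscale thumbnail of one frame: 8 rows x 9 columns, values 0-255.
Frame = Sequence[Sequence[int]]


def dhash(frame: Frame) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in frame:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def containment(hashes_a: List[int], hashes_b: List[int], max_dist: int = 10) -> float:
    """Fraction of frames in A that have a near-match somewhere in B."""
    if not hashes_a:
        return 0.0
    matched = sum(
        1 for ha in hashes_a if any(hamming(ha, hb) <= max_dist for hb in hashes_b)
    )
    return matched / len(hashes_a)
```

A `containment` score near 1.0 for A against B, with a lower score for B against A, would hint that A is a clip taken from B. Because the hash only looks at relative brightness between neighbouring pixels, it tends to survive rescaling and small overlays such as watermarks, though that is not guaranteed. The all-pairs comparison here is far too slow for 30TB as written; a real tool would need an index over the hashes.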

top 3 comments
ZackCanada@alien.top (2 years ago)

I have the same problem but with photos. I transferred about 30,000 photos from a few iCloud libraries into Synology Photos, and there must be hundreds of duplicates in that bunch. I need to clear that mess out.

Tarman183@alien.top (2 years ago)

Czkawka is quite good for photos. Use "Duplicate files" mode first, then "Similar images", and work down the similarity levels one by one. It's good for watermarked versions and different resolutions, but bad for cropped versions. It also draws false positives in "Similar images" mode if you have two frames from a video saved as images.
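
The idea behind running an exact-duplicate pass first is that byte-identical copies can be found cheaply and removed before the slower similarity comparison. A minimal sketch of such a pass is below. It is not Czkawka's implementation, and the function names `file_digest` and `exact_duplicates` are hypothetical; it simply groups files by size and then by SHA-256.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicates(root: Path) -> List[List[Path]]:
    """Return groups of byte-identical files found under root."""
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups: List[List[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash: Dict[str, List[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `exact_duplicates(Path("/some/photo/folder"))` (the path is a placeholder) returns lists of paths that share identical content. Comparing sizes before hashing means most files never need to be read at all.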

sflesch@alien.top (2 years ago)

I used NoClone for the longest time and loved it. Czkawka has been my go-to since NoClone hasn't been updated in a while and tends to crash.

The GUI is a bit awkward to get used to. For instance, I haven't figured out whether you can browse to a path, and filtering by path works a little differently from what I'm used to, but I'm still generally very happy with it.