this post was submitted on 20 Dec 2025

403 points (98.8% liked)


Backing up Spotify (annas-archive.li)

submitted 2 days ago by JensSpahnpasta@feddit.org to c/technology@lemmy.world


[–] jaschen306@sh.itjust.works 12 points 1 day ago

I guess I gotta donate more to anna

[–] JoeKrogan@lemmy.world 7 points 1 day ago

Don't have the space, but I love to see this. I hope people seed this for a long time.

[–] arcterus@piefed.blahaj.zone 42 points 1 day ago (6 children)

> Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.
>
> We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8MB gzipped).
>
> For popularity>0, we got close to all tracks on the platform. The quality is the original OGG Vorbis at 160kbit/s. Metadata was added without reencoding the audio (and an archive of diff files is available to reconstruct the original files from Spotify, as well as a metadata file with original hashes and checksums).
>
> For popularity=0, we got files representing about half the number of listens (either original or a copy with the same ISRC). The audio is reencoded to OGG Opus at 75kbit/s, sounding the same to most people, but noticeable to an expert.

Perhaps I'm reading this wrong, but is this not a little backwards? Since unpopular music is poorly preserved, shouldn't the focus be on getting the least popular music first?

[–] Techlos@lemmy.dbzer0.com 5 points 11 hours ago

If you want that long tail, bandcamp and soundcloud are better sources. The barrier to entry is low with those, and there's a plethora of small, niche artists just doing their own thing.

For a representative snapshot of music though, it's pretty amazing. It shows what a massive percentage of the planet listens to, preserved hopefully across many seeds, and historians will love shit like this in the future.

[–] WolfLink@sh.itjust.works 27 points 1 day ago

Unfortunately, if you sort by least popular music on Spotify, you’ll get nothing but spam.

[–] JensSpahnpasta@feddit.org 28 points 1 day ago (2 children)

It depends on what your goal is: if you want to preserve the music that is important to most people or to the era, you should start with the most popular stuff. And Spotify has a big spam problem. Everybody who thinks he is a DJ wants his music on there, and there is so much AI music flooding the scene. So it does make sense to back up what people are actually listening to and not some AI-generated music spam nobody cares about.

[–] mrdown@lemmy.world 0 points 10 hours ago

I am pretty sure the major labels are already preserving the most mainstream artists. Maybe it should be sorted by the most popular independent artists.

[–] arcterus@piefed.blahaj.zone 10 points 1 day ago

I mean, they say earlier that music is actually well-preserved, but it's disproportionately popular music. If the goal is then to preserve everything, I'd expect them to go for stuff that isn't likely to be in some random audiophile's collection or whatever then.

[–] UltraMagnus@startrek.website 12 points 1 day ago

The politics of preservation is definitely an interesting one. I suppose one argument in favor of preserving more popular music is that there are going to be fewer popular tracks than unpopular tracks - and they're already at 300TB, which is nothing to sneeze at, especially since it's a third the size of their existing library of ebooks.

[–] thermal_shock@lemmy.world 4 points 1 day ago

I agree. I seed torrents/files that took me a long time to finish.


[–] kindred@lemmy.dbzer0.com 94 points 2 days ago (4 children)

> This is by far the largest music metadata database that is publicly available. For comparison, we have 256 million tracks, while others have 50-150 million. Our data is well-annotated: MusicBrainz has 5 million unique ISRCs, while our database has 186 million.

Does this mean the MusicBrainz database will soon go from 5 million to 186 million tracks?

[–] zingo@sh.itjust.works 13 points 1 day ago* (last edited 1 day ago)

That's exactly what I was wondering too.

Acquiring high quality music is already easy enough in most cases.

What I am interested in is the metadata. Accurate tagging of all my files is of high interest.

[–] purplemonkeymad@programming.dev 7 points 1 day ago (1 children)

If I ran mb, I would be cautious importing the data directly. I'm sure Spotify would consider it trade information and go after anyone directly using it. However if a few million people added the tracks with individual edits then it probably won't take too long.

[–] Knock_Knock_Lemmy_In@lemmy.world 4 points 1 day ago (1 children)

I thought metadata couldn't be copyrighted though?

[–] zarkony@lemmy.zip 3 points 1 day ago (1 children)

It can't, but I'm sure that wouldn't stop Spotify from raising a stink if they see it being bulk imported. I'd imagine this would be similar to OpenStreetMap and Google Maps; they probably could scrape and bulk import missing info, but they restrict it to licensed sources and user edits to limit liability and enforce quality.

[–] Knock_Knock_Lemmy_In@lemmy.world 3 points 20 hours ago (1 children)

In cartography the expression of the uncopyrightable data is itself copyrighted (e.g. colors used, thickness of ligns) so maybe certain data fields are owned by Spotify (e.g. genre, description, history, song notes)

[–] noodlejetski@piefed.social 3 points 12 hours ago (1 children)

> ligns

that's a fascinating typo.

[–] Knock_Knock_Lemmy_In@lemmy.world 2 points 11 hours ago

You know, I thought it didn't look right, but I typed it into Google and it didn't autocorrect so I went with it.

Rhymes with signs, it must be right.

[–] exu@feditown.com 18 points 1 day ago

Probably not worth it to store the AI tracks

[–] xploit@lemmy.world 29 points 2 days ago

Asking the real questions here...

[–] massive_bereavement@fedia.io 70 points 2 days ago (8 children)

I'd strongly suggest they take out all the cheap AI-generated music from this "backup" and save themselves some space.

[–] AnarchistArtificer@slrpnk.net 19 points 1 day ago (1 children)

I'm not sure how they would go about doing that at scale without also getting some false positives and removing human music too

[–] cheesybuddha@lemmy.world 5 points 1 day ago (1 children)

You could cut off your search around the time AI tracks started to appear. Not sure when that was, maybe 2023. You'd miss a lot of recent stuff, but you'd filter out a lot of spam too

[–] AnarchistArtificer@slrpnk.net 4 points 21 hours ago

I see your point, but as you say, there would still be the tradeoff of missing more recent stuff. That might only involve missing a couple of years' worth of stuff now, but AI isn't going away any time soon, so it would mean that there'd be an increasing amount of human-made music not being archived. One of the things I like about Anna's Archive is that they seem to look at this problem in a long-term, informational-infrastructure kind of way, so I imagine they wouldn't be keen on stopping the archive at 2023.

It seems they've opted for a different tradeoff instead: lower popularity songs are archived at a lower bitrate, and even the higher popularity stuff has some compression. Some archives go for quality, and thus prioritise high quality FLACs, so Anna's archive are aiming to fulfill a different niche. I can respect that.


[–] helpImTrappedOnline@lemmy.world 54 points 2 days ago* (last edited 2 days ago) (4 children)

The data they compiled is really cool.

If I'm reading the chart right, the genre with the most artists is opera.

Even if they didn't have the music files, the analysis on the metadata is insane.

Publicly admitting they are the origin of the torrents is definitely ~~a risky~~ an insane move. I don't think they want Sony going after them, but also fuck Sony for locking art behind shitty contracts that force these kinds of projects to exist.

[–] JensSpahnpasta@feddit.org 29 points 1 day ago (1 children)

> Publicly admitting they are the origin of the torrents is definitely ~~a risky~~ an insane move. I don’t think they want Sony going after them

Let's be honest: everybody is trying to go after Anna's Archive. Every book publisher wants to get them, and so does the US government, so it really doesn't matter if every music publisher wants them too. I hope that they are based in a country where the Western systems can't get them.

[–] Tangent5280@lemmy.world 3 points 1 day ago

I hope (also assume since it hasn't been taken down yet) it's more of a decentralised deal with servers in many places and backups in every nation under the sun

[–] mrdown@lemmy.world 4 points 1 day ago

All 3 major labels are equally predatory, not only Sony.


[–] lietuva@lemmy.world 44 points 2 days ago* (last edited 1 day ago) (9 children)

There's definitely gonna be some crazy guy who will put this on their server and stream it to their phones lol

[–] thermal_shock@lemmy.world 7 points 1 day ago

I stream mine through Plexamp. Up to almost 400k tracks.

[–] cheesybuddha@lemmy.world 5 points 1 day ago (1 children)

If I had an extra 300 tb I'd do it.

[–] EpicFailGuy@lemmy.world 2 points 14 hours ago

Tagging /datahoarded

[–] extremeboredom@lemmy.world 22 points 2 days ago

Hi it's me

