I don't have a direct answer to your question, but it reminded me of a Tom Scott video where a library tries to keep a copy of everything you can think of (even stuff like leaflets) because it's not possible to know now what will be relevant/interesting in the future, so it's better to err on the side of keeping more stuff than necessary than to lose things that might be useful in the future. I suck at summarizing, so here's the link to this video:
Here is an alternative Piped link(s):
https://piped.video/ZNVuIU6UUiM?si=G795TqXyYxFLULbm
Piped is a privacy-respecting open-source alternative frontend to YouTube.
Somebody has to make the call. There was this dinosaur book for kids in a little free library. It didn't even have an author or publisher, because it was AI garbage. Full of misspellings, etc. I contemplated throwing it in the trash because I don't think it should exist. But for some reason I had trouble deciding that for others.
Digitize and delete? Scan straight to OCR and dump the books. One hard drive can store a lot of books.
- Drop accounting and commercial transaction documents. Invoices, receipts, shipping declarations, etc. Sure, they could be interesting to future archeologists, but they make up the vast majority of all documents generated by humanity. We still have millions of untranslated accounting docs from ancient Mesopotamia. It's not really that useful.
- Content published in fewer than 1000 copies, or read/watched fewer than 1000 times if online. Fanfiction, self-published books and so on. Again, it's a loss for a couple of niche future historians. You free up a significant % of storage, but even minor works representing our society remain.
Keep the most popular at the time. Pulp. The weird fan fic you wrote when you were 14, memes. The things that actual people enjoy. There is a lot that someone 1000 years from now can learn from reading your weird embarrassing fan fiction and diary.
The rich and powerful are going to preserve all their crap just fine, the normal everyday doesn't seem to get high priority for preservation.
> The rich and powerful are going to preserve all their crap just fine, the normal everyday doesn't seem to get high priority for preservation.
I don't know if that is true when we are talking about hundreds if not thousands of years. Money and power can only go so far after you die.
I think mostly it comes down to whether you were important to history or not. Even then it comes down to who is telling the history.
I'd use a better compression method and kick the can down the road to the next sucker to take the job.
Train LLMs on large bulks of data that meet criteria for deletion, thereby shrinking like 100 petabytes to a terabyte, albeit imperfectly. That way, you have a collection of AI bots that you can chat with about all the deleted data. And I suppose the threshold for deletion is, "How disastrous could a hallucination about this be?"
Random selection ofc!
Coin flip.
Less time spent deciding what's important, and more time spent improving the service to the point where overpopulation isn't a problem.
By the way, random selection is one of the few methods to guarantee a fair and representative sample.
In the case of a library, I'd want at least part of the catalog to represent the media of the times it was created in, because science also uses that.
One reason archaeology in Pompeii is valuable is that everything, including pornographic graffiti and slander, was preserved, giving us a different perspective into those times than books by medieval historians.
Of course, I'd also want - in another part of the catalog - books/media by manually selected scientists, artists, historians.
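For what it's worth, the "part random, part hand-picked" idea above could be sketched roughly like this in Python. The function name `select_catalog` and its parameters are made up for illustration, and it assumes a plain uniform random draw is what's meant by random selection.

```python
import random


def select_catalog(items, curated, keep, seed=None):
    """Keep every hand-picked item, then fill the remaining slots at random.

    items   -- everything currently in the catalog
    curated -- the manually selected items that must survive
    keep    -- how many items may stay in total (hypothetical budget)
    """
    rng = random.Random(seed)
    curated_set = set(curated)

    # The manually selected part of the catalog always stays.
    kept = [item for item in items if item in curated_set]

    # Everything else competes for the remaining slots with equal odds.
    pool = [item for item in items if item not in curated_set]
    slots = max(0, keep - len(kept))
    kept.extend(rng.sample(pool, min(slots, len(pool))))
    return kept
```

Note that if the hand-picked list alone is bigger than the budget, this sketch keeps all of it anyway and just draws nothing at random.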
I would sort according to its relevance/value to its genre as well as the redundancy of the information contained.
This is not a good method, but...
I think I'd specify a number... let's say, 2/3 of the books get to survive, the others will be thrown away. Then I'll start separating them into three groups: the ones I become immediately sure are great books, the ones I'm sure are trash, and the ones I'm unsure about. Then I'll go random: if I chose too many books as good ones, I'll randomly take some of them out, unless I'm sure they are all great books that should stay in the library. Otherwise I'll start randomly adding books from the unsure pile to the ones that get to stay.
Well. Not a great method.
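If anyone wants to play with it, here is a rough Python sketch of that three-pile method. The name `triage` and the `trim_great` switch are invented for the example; the switch stands in for the "unless I'm sure they are all great" escape hatch, and the 2/3 quota is just the number from the comment.

```python
import random
from fractions import Fraction


def triage(great, unsure, trash, keep_fraction=Fraction(2, 3),
           trim_great=True, seed=None):
    """Return the books that stay, given the three hand-sorted piles."""
    rng = random.Random(seed)
    total = len(great) + len(unsure) + len(trash)
    quota = int(total * keep_fraction)  # rounds down

    if len(great) > quota:
        # Too many "great" books: randomly drop some, unless we insist
        # that every one of them has to stay.
        return rng.sample(great, quota) if trim_great else list(great)

    # Otherwise top up from the unsure pile at random; trash never returns.
    extra = min(quota - len(great), len(unsure))
    return list(great) + rng.sample(unsure, extra)
```

With `trim_great=False` the result can exceed the quota, which is the price of refusing to throw out a book you are sure about.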
Look for the political association of the author. Right wing? Into the trash it goes.
Harry Potter did make books popular for a bit...
Vibes and cover art
Weed out via the tolerance metric.
If a piece of writing is nonfiction and exists to foment hatred, bye-bye. Mein Kampf is not literature.
It is history though. Erasing it from history would make it harder to study how insane that man was, which is immediately apparent in basically any part of that book. In Germany, Mein Kampf is banned except for educational purposes, e.g. in history class. Nothing conveys just how bad Hitler was as effectively as his own writing.
> In Germany, Mein Kampf is banned except for educational purposes, e.g. in history class.
Strictly speaking this is incorrect, although the situation is somewhat complicated. There are laws that can be and were used to limit its redistribution (mainly the rule against anti-constitutional propaganda), but there are dissenting judgements saying original prints from before the end of WW2 cannot fall under this, since they are pre-constitutional. One particular reprint from 2018 has been classified as "liable to corrupt the young", but to my knowledge this only means it cannot be publicly advertised.
What is interesting, though, is how distribution and reprinting were prevented historically, which is copyright. As Hitler's legal heir, the state of Bavaria held the copyright until it expired in 2015 and simply didn't grant a license to anything except versions with scholarly commentary. But technically, since then anybody can print and distribute new copies of the book. Whether this violates any law will then be determined on a case-by-case basis after the fact.
Something had to go, that's the premise.
Compared to anything else in existence, the shit that man wrote is the least useful thing.