this post was submitted on 26 Feb 2026
10 points (100.0% liked)

Programming

I'm looking for advice on building a collaborative caching system for APIs with strict rate limits that automatically commits updates to Git, allowing multiple users to share the scraping load and reduce server strain. The idea is to maintain a local dataset where each piece of data has a timestamp, and when anyone runs the script, it only fetches records older than a configurable threshold from the API, while serving everything else from the local cache.

After fetching new data, the script would automatically commit changes to a shared Git repository, so subsequent users benefit from the updated cache without hitting the server. This way, the same task that would take days for one person could be completed in seconds by the next. Has anyone built something like this, or does anyone know of existing tools/frameworks that support automated Git commits for collaborative data collection with timestamp-based incremental updates?
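A rough sketch of the flow described above, in Python. The file name, threshold, and helper names are all placeholders, and the Git calls assume the script runs inside an already-cloned repo with push access:

```python
import json
import subprocess
import time
from pathlib import Path

CACHE_FILE = Path("cache.json")   # hypothetical shared data file in the repo
MAX_AGE = 24 * 3600               # configurable staleness threshold (seconds)

def load_cache():
    """Load the shared dataset committed by previous users, if any."""
    if CACHE_FILE.exists():
        return json.loads(CACHE_FILE.read_text())
    return {}

def is_stale(entry, now=None, max_age=MAX_AGE):
    """An entry is stale when its timestamp is older than the threshold."""
    now = time.time() if now is None else now
    return now - entry["fetched_at"] > max_age

def refresh(cache, fetch_fn, record_ids):
    """Fetch only stale or missing records; serve the rest from cache."""
    for rid in record_ids:
        entry = cache.get(rid)
        if entry is None or is_stale(entry):
            cache[rid] = {"data": fetch_fn(rid), "fetched_at": time.time()}
    return cache

def commit_cache(cache, message="update cache"):
    """Persist the cache and share it with other users through Git."""
    CACHE_FILE.write_text(json.dumps(cache, indent=2))
    subprocess.run(["git", "add", str(CACHE_FILE)], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "push"], check=True)
```

A real version would also need a `git pull` before reading, and some story for merge conflicts when two users refresh overlapping records.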

top 4 comments
[–] TehPers@beehaw.org 2 points 16 hours ago (1 children)

Can't speak for Git, but caching responses is a common enough problem that it's built into the standard HTTP headers.
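For instance, the standard headers can drive both freshness checks and revalidation. A minimal stdlib-only sketch — the entry fields (`etag`, `last_modified`, `fetched_at`, `max_age`) are an assumed local schema, not anything the HTTP spec mandates:

```python
import time

def conditional_headers(entry):
    """Build revalidation headers from values saved off a prior response.
    If the server answers 304 Not Modified, the cached body can be reused."""
    headers = {}
    if entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]
    if entry.get("last_modified"):
        headers["If-Modified-Since"] = entry["last_modified"]
    return headers

def is_fresh(entry, now=None):
    """Honor the max-age the server declared via Cache-Control."""
    now = time.time() if now is None else now
    return now - entry["fetched_at"] < entry.get("max_age", 0)
```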

As for building a cache, you'd want to know a few things:

  • What is a cache entry? In your case, seems to be an API response.
  • How long do cache entries live? Do they live for a fixed time (TTL cache)? Do you have a max number of cached entries before you evict entries to make space? How do you determine which entries to evict if so?
  • What will store the cache entries? It seems like you chose Git, but I don't see any reason you couldn't start simple just by using the filesystem (and depending on the complexity, optionally a SQL DB).

You seem locked into using Git, and if that's the case, you still need to consider the second point there. Do you plan to evict cache entries? Git repos can grow unbounded in size, and it doesn't give you many options for determining what entries to keep.
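As an illustration of that second point, eviction could look like the sketch below (limits and field names are made up). Note the caveat in the comment: pruning the files only shrinks the working tree, not the Git history:

```python
import time

def evict_stale(cache, max_entries=1000, max_age=7 * 24 * 3600, now=None):
    """Drop expired entries, then trim to a size cap (oldest first).
    Caveat: with Git as the store, old blobs remain reachable from past
    commits, so the repository itself still grows without history rewrites."""
    now = time.time() if now is None else now
    live = {k: v for k, v in cache.items() if now - v["fetched_at"] <= max_age}
    if len(live) > max_entries:
        newest = sorted(live, key=lambda k: live[k]["fetched_at"], reverse=True)
        live = {k: live[k] for k in newest[:max_entries]}
    return live
```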

[–] MindfulMaverick@piefed.zip 1 point 13 hours ago

If I used SQLite or any other SQL database, I don't think users could collaborate on building the dataset, so I was thinking of JSON files committed to an online Git repository.
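One way to make that mergeable — purely a sketch, with an assumed directory layout — is one JSON file per record, so concurrent contributors rarely touch the same file and Git diffs stay small:

```python
import json
import time
from pathlib import Path

DATA_DIR = Path("data")  # hypothetical directory of per-record JSON files

def save_record(record_id, payload):
    """One file per record keeps merge conflicts rare when users collaborate."""
    DATA_DIR.mkdir(exist_ok=True)
    entry = {"data": payload, "fetched_at": time.time()}
    (DATA_DIR / f"{record_id}.json").write_text(json.dumps(entry, indent=2))
    return entry

def load_record(record_id):
    """Return the cached entry for a record, or None if never fetched."""
    path = DATA_DIR / f"{record_id}.json"
    return json.loads(path.read_text()) if path.exists() else None
```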

[–] derek 4 points 1 day ago

What problem are you attempting to solve?

How do you imagine this solution solves that problem?

[–] bricked@feddit.org 4 points 1 day ago

This could easily be implemented with a SQL database. Are you sure you want to use Git for this? The only advantage would be that you get historical data out of the box, but you'll probably only fetch the latest data anyway.
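For comparison, the same timestamp-gated cache on SQLite takes only a few lines with Python's built-in `sqlite3` module. The schema and helper names here are illustrative, not from any existing tool:

```python
import sqlite3
import time

def open_cache(path=":memory:"):
    """Open (or create) the cache table; use a file path for persistence."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "id TEXT PRIMARY KEY, data TEXT, fetched_at REAL)"
    )
    return con

def get(con, record_id, max_age=3600, now=None):
    """Return cached data only if it is newer than max_age, else None."""
    now = time.time() if now is None else now
    row = con.execute(
        "SELECT data, fetched_at FROM cache WHERE id = ?", (record_id,)
    ).fetchone()
    if row and now - row[1] <= max_age:
        return row[0]
    return None

def put(con, record_id, data, now=None):
    """Insert or refresh a record with the current timestamp."""
    con.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
        (record_id, data, time.time() if now is None else now),
    )
    con.commit()
```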