this post was submitted on 22 May 2025
415 points (97.1% liked)

memes

cogman@lemmy.world 24 points 15 hours ago

Slow? Not necessarily.

The main issues with that much memory are data routing and the physical locality of the memory. Assuming you could (somehow) shrink the distance from the cache to the registers, and had wide enough data and request lines, you could get data from such a cache in ~4 cycles (assuming it acts as an L1 and the access is a hit).

What slows down L2 is its wider address space and slower residence checks. L3 is slower still because of an even wider address space, but it also has to deal with concurrency issues, since it's shared among cores. On top of that, its size forces it to sit physically further away from the cores.
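
To make the residence check concrete, here is a minimal Python sketch of a set-associative lookup. Everything in it is an assumption for illustration: the names `split_address` and `SetAssociativeCache`, the geometry (64-byte lines, 64 sets, 8 ways, i.e. 32 KiB) and LRU replacement. It only tracks which tags are resident, and it scans the ways one by one where real hardware compares all tags in the selected set in parallel.

```python
# Hypothetical geometry: 64-byte lines, 64 sets, 8 ways (32 KiB in total).
LINE_SIZE = 64
NUM_SETS = 64
NUM_WAYS = 8
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # 6 bits select a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1    # 6 bits select a set


def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    """Toy model: tracks which tags are resident, not the data itself."""

    def __init__(self) -> None:
        # Each set is a list of resident tags, least recently used first.
        self.sets: list[list[int]] = [[] for _ in range(NUM_SETS)]

    def lookup(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line (evicting the LRU tag)."""
        tag, index, _ = split_address(addr)
        ways = self.sets[index]
        # Residence check: compare the tag against every way in the chosen set.
        if tag in ways:
            ways.remove(tag)  # move to the most-recently-used position
            ways.append(tag)
            return True
        if len(ways) == NUM_WAYS:
            ways.pop(0)  # evict the least recently used tag
        ways.append(tag)
        return False


cache = SetAssociativeCache()
print(cache.lookup(0x1234))  # False: cold miss
print(cache.lookup(0x1238))  # True: same 64-byte line
```

Both example addresses fall in the same 64-byte line, so the second lookup hits. A bigger cache needs more index bits and a larger tag array to search, which is part of why the check gets slower at each level.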

If you ever look at a CPU die, you'll see that L1 caches are generally tiny and embedded right in the center of the processor. L2 tends to be bolted onto the sides of the physical cores, and L3 tends to take up the most silicon real estate on a CPU package. All of this contributes to the increasing fetch latency at each level, along with the fact that you have to check the closest levels first. An L3 hit, for example, means the CPU already checked L1 and L2 and missed in both, which takes time. So an L3 access always takes at least the L1 and L2 lookup times combined.
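
The "check the closest levels first" argument is simple addition. This sketch assumes a strictly serial lookup in which each miss costs that level's full lookup time; the cycle counts in the hypothetical `LOOKUP_CYCLES` table are illustrative, not measurements of any real CPU.

```python
# Illustrative lookup cost of each level in cycles (assumed, not measured).
LOOKUP_CYCLES = {"L1": 4, "L2": 10, "L3": 30}
LEVELS = ["L1", "L2", "L3"]  # checked closest first


def hit_latency(level: str) -> int:
    """Cycles until data arrives when it hits in `level`, assuming every
    closer level is checked first and each miss costs its full lookup time."""
    total = 0
    for name in LEVELS:
        total += LOOKUP_CYCLES[name]
        if name == level:
            return total
    raise ValueError(f"unknown cache level: {level}")


for lvl in LEVELS:
    print(lvl, hit_latency(lvl))
```

With these numbers an L1 hit costs 4 cycles, an L2 hit 14 and an L3 hit 44, so the L3 figure can never drop below the 14 cycles already spent missing in L1 and L2.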

Smoolak@lemmy.world 4 points 14 hours ago

I agree. When evaluating cache access latency, it is important to consider the entire read path rather than just the intrinsic access time of a single SRAM cell. Much of the latency arises from all the supporting operations required for a functioning cache, such as tag lookups, address decoding, and bitline traversal. As you pointed out, implementing an 8 GB SRAM cache on-die using current manufacturing technology would be extremely impractical. The physical size would lead to substantial wire delays and increased complexity in the indexing and associativity circuits. As a result, the access latency of such a large on-chip cache could actually exceed that of off-chip DRAM, which would defeat the main purpose of having on-die caches in the first place.
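
The indexing problem can be put in numbers. This hypothetical `cache_geometry` helper computes the set count, index bits, number of tag entries and tag width for a set-associative cache, assuming 64-byte lines, 16 ways, a 48-bit physical address and power-of-two sizes (with 8 GB taken as 8 GiB). These are illustrative parameters, not those of any shipping design.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def cache_geometry(capacity_bytes: int, line_size: int = 64, ways: int = 16,
                   addr_bits: int = 48) -> dict[str, int]:
    """Sizes of the indexing structures for a set-associative cache.
    All defaults are illustrative assumptions, not a real CPU's parameters."""
    lines = capacity_bytes // line_size  # one tag entry per cache line
    num_sets = lines // ways
    # The set index is a plain bit field only if the set count is a power of two.
    if num_sets <= 0 or num_sets & (num_sets - 1):
        raise ValueError("expected a power-of-two number of sets")
    index_bits = num_sets.bit_length() - 1
    offset_bits = line_size.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": num_sets,
        "index_bits": index_bits,
        "tag_entries": lines,
        "tag_bits": tag_bits,
        # Tag bits only; valid, dirty and replacement-state bits are ignored.
        "tag_store_bytes": lines * tag_bits // 8,
    }


for label, size in [("32 KiB", 32 * KiB), ("1 MiB", 1 * MiB),
                    ("32 MiB", 32 * MiB), ("8 GiB", 8 * GiB)]:
    g = cache_geometry(size)
    print(f"{label:>6}: {g['sets']:>9,} sets, {g['index_bits']:>2} index bits, "
          f"{g['tag_store_bytes'] / MiB:.1f} MiB of tags")
```

Under these assumptions a 32 KiB cache has 32 sets (5 index bits), while an 8 GiB one has 8,388,608 sets (23 index bits) and 134,217,728 tags of 19 bits each, about 304 MiB of tag storage before the data array is even counted.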