this post was submitted on 01 Jan 2026
96 points (100.0% liked)

Programming

24153 readers
399 users here now

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

founded 2 years ago
top 34 comments
[–] myfavouritename@beehaw.org 4 points 3 hours ago

I get way more use out of Doubly Connected Edge Lists (DCEL) than I ever thought I would when I first learned about them in school.

When I want to render simple stuff to the screen, built-in functions like 'circle' or 'line' work. But for any shapes more complicated than that, I often find that it's useful to work with the data in DCEL form.

[–] Gobbel2000@programming.dev 3 points 4 hours ago

The CSR (compressed sparse row) format is a very simple but efficient way of storing sparse matrices, meaning matrices with a large number of zero entries, which should not all occupy memory. It uses three arrays: the first holds all non-zero entries in order, read row by row; the second holds the column index of each non-zero entry (and therefore has the same length as the first); the third holds, for each row, the index into the first array of that row's first element, so we can tell where each new row starts.

On sparse matrices it has optimal memory efficiency and fast lookups. The main downside is that adding or removing elements requires shifting all three arrays, so it is mostly useful for immutable data.
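
A minimal sketch of that layout (the names are mine; a common convention gives the row array one extra entry so the last row's extent can be read off):

```cpp
#include <cstddef>
#include <vector>

// Minimal CSR container: `values` and `col_idx` have one entry per
// non-zero; `row_start[r]` is the index in `values` where row r begins,
// with one extra entry so row r spans [row_start[r], row_start[r + 1]).
struct Csr {
    std::vector<double> values;
    std::vector<std::size_t> col_idx;
    std::vector<std::size_t> row_start; // length = rows + 1

    // Look up entry (r, c); zeros are implicit.
    double at(std::size_t r, std::size_t c) const {
        for (std::size_t i = row_start[r]; i < row_start[r + 1]; ++i)
            if (col_idx[i] == c) return values[i];
        return 0.0;
    }
};

// The 3x4 matrix
//   [5 0 0 2]
//   [0 0 3 0]
//   [0 1 0 0]
// stored in CSR form:
const Csr example = {
    {5, 2, 3, 1}, // non-zero values, row by row
    {0, 3, 2, 1}, // column of each value
    {0, 2, 3, 4}, // row r occupies values[row_start[r]..row_start[r+1])
};
```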

[–] xthexder@l.sw0.com 7 points 5 hours ago* (last edited 5 hours ago)

I came up with a kind of clever data type for storing short strings in a fixed size struct so they can be stored on the stack or inline without any allocations.
It's always null-terminated so it can be passed directly as a C-style string, but it also stores the string length without using any additional data (Getting the length would normally have to iterate to find the end).
The trick is to store the number of unused bytes in the last character of the buffer. When the string is full, there are 0 unused bytes and the size byte overlaps the null terminator.
(Only works for strings < 256 chars excluding null byte)

Implementation in C++ here: https://github.com/frustra/strayphotons/blob/master/src/common/common/InlineString.hh
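
A stripped-down sketch of just the trick (the linked implementation does much more):

```cpp
#include <cstddef>
#include <cstring>

// Fixed-capacity inline string. The last byte of the buffer stores the
// number of *unused* bytes; when the string is full that count is 0,
// which doubles as the null terminator. Capacity must be < 256.
template <std::size_t N>
struct InlineString {
    char buf[N + 1]; // N chars + terminator/slack byte

    InlineString(const char* s) {
        std::size_t len = std::strlen(s);    // caller must ensure len <= N
        std::memcpy(buf, s, len);
        std::memset(buf + len, 0, N - len);  // zero the slack, terminates
        buf[N] = static_cast<char>(N - len); // unused-byte count
    }

    std::size_t size() const {
        return N - static_cast<unsigned char>(buf[N]);
    }
    const char* c_str() const { return buf; } // already null-terminated
};
```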

[–] solomonschuler@lemmy.zip 3 points 4 hours ago

Skip lists are interesting data structures. Underneath, a skip list is a probabilistic, multi-level linked list with some associated height 'h' that lets searches skip over nodes. A traditional linked list has to traverse every value it stores; a skip list instead starts at the maxLevel, follows "next" pointers until the next key would be greater than the key searched for (or is a nullptr), then drops down to the level below and repeats. This reduces the expected search time from O(n) for a plain linked list to O(log n).

The reason it's probabilistic (node heights are drawn from a pseudo-random number generator) is that this makes it easy to insert and remove elements; with the idealized deterministic form you would have to reconstruct the entire data structure every time you add or remove an element.

In my testing, after adding 1,000,000 elements, searching dropped from about 6s with a linked list to less than 1s with a skip list!
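
As an illustration, the search loop looks roughly like this in C++ (the node layout is made up for the sketch):

```cpp
#include <vector>

// Hypothetical node layout: forward[l] points to the next node at level l.
// `head` is a sentinel node with maxLevel forward pointers.
struct Node {
    int key;
    std::vector<Node*> forward;
};

// Start at the topmost level and drop down a level whenever the next
// key would overshoot the target.
Node* find(Node* head, int maxLevel, int key) {
    Node* cur = head;
    for (int level = maxLevel - 1; level >= 0; --level) {
        while (cur->forward[level] && cur->forward[level]->key < key)
            cur = cur->forward[level];
    }
    cur = cur->forward[0]; // candidate: first node with key >= target
    return (cur && cur->key == key) ? cur : nullptr;
}
```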

[–] Vorpal@programming.dev 12 points 7 hours ago* (last edited 2 hours ago)

XOR lists are obscure and cursed but cool. And not useful on modern hardware as the CPU can't predict access patterns. They date from a time when every byte of memory counted and CPUs didn't have pipelines.
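
For the curious, the core trick in a minimal C++ sketch: each node stores prev XOR next in a single field, and traversal carries a (prev, cur) pair to recover the other neighbour.

```cpp
#include <cstdint>

// One link field per node instead of two: link = (uintptr_t)prev ^ (uintptr_t)next.
struct XorNode {
    int value;
    std::uintptr_t link;
};

// Given the node we came from, XOR recovers the other neighbour.
XorNode* next_node(XorNode* prev, XorNode* cur) {
    return reinterpret_cast<XorNode*>(
        cur->link ^ reinterpret_cast<std::uintptr_t>(prev));
}

// Walking forward means carrying a (prev, cur) pair; the same function
// walks backwards if you swap the starting pair.
void traverse(XorNode* head) {
    XorNode* prev = nullptr;
    XorNode* cur = head;
    while (cur) {
        XorNode* next = next_node(prev, cur);
        prev = cur;
        cur = next;
    }
}
```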

(In general, all linked lists or trees are terrible for performance on modern CPUs. Prefer vectors or btrees with large fanout factors. There are some niche use cases still for linked lists in for example kernels, but unless you know exactly what you are doing you shouldn't use linked data structures.)

EDIT: Fixed spelling

[–] palordrolap@fedia.io 5 points 7 hours ago

An ultimately doomed one that existed in Perl for a while was the pseudohash. They were regular integer-indexed arrays that could be accessed as though they were hashes (aka associative arrays / dictionaries). They even made it into the main Perl books at the time as this awesome time saving device. Except they weren't.

I did a quick web search just now and someone did a talk about why they weren't a great idea; they tell it better than I could: https://perl.plover.com/classes/pseudohashes/

The supplied video doesn't have great sound quality, and it might be better to just click through the slides under Outline at the bottom there.

[–] duckythescientist@sh.itjust.works 59 points 14 hours ago (2 children)

I'm also not sure if this is obscure, but Bloom filters! A Bloom filter is a structure you can add elements to and then ask whether it has seen an element before, with the answer being either "no" or "probably yes". There's a trade-off between the confidence of a "probably yes", how many elements you expect to add, and how big the Bloom filter is, but it's very space- and time-efficient. And it uses hash functions, which always make for a fun time.
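
A toy C++ version to make the mechanics concrete (the sizes, the bitset layout, and the double-hashing scheme are all arbitrary choices of mine):

```cpp
#include <bitset>
#include <functional>
#include <string>

// Toy Bloom filter: K hash probes set K bits on insert; a query answers
// "probably yes" only if all K bits are set.
class BloomFilter {
    static constexpr std::size_t M = 8192; // bit-array size (arbitrary)
    static constexpr std::size_t K = 4;    // number of hash probes
    std::bitset<M> bits;

    static std::size_t probe(const std::string& s, std::size_t i) {
        // Derive K hashes from two base hashes (double hashing).
        std::size_t h1 = std::hash<std::string>{}(s);
        std::size_t h2 = std::hash<std::string>{}(s + "#"); // cheap second hash
        return (h1 + i * h2) % M;
    }

public:
    void add(const std::string& s) {
        for (std::size_t i = 0; i < K; ++i) bits.set(probe(s, i));
    }
    bool probablyContains(const std::string& s) const {
        for (std::size_t i = 0; i < K; ++i)
            if (!bits.test(probe(s, i))) return false; // definite no
        return true; // probably yes
    }
};
```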

[–] sukhmel@programming.dev 15 points 9 hours ago

Relevant xkcd

in Randall's words

Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they're the right one you can never be sure.

[–] FizzyOrange@programming.dev 8 points 11 hours ago

Obscure 10 years ago maybe. These days there have been so many articles about them I bet they're more widely known than more useful and standard things like prefix trees (aka tries).

[–] xep@discuss.online 4 points 8 hours ago
[–] felsiq@piefed.zip 32 points 14 hours ago (3 children)

Conflict-free replicated data types (CRDTs). I don't know if I'd call them obscure, but they're definitely cool and less often used. They're for sharing state across computers, as in collaborative apps.
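
Probably the simplest CRDT is a grow-only counter: each replica increments only its own slot, and merging takes element-wise maxima, so concurrent updates never conflict. A minimal sketch (replica count fixed at compile time for brevity):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <numeric>

// Grow-only counter (G-Counter) for N replicas. Merge is commutative,
// associative, and idempotent, so replicas converge in any order.
template <std::size_t N>
struct GCounter {
    std::array<std::uint64_t, N> slots{};

    void increment(std::size_t replica) { ++slots[replica]; }

    // The logical value is the sum over all replicas' slots.
    std::uint64_t value() const {
        return std::accumulate(slots.begin(), slots.end(), std::uint64_t{0});
    }

    // Merging another replica's state takes the element-wise max.
    void merge(const GCounter& other) {
        for (std::size_t i = 0; i < N; ++i)
            slots[i] = std::max(slots[i], other.slots[i]);
    }
};
```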

[–] Pissmidget@lemmy.world 2 points 6 hours ago

From just the name my mind instantly thought of the conflict as "conflict diamonds", and I began to wonder what constitutes a conflict free boolean or integer.

If anyone wants to take a crack at writing up why primitives are unfortunate, and we should move on to new "conflict free data types"™ I will cheer you on!

Also, very interesting read about actual conflict-free replicated data types. Cheers!

[–] BackgrndNoize@lemmy.world 1 points 5 hours ago

This sounds like document collaboration software, like Google Sheets, where multiple people can edit a document at the same time.

[–] tyler@programming.dev 10 points 13 hours ago

They were pretty obscure until recently! I would say most people still don’t know about them.

[–] marlinz@sueden.social 19 points 14 hours ago (1 children)

@protein

Finger Tree!

A persistent, purely functional workhorse. Amortized O(1) access at both ends, O(log n) concatenation/splitting.

It generalizes elegantly to build sequences, priority queues, and more. Powers Haskell's main Data.Sequence. A functional programmer's secret weapon.

[–] Vorpal@programming.dev 4 points 7 hours ago (2 children)

On paper they are efficient. In practice, all pointer-based data structures (linked lists, binary trees, etc.) are slow on modern hardware, and that effect matters more than asymptotic complexity for most practical high-performance code.

You are far better off with linear access where possible (e.g. vectors, open addressing hash maps) or if you must have a tree, make the fan-out factor as large as possible (e.g. btrees rather than binary trees).

Now, I don't know if Haskell etc. affords you such control; I mainly code in Rust (and C++ in the past).

Also see this old thread from 2016 on hacker news about this very topic: https://news.ycombinator.com/item?id=13263275

[–] marlinz@sueden.social 5 points 5 hours ago (1 children)

@Vorpal

Totally fair point, thanks for calling that out.

When I mentioned finger trees I was thinking more about the *functional* side (persistence, elegant composition, Haskell/Data.Sequence style usage) than raw performance on real hardware.

In performance‑critical code your argument for cache‑friendly linear structures and wide trees absolutely makes sense, and I appreciate the reminder to think about actual access patterns and hardware effects, not just asymptotic complexity.

[–] Vorpal@programming.dev 1 points 2 hours ago

I think a lot of modern software is bloated. I remember when GUI programs used to fit on a floppy or two. Nowadays we have bloated Electron programs taking hundreds of MB of RAM just to show a simple text editor, because each one drags a whole browser along with it.

I love snappy software, and while I don't think we need to go back to programs fitting on a single floppy and using hundreds of KB of RAM, the pendulum does need to swing back a fair bit. I rewrote some CLI programs in the last few years that I found slow (one of my own, previously written in Python; the other written in C++ but not properly designed for speed). I used Rust, which certainly helped compared to Python, but the real key was thinking carefully about the data structures up front and designing for performance, plus lots of profiling and benchmarking as I went along.

The results? The Python program was sped up by 50x, the C++ program by 320x. In both cases it went from "irritating delay" to "functionally instant for human perception".

The two programs:

And I also rewrote a program I used to manage Arch Linux configs (written in bash) in Rust. I also added features I wanted so it was never directly comparable (and I don't have numbers), but it made "apply configs to system" take seconds instead of minutes, with several additional features as well. (https://github.com/VorpalBlade/paketkoll/tree/main/crates/konfigkoll)

Oh and want a faster way to check file integrity vs the package manager on your Linux distro? Did that too.

Now what was the point I was making again? Maybe I'm just sensitive to slow software. I disable all animations in GUIs after all; all those milliseconds of waiting add up over the years. Computers are amazingly fast these days, and we shouldn't make them slower than they have to be. So I think far more software should count as performance critical. Anything a human has to wait for should be.

Faster software is more efficient as well, using less electricity, making your phone/laptop battery last longer (since the CPU can go back to sleep sooner). And saves you money in the cloud. Imagine if you could save 30-50% on your cloud bill by renting fewer resources? Over the last few years I have seen multiple reports of this happening when companies rewrite in Rust (C++ would also do this, but why would you want to move to C++ these days?). And hyperscalers save millions in electricity by optimising their logging library by just a few percent.

Most modern software on modern CPUs is bottlenecked on memory bandwidth, so it makes sense to spend effort on data representation. Sure start with some basic profiling to find obvious stupid things (all non-trivial software that hasn't been optimised has stupid things), but once you exhausted that, you need to look at memory layout.

(My dayjob involves hard realtime embedded software. No, I swear that is unrelated to this.)

[–] coherent_domain 4 points 5 hours ago* (last edited 5 hours ago)

I don't know if Haskell etc affords you such control

You can have immutable arrays with vectors, but to mutate them you need to wrap your action in a monad. It even supports unboxed values.

https://hackage.haskell.org/package/vector

But I agree that boxed-by-default representations cause a lot of performance overhead in many high-level languages.

[–] litchralee@sh.itjust.works 12 points 13 hours ago (1 children)

IMO, circular buffers with two advancing pointers are an awesome data structure for high performance compute. They're used in virtualized network hardware (see virtio) and minimizing Linux syscalls (see io_uring). Each ring implements a single producer, single consumer queue, so two rings are usually used for bidirectional data transfer.

It's kinda obscure because the need for asynchronous-transfer queues doesn't show up that often unless you're dealing with hardware or crossing outside of a single CPU. But it's becoming relevant due to coprocessors (i.e. small ARM CPUs attached to a main CPU) that process offloaded requests and quickly return the result when ready.
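
A single-producer/single-consumer ring in minimal form looks something like this (a sketch only; real virtio/io_uring rings add descriptors and doorbells, and the power-of-two capacity is my simplification):

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// SPSC ring: only the producer writes `head` and only the consumer
// writes `tail`, so each index has a single writer; acquire/release
// ordering publishes the slot contents. Indices grow without bound and
// are masked, so N must be a power of two.
template <typename T, std::size_t N>
class SpscRing {
    T slots[N];
    std::atomic<std::size_t> head{0}; // next slot to write
    std::atomic<std::size_t> tail{0}; // next slot to read

public:
    bool push(const T& v) {
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == N) return false; // full
        slots[h % N] = v;
        head.store(h + 1, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return std::nullopt; // empty
        T v = slots[t % N];
        tail.store(t + 1, std::memory_order_release);
        return v;
    }
};
```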

[–] xthexder@l.sw0.com 3 points 3 hours ago* (last edited 3 hours ago)

One cool trick that can be used with circular buffers is to use memory mapping to map the same block of memory to 2 consecutive virtual address blocks. That way you can read the entire contents of the buffer as if it was just a regular linear buffer with an offset.
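
On Linux, one way to do this is to reserve a doubled address range and map a memfd into each half; a rough sketch (error handling omitted, and `size` must be a multiple of the page size):

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Map the same `size` bytes at two consecutive virtual address ranges,
// so buf[i] and buf[i + size] always alias the same physical byte.
char* map_double(std::size_t size) {
    int fd = memfd_create("ring", 0);
    ftruncate(fd, static_cast<off_t>(size));

    // Reserve 2*size of contiguous address space, then map the file
    // over each half with MAP_FIXED.
    char* base = static_cast<char*>(mmap(nullptr, 2 * size, PROT_NONE,
                                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    mmap(base, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(base + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    return base;
}
```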

[–] TeamAssimilation 9 points 13 hours ago (1 children)

Maybe not that obscure, but Joe Celko’s Nested Set Model gave me exactly what I needed when I learned of it: fast queries on seldom-changing hierarchical database records.

Updates are heavy, but the reads are incredibly light.

https://en.wikipedia.org/wiki/Nested_set_model

[–] notabot@piefed.social 2 points 9 hours ago

I came here to mention these too. One addition that can be helpful in large trees is to add a depth attribute to each node so that you can easily limit the depth of subtree you retrieve.

[–] tiredofsametab@fedia.io 6 points 13 hours ago (1 children)

Not necessarily obscure, but I don't think Tries get enough love.

Edit: I can't spell

[–] muzzle@lemmy.zip 2 points 11 hours ago (1 children)
[–] catchy_name@feddit.it 4 points 11 hours ago (2 children)

They likely meant tries / prefix trees.

[–] tiredofsametab@fedia.io 1 points 11 hours ago (1 children)

That's really neat!

[–] Nomad 5 points 13 hours ago (1 children)
[–] tyler@programming.dev 5 points 13 hours ago (2 children)

Aren’t these one of the first structures you learn about in any comp sci course? Still good to know but not sure it’s obscure.

[–] Nomad 1 points 2 hours ago

Old tech is more like it. Good basics, but you wouldn't code in ASM most of the time even if you learned it.

[–] coherent_domain 2 points 5 hours ago

Not disagreeing with you, but I find it funny that this is the only data structure I have not heard of in this entire thread 🤣

[–] MonkderVierte@lemmy.zip 1 points 11 hours ago

Archivemount