Generations-based array
Intro
I was intrigued by the sparse map/set described in Russ Cox’s article.
And I’m not the only one: this exact implementation is used in the Go source code more than once! The compiler uses it for many ID-like maps and sets; the regexp package uses it for a queue.
But there is one thing that is still bugging me: it’s hard to make it very efficient. All operations I care about are O(1), but the `get` and `set` operations clearly become slower in comparison with a straightforward slice approach.
In fact, if your arrays are not that big (less than 0xffff bytes?), you might be better off using a slice with an O(n) clear operation. If you do many `get`+`set` operations, the increased overhead may be too much.
In this article, I’ll propose a different data structure that can replace a sparse-dense map (and set) if you don’t need the iteration over the elements.
This discussion is not Go-specific, but I’ll use Go in the examples.
The Problem
Let me start with a problem that we’re trying to address.
Imagine that you need a mapping structure that you can re-use. Something like a `map[uint16]T`, but with a more predictable allocation pattern.
Your function may look like this:
func doWork(s *state) result {
	s.m.Reset() // You want the memory to be re-used
	// Do the work using the map.
	// Only get+set operations are used here.
}
If your “map” can re-use the memory properly, this code may become zero-alloc.
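To make the zero-alloc claim measurable, here is a hedged sketch using `testing.AllocsPerRun`. The `reusableMap`, `state`, and `doWork` bodies below are placeholders of my own; the point is only that a container whose `Reset` keeps the backing memory alive makes the per-call allocation count drop to zero.

```go
package main

import (
	"fmt"
	"testing"
)

// reusableMap is a stand-in reusable structure: the details don't
// matter here, only that Reset keeps the backing memory alive.
type reusableMap struct {
	elems []int
}

func (m *reusableMap) Reset()            { clear(m.elems) }
func (m *reusableMap) Set(k uint, v int) { m.elems[k] = v }

type state struct {
	m reusableMap
}

func doWork(s *state) int {
	s.m.Reset() // memory is re-used, not re-allocated
	s.m.Set(1, 10)
	s.m.Set(2, 20)
	return s.m.elems[1] + s.m.elems[2]
}

func main() {
	s := &state{m: reusableMap{elems: make([]int, 16)}}
	// Should report 0 allocations per call: all memory is re-used.
	allocs := testing.AllocsPerRun(100, func() { doWork(s) })
	fmt.Println(allocs)
}
```

`testing.AllocsPerRun` works outside of test files too, which makes it handy for quick experiments like this one.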
Our requirements can be described as follows:
| Operation | Complexity |
|---|---|
| Set | O(1) |
| Get | O(1) |
| Reset | O(1) |
We want it all, plus the efficient memory re-use.
We’ll analyze these choices today:
map[uint16]T
[]T
sparseMap
genMap
The slice and map solutions do not fit our requirements, but we’ll use them for a comparison.
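For reference, the baseline slice approach might look like the sketch below. The `sliceMap` name and its zero-value convention are my own; it satisfies the `Set`/`Get` requirements, but its `Reset` has to walk all n elements, which is exactly the cost we want to avoid.

```go
package main

import "fmt"

// sliceMap is a hypothetical baseline: a plain slice where the zero
// value of T means "not set". Get and Set are direct indexing.
type sliceMap[T any] struct {
	elems []T
}

func newSliceMap[T any](n int) *sliceMap[T] {
	return &sliceMap[T]{elems: make([]T, n)}
}

func (m *sliceMap[T]) Set(k uint, v T) {
	if k < uint(len(m.elems)) {
		m.elems[k] = v
	}
}

func (m *sliceMap[T]) Get(k uint) T {
	if k < uint(len(m.elems)) {
		return m.elems[k]
	}
	var zero T
	return zero
}

// Reset is where the slice approach loses: clear zeroes all n elements.
func (m *sliceMap[T]) Reset() {
	clear(m.elems)
}

func main() {
	m := newSliceMap[int](8)
	m.Set(3, 42)
	fmt.Println(m.Get(3)) // 42
	m.Reset()
	fmt.Println(m.Get(3)) // 0
}
```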
Benchmark Results
Let’s start by comparing the raw performance.
| Data Structure | Set | Get | Reset |
|---|---|---|---|
| map | (x17.9) 47802 | (x28.6) 36922 | 1801 |
| slice | 2665 | 1289 | 6450 |
| sparse | (x6.7) 17859 | (x1.89) 2435 | 16 |
| generations | (x1.1) 3068 | (x1.04) 1349 | 26 |
Observations:
- Map is heavily outclassed
- Both sparse and generation maps have a crazy-fast reset
- Even with 5000 elements (8*5000=40000 bytes), a slice reset takes noticeable time
- The `sparse.set()` operation is ~7 times slower than slice!
- The `sparse.get()` operation is ~2 times slower than slice
- Generations map is almost as fast as a slice, but its reset is much faster
The sparse and generations maps do not zero their data during the reset operation. Therefore, avoid storing pointers in there. These pointers will be “held” by the container for a potentially long period of time, causing memory leaks. I would only recommend using both sparse and generations-based data structures with simple pointer-free element types.
You can find the exact benchmark code here.
Some benchmark notes:
- I used a real-world sparse-dense implementation
- Every `get`/`set` goes through a noinline wrapper to avoid unwanted optimizations
- Every `get`/`set` test runs the operation 5000 times
- Every benchmark uses 5000 elements (it’s important for the slice reset)
- The measurements above are divided by 10 for easier interpretation
- The value type used is `int` (8 bytes on my x86-64 machine)
Now, you should be cautious about random benchmarks posted on the internet. But no matter how you write and/or run these, generations map will always be faster than a sparse-dense map (or set). It’s almost as fast as a slice solution while still having a very fast O(1) reset.
There are reasons for it to be faster. Let’s talk about them.
Sparse Map Issues
Why is the `sparse.set()` operation so slow?
When it comes to the insertion of a new value, the sparse map has to do two memory writes: one for the `sparse` slice and one for the `dense` slice. Updating an existing value only writes to `dense`.
func (s *sparseMap[T]) Set(k int32, v T) {
	i := s.sparse[k]
	if i < int32(len(s.dense)) && s.dense[i].key == k {
		s.dense[i].val = v
		return
	}
	s.dense = append(s.dense, sparseEntry[T]{k, v})
	s.sparse[k] = int32(len(s.dense)) - 1
}
Another issue is that two slices mean twice as many bounds checks can occur. And while you can be careful, use uint keys, and check the bounds yourself to stop the compiler from generating implicit bounds checks, you’ll still pay for these if statements.
The `sparse.get()` operation also suffers from a double memory read.
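To make that double read visible, here is a hedged sketch of a matching `Get` (the real implementation used in the benchmarks may differ in details; the type definitions mirror the `Set` shown above):

```go
package main

import "fmt"

type sparseEntry[T any] struct {
	key int32
	val T
}

type sparseMap[T any] struct {
	sparse []int32
	dense  []sparseEntry[T]
}

func newSparseMap[T any](n int) *sparseMap[T] {
	return &sparseMap[T]{sparse: make([]int32, n)}
}

func (s *sparseMap[T]) Set(k int32, v T) {
	i := s.sparse[k]
	if i < int32(len(s.dense)) && s.dense[i].key == k {
		s.dense[i].val = v
		return
	}
	s.dense = append(s.dense, sparseEntry[T]{k, v})
	s.sparse[k] = int32(len(s.dense)) - 1
}

// Get performs two dependent memory reads: sparse[k] to find the
// candidate dense index, then dense[i] to validate the key and fetch
// the value. The second read depends on the first, which is part of
// why it loses to a plain slice lookup.
func (s *sparseMap[T]) Get(k int32) T {
	i := s.sparse[k]
	if i < int32(len(s.dense)) && s.dense[i].key == k {
		return s.dense[i].val
	}
	var zero T
	return zero
}

func main() {
	m := newSparseMap[int](16)
	m.Set(3, 42)
	fmt.Println(m.Get(3)) // 42
	fmt.Println(m.Get(4)) // 0 (never set)
}
```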
Generations Map
It’s possible to use some of the ideas behind the sparse-dense map and create an even more specialized data structure.
type genMapElem[T any] struct {
	seq uint32
	val T
}

type genMap[T any] struct {
	elems []genMapElem[T]
	seq   uint32
}

func newGenMap[T any](n int) *genMap[T] {
	return &genMap[T]{
		elems: make([]genMapElem[T], n),
		seq:   1,
	}
}
Every element will have a generation counter (`seq`). The container itself will have its own counter. The container’s counter starts at 1, while the element counters start at 0.
Both `get` and `set` operations look very similar to the slice version, but with a `seq` check.
func (m *genMap[T]) Set(k uint, v T) {
	if k < uint(len(m.elems)) {
		m.elems[k] = genMapElem[T]{val: v, seq: m.seq}
	}
}
Setting the element means updating the element’s counter to the container’s counter along with the value.
func (m *genMap[T]) Get(k uint) T {
	if k < uint(len(m.elems)) {
		el := m.elems[k]
		if el.seq == m.seq {
			return el.val
		}
	}
	var zero T
	return zero
}
If the `seq` of the element is identical to the container’s counter, then this element is defined. Otherwise, it doesn’t matter what the contents of this element are.
You can probably already guess what `Reset` will look like:
func (m *genMap[T]) Reset() {
	m.seq++
}
Well, this is good enough for most use cases, but there is a small chance that our `uint32` will overflow, making some undefined elements defined. Increasing the `seq` size to `uint64` could help, but it would increase the per-element size overhead. Instead, we can do a real clear operation once in every `MaxUint32` resets.
func (m *genMap[T]) Reset() {
	if m.seq == math.MaxUint32 {
		m.seq = 1
		clear(m.elems)
	} else {
		m.seq++
	}
}
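Putting the pieces together, here is a quick usage sketch: the type and methods defined above, assembled into a runnable program that shows the reset semantics in action.

```go
package main

import (
	"fmt"
	"math"
)

type genMapElem[T any] struct {
	seq uint32
	val T
}

type genMap[T any] struct {
	elems []genMapElem[T]
	seq   uint32
}

func newGenMap[T any](n int) *genMap[T] {
	return &genMap[T]{elems: make([]genMapElem[T], n), seq: 1}
}

func (m *genMap[T]) Set(k uint, v T) {
	if k < uint(len(m.elems)) {
		m.elems[k] = genMapElem[T]{val: v, seq: m.seq}
	}
}

func (m *genMap[T]) Get(k uint) T {
	if k < uint(len(m.elems)) {
		if el := m.elems[k]; el.seq == m.seq {
			return el.val
		}
	}
	var zero T
	return zero
}

func (m *genMap[T]) Reset() {
	if m.seq == math.MaxUint32 {
		m.seq = 1
		clear(m.elems)
	} else {
		m.seq++
	}
}

func main() {
	m := newGenMap[string](8)
	m.Set(2, "hello")
	fmt.Println(m.Get(2)) // hello

	// Reset is a single counter bump: the old value is still in
	// memory, but its stale seq makes it invisible.
	m.Reset()
	fmt.Println(m.Get(2) == "") // true

	m.Set(2, "world")
	fmt.Println(m.Get(2)) // world
}
```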
It’s definitely possible to use `uint8` or `uint16` for the `seq` field. That would mean less per-element size overhead at the price of a more frequent data clear.
- The generations map does exactly 1 memory read and write
- It’s easier to get rid of all implicit bounds checks
- Its memory consumption is comparable to the sparse-dense array
- The `Reset` complexity is constant (amortized)
- Arguably, it’s even easier to implement and understand than a sparse-dense map
It’s possible to make a generations-based set too. The `get` operation can be turned into `contains` with ease. With sets, only the counters are needed.
func (m *genSet) Contains(k uint) bool {
	if k < uint(len(m.counters)) {
		return m.counters[k] == m.seq
	}
	return false
}
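For completeness, here is a sketch of the remaining set operations. The article only shows `Contains`; the `Add` and `Reset` methods below are my own, following the map version one-for-one.

```go
package main

import (
	"fmt"
	"math"
)

// genSet stores only the per-element counters: an element is in the
// set when its counter matches the container's current generation.
type genSet struct {
	counters []uint32
	seq      uint32
}

func newGenSet(n int) *genSet {
	return &genSet{counters: make([]uint32, n), seq: 1}
}

// Add marks k as present by stamping it with the current generation.
func (m *genSet) Add(k uint) {
	if k < uint(len(m.counters)) {
		m.counters[k] = m.seq
	}
}

func (m *genSet) Contains(k uint) bool {
	if k < uint(len(m.counters)) {
		return m.counters[k] == m.seq
	}
	return false
}

// Reset uses the same overflow guard as the map version.
func (m *genSet) Reset() {
	if m.seq == math.MaxUint32 {
		m.seq = 1
		clear(m.counters)
	} else {
		m.seq++
	}
}

func main() {
	s := newGenSet(16)
	s.Add(7)
	fmt.Println(s.Contains(7)) // true
	s.Reset()
	fmt.Println(s.Contains(7)) // false
}
```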
Sounds fascinating, right? Well, you can’t use this data structure as a drop-in replacement for a sparse-dense one. For instance, a generations-based map can’t be iterated efficiently.
You can add a length counter if you really need it, but that will add some extra overhead to the `set` operation. I would advise you not to do so. The main reason this structure can be so fast is its simplicity.
The average memory usage will be higher, since the referenced sparse-dense implementation doesn’t allocate n elements for the `dense` slice right away; it only allocates the entire `sparse` storage. So, if you only ever populate the array up to n/2, the approximate memory usage would be 1.5n instead of the worst-case 2n scenario. The generations-based array requires the entire slice to be allocated right away, leading to a 2n memory usage scenario.
Conclusion
I used this data structure in my pathfinding library for Go. The results were great: a 5-8% speedup just from a simple data structure change. Keep in mind that this library is already heavily optimized, so every percentage point counts.
In turn, this pathfinding library was used in a game I released on Steam: Roboden.
Therefore, I would consider this data structure to be production-ready.