A Fast Immutable Map in Go

A Fast Immutable Map in Go

Consider the following problem. You have a large set of strings, maybe millions. You need to map these strings to 8-byte integers (uint64). These integers are given to you.

If you are working in Go, the standard solution is to create a map. The construction is trivial, something like the following loop.

m := make(map[string]uint64, N)
for i, k := range keys {
    m[k] = values[i]
}
        

One downside is that the map may use over 50 bytes per entry.

In important scenarios, we might have the following conditions. The map is large (a million of entries or more), you do not need to modify it dynamically (it is immutable), and all queried keys are in the set. In such conditions, you can reduce the memory usage down to almost the size of the keys, so about 8 bytes per entry. One fast technique is the binary fuse filters.

I implemented it as a Go library called constmap that provides an immutable map from strings to uint64 values using binary fuse filters. This data structure is ideal when you have a fixed set of keys at construction time and need fast, memory-efficient lookups afterward. You can even construct the map once, save it to disk so you do not pay the cost of constructing the map each time you need it.

The usage is just as simple.

package main

import (
    "fmt"
    "log"

    "github.com/lemire/constmap"
)

func main() {
    keys := []string{"apple", "banana", "cherry"}
    values := []uint64{100, 200, 300}

    cm, err := constmap.New(keys, values)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(cm.Map("banana")) // 200
}
        

The construction time is higher (as expected for any compact data structure), but lookups are optimized for speed. I ran benchmarks on my Apple M4 Max processor to compare constmap lookups against Go's built-in map[string]uint64. The test uses 1 million keys.

Article content

ConstMap is nearly 3 times faster than Go's standard map for lookups! And we reduced the memory usage by a factor of 6.

The ConstMap may not always be faster, but it should always use significantly less memory. If it can reside in CPU cache while the map cannot, then it will be significantly faster.

Source Code The implementation is available on GitHub: github.com/lemire/constmap.

Amazing!!! I had the exact use case a month back. I of course moved to Rust, moved the string lookups to the API boundary, while using integers internally for hot path lookups. I’ll check the concept of fuse filters

Like
Reply

Is that repo public? I’m getting a 404.

To view or add a comment, sign in

More articles by Daniel Lemire

  • House prices and fertility

    No, rising house prices are not the driver of sharp fertility declines. The evidence shows only modest, mixed effects…

  • You can beat the binary search

    We sometimes have to look for a value in a sorted array. The simplest algorithm consists in just going through the…

    7 Comments
  • The fastest way to match characters on ARM processors?

    Consider the following problem. Given a string, you must match all of the ASCII white-space characters (\t, \n, \r, and…

    1 Comment
  • A brief history of C/C++ programming languages

    Initially, we had languages like Fortran (1957), Pascal (1970), and C (1972). Fortran was designed for number crunching…

    10 Comments
  • Can your AI rewrite your code in assembly?

    Suppose you have several strings and you want to count the number of instances of the character ! in your strings. In…

    3 Comments
  • Prefix sums at tens of gigabytes per second with ARM NEON

    Suppose that you have a record of your sales per day. You might want to get a running record where, for each day, you…

    2 Comments
  • You can use newline characters in URLs

    We locate web content using special addresses called URLs. We are all familiar with addresses like https://google.

  • How fast do browsers correct UTF-16 strings?

    JavaScript represents strings using Unicode, like most programming languages today. Each character in a JavaScript…

    1 Comment
  • How bad can Python stop-the-world pauses get?

    When programming, we need to allocate memory, and then deallocate it. If you program in C, you get used to malloc/free…

    3 Comments
  • AI: Igniting the Spark to End Stagnation

    Much of the West has been economically stagnant. Countries like Canada have failed to improve their productivity and…

    2 Comments

Others also viewed

Explore content categories