The F14 hash tables outdo Facebook’s previous specialized implementations while avoiding their pathologies
According to HackerEarth, hashing is a technique used by developers to uniquely identify a specific object from a group of similar objects. In hashing, large keys are converted into small keys by using hash functions. The values are then stored in a data structure called hash table.
Hash tables provide a fast way to maintain a set of keys or map keys to values, even if the keys are objects, like strings.
To simplify the process of selecting the right hash table, Facebook has built and open-sourced F14, a 14-way probing hash table within Folly, its open source library of C++ components.
In a blog post, Facebook developers Nathan Bronson and Xiao Shi said, “The F14 hash tables outdo our previous specialized implementations while avoiding their pathologies. F14 is a good default — usually a great choice and never a bad one, regardless of the use case.”
It could be hard for developers sing hash tables to make the best choice when there are so many factors to consider. Acknowledging this issue, Facebook have condensed this list to one simple choice with F14. It says-
If you don’t keep long-lived references to entries, start with folly::F14FastMap/Set. Otherwise, start with folly::F14NodeMap/Set. F14 is part of Folly, Facebook’s open source library of C++ components.
“We can’t improve on the theory of hash tables, but we can improve on the practice,” asserted Nathan Bronson and Xiao Shi.
F14 for faster, more memory-efficient hash tables
In particular, the Facebook developers said, F14 provides practical improvements for both performance and memory.
- F14 improves hash table implementation by exploiting the vector instructions available on modern CPUs, providing multiple memory layouts, and paying careful attention to detail.
- It is easy to use well, integrates with testing tools, and enables advanced features by default.
- The core idea of F14 is to use the hash code to map keys to a chunk (a block of slots) instead of to a single slot, then search within the chunk in parallel.
- F14 performs collision resolution if a chunk overflows or if two keys both pass the filtering step.
- F14 reduces the need for object construction and copies both inside the hash table and in the surrounding code.
- F14 also has ASAN integration that can probabilistically detect reference and iterator stability issues.
F14 is widely used inside Facebook and it works well.
“F14 works well because its core algorithm leverages vector instructions to increase the load factor while reducing collisions, because it supports multiple memory layouts for different scenarios, and because we have paid attention to C++ overheads near the API. F14 is a good default choice — it delivers CPU and RAM efficiency that is robust across a wide variety of use cases,” the developers concluded.