Skip to main content
Glossary Term

Hash table

History and Concepts - Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining. - Gene Amdahl, Elaine M. McGraw, Nathaniel Rochester, and Arthur Samuel of IBM Research implemented hashing for the IBM 701 assembler. - A. D. Linh proposed open addressing with linear probing. - W. Wesley Peterson coined the term 'open addressing' in an article. - Arnold Dumey discussed the idea of using remainder modulo a prime as a hash function. - Load factor is defined as the number of entries occupied in the hash table divided by the number of buckets. - The performance of the hash table deteriorates as the load factor increases. - The software ensures that the load factor never exceeds a certain constant. - The hash table is resized or rehashed when the load factor reaches a certain maximum value. - Separate chaining hash tables have no fixed point beyond which resizing is absolutely needed. - A hash function maps the universe of keys to array indices or slots within the table. - Conventional implementations of hash functions are based on the integer universe assumption. - A perfect hash function is an injective function that maps each element to a unique value. - Perfect hash functions can be created if all the keys are known ahead of time. - Hash functions can be evaluated empirically using statistical tests. - Hashing schemes based on the integer universe assumption include hashing by division, multiplication, universal hashing, dynamic perfect hashing, and static perfect hashing. - Hashing by division is the commonly used scheme. - Hashing by multiplication uses a real-valued constant to compute the hash function. - The golden ratio is suggested by Donald Knuth for hashing by multiplication. - The choice of hash function depends on the desired distribution and table size. - Uniform distribution of hash values is a fundamental requirement. - Non-uniform distribution increases collisions and their resolution cost. - Uniformity can be evaluated using statistical tests like Pearsons chi-squared test. - The distribution needs to be uniform for table sizes that occur in the application. - Dynamic resizing with exact doubling and halving requires uniformity for power of two table sizes. Collision Resolution Techniques - Hashing uses a hash function to transform a search key into an array index. - The ideal case is that no two search keys hash to the same array index. - Collision resolution is the second part of the algorithm. - Two common methods for collision resolution are separate chaining and open addressing. - Collision resolution is necessary because it is impossible to guarantee no collisions for unseen data. - Separate chaining involves building a linked list for each search array index. - Collided items are chained together through a single linked list. - Chained-Hash-Insert, Chained-Hash-Search, and Chained-Hash-Delete are operations in separate chaining. - If elements are inserted in a total order, it speeds up unsuccessful searches. - Other data structures like self-balancing binary search trees can be used for separate chaining. - Open addressing stores every entry record in the bucket array itself. - Hash resolution is performed through probing. - Linear probing, quadratic probing, and double hashing are probe sequences used in open addressing. - Open addressing may be slower than separate chaining when the load factor approaches 1. - Linear probing can lead to better CPU cache utilization due to locality of references. - Separate chaining with linked lists may not be cache-conscious due to spatial locality. - Cache-conscious variants use a more cache-friendly dynamic array. - Linear probing in open addressing can utilize CPU cache better due to locality of references. - Caching and locality of reference can improve memory latency and reduce access time. - Array-based separate chaining is more performant than the standard linked list method under heavy load. - Coalesced hashing is a hybrid of separate chaining and open addressing. - Cuckoo hashing guarantees worst-case lookup complexity and constant amortized time for insertions. - Hopscotch hashing combines elements of cuckoo hashing, linear probing, and chaining. - Coalesced hashing is suited for fixed memory allocation. - Hopscotch hashing provides high throughput in concurrent settings and is suitable for resizable concurrent hash tables. Hash Table Implementation - Hash tables use a hash function to map keys to indices in an array. - They provide constant-time average-case performance for insertion, deletion, and search operations. - Hash collisions occur when two keys map to the same index. - Collision resolution techniques include chaining and open addressing. - Chaining uses linked lists to store multiple items at the same index. - Chaining resolves collisions by storing multiple items in linked lists at the same index. - Open addressing resolves collisions by finding alternative positions within the hash table. - Linear probing is a common open addressing technique that checks the next position in the table. - Quadratic probing uses a quadratic function to search for the next position. - Robin Hood hashing favors displacing items with longer probe sequence lengths. - Dynamic resizing is necessary when the number of entries in a hash table grows. - Resizing maintains the performance of lookup and insertion operations. - Items are rehashed into the buckets of the new hash table. - Resizing may be performed to avoid excessive memory usage. - Resizing can be computationally expensive, especially with all-at-once rehashing. Applications of Hash Tables - Hash tables are commonly used to implement associative arrays and in-memory tables. - They are used in database indexing, although B-trees are more popular in this context. - Hash tables can be used to implement caches, speeding up access to data stored in slower media. - They are also used in set data structures for testing membership of a value. - Transposition tables are complex hash tables that store information about searched sections. - Hash tables are widely used in databases and caching systems. - They provide efficient lookup operations in symbol tables and dictionaries. - Hash tables are used in spell checkers and spell correction algorithms. - They are employed in hash-based data structures like bloom filters and hash maps. - Cryptocurrencies and blockchain technologies utilize hash functions. Hash Table Implementations in Programming Languages - Many programming languages provide hash table functionality