Glossary Term
Hash table
History and Concepts
- Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining.
- Gene Amdahl, Elaine M. McGraw, Nathaniel Rochester, and Arthur Samuel of IBM Research implemented hashing for the IBM 701 assembler.
- A. D. Linh proposed open addressing with linear probing.
- W. Wesley Peterson coined the term 'open addressing' in an article.
- Arnold Dumey discussed the idea of using remainder modulo a prime as a hash function.
- Load factor is defined as the number of entries occupied in the hash table divided by the number of buckets.
- The performance of the hash table deteriorates as the load factor increases.
- The software ensures that the load factor never exceeds a certain constant.
- The hash table is resized or rehashed when the load factor reaches a certain maximum value.
- Separate chaining hash tables have no fixed point beyond which resizing is absolutely needed.
- A hash function maps the universe of keys to array indices or slots within the table.
- Conventional implementations of hash functions are based on the integer universe assumption.
- A perfect hash function is an injective function that maps each element to a unique value.
- Perfect hash functions can be created if all the keys are known ahead of time.
- Hash functions can be evaluated empirically using statistical tests.
- Hashing schemes based on the integer universe assumption include hashing by division, multiplication, universal hashing, dynamic perfect hashing, and static perfect hashing.
- Hashing by division is the commonly used scheme.
- Hashing by multiplication uses a real-valued constant to compute the hash function.
- The golden ratio is suggested by Donald Knuth for hashing by multiplication.
- The choice of hash function depends on the desired distribution and table size.
- Uniform distribution of hash values is a fundamental requirement.
- Non-uniform distribution increases collisions and their resolution cost.
- Uniformity can be evaluated using statistical tests like Pearsons chi-squared test.
- The distribution needs to be uniform for table sizes that occur in the application.
- Dynamic resizing with exact doubling and halving requires uniformity for power of two table sizes.
Collision Resolution Techniques
- Hashing uses a hash function to transform a search key into an array index.
- The ideal case is that no two search keys hash to the same array index.
- Collision resolution is the second part of the algorithm.
- Two common methods for collision resolution are separate chaining and open addressing.
- Collision resolution is necessary because it is impossible to guarantee no collisions for unseen data.
- Separate chaining involves building a linked list for each search array index.
- Collided items are chained together through a single linked list.
- Chained-Hash-Insert, Chained-Hash-Search, and Chained-Hash-Delete are operations in separate chaining.
- If elements are inserted in a total order, it speeds up unsuccessful searches.
- Other data structures like self-balancing binary search trees can be used for separate chaining.
- Open addressing stores every entry record in the bucket array itself.
- Hash resolution is performed through probing.
- Linear probing, quadratic probing, and double hashing are probe sequences used in open addressing.
- Open addressing may be slower than separate chaining when the load factor approaches 1.
- Linear probing can lead to better CPU cache utilization due to locality of references.
- Separate chaining with linked lists may not be cache-conscious due to spatial locality.
- Cache-conscious variants use a more cache-friendly dynamic array.
- Linear probing in open addressing can utilize CPU cache better due to locality of references.
- Caching and locality of reference can improve memory latency and reduce access time.
- Array-based separate chaining is more performant than the standard linked list method under heavy load.
- Coalesced hashing is a hybrid of separate chaining and open addressing.
- Cuckoo hashing guarantees worst-case lookup complexity and constant amortized time for insertions.
- Hopscotch hashing combines elements of cuckoo hashing, linear probing, and chaining.
- Coalesced hashing is suited for fixed memory allocation.
- Hopscotch hashing provides high throughput in concurrent settings and is suitable for resizable concurrent hash tables.
Hash Table Implementation
- Hash tables use a hash function to map keys to indices in an array.
- They provide constant-time average-case performance for insertion, deletion, and search operations.
- Hash collisions occur when two keys map to the same index.
- Collision resolution techniques include chaining and open addressing.
- Chaining uses linked lists to store multiple items at the same index.
- Chaining resolves collisions by storing multiple items in linked lists at the same index.
- Open addressing resolves collisions by finding alternative positions within the hash table.
- Linear probing is a common open addressing technique that checks the next position in the table.
- Quadratic probing uses a quadratic function to search for the next position.
- Robin Hood hashing favors displacing items with longer probe sequence lengths.
- Dynamic resizing is necessary when the number of entries in a hash table grows.
- Resizing maintains the performance of lookup and insertion operations.
- Items are rehashed into the buckets of the new hash table.
- Resizing may be performed to avoid excessive memory usage.
- Resizing can be computationally expensive, especially with all-at-once rehashing.
Applications of Hash Tables
- Hash tables are commonly used to implement associative arrays and in-memory tables.
- They are used in database indexing, although B-trees are more popular in this context.
- Hash tables can be used to implement caches, speeding up access to data stored in slower media.
- They are also used in set data structures for testing membership of a value.
- Transposition tables are complex hash tables that store information about searched sections.
- Hash tables are widely used in databases and caching systems.
- They provide efficient lookup operations in symbol tables and dictionaries.
- Hash tables are used in spell checkers and spell correction algorithms.
- They are employed in hash-based data structures like bloom filters and hash maps.
- Cryptocurrencies and blockchain technologies utilize hash functions.
Hash Table Implementations in Programming Languages
- Many programming languages provide hash table functionality