Hash table

« Back to Glossary Index

History and Concepts
– Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining.
– Gene Amdahl, Elaine M. McGraw, Nathaniel Rochester, and Arthur Samuel of IBM Research implemented hashing for the IBM 701 assembler.
– A. D. Linh proposed open addressing with linear probing.
– W. Wesley Peterson coined the term ‘open addressing’ in an article.
– Arnold Dumey discussed the idea of using remainder modulo a prime as a hash function.
– Load factor is defined as the number of entries occupied in the hash table divided by the number of buckets.
– The performance of the hash table deteriorates as the load factor increases.
– The software ensures that the load factor never exceeds a certain constant.
– The hash table is resized or rehashed when the load factor reaches a certain maximum value.
– Separate chaining hash tables have no fixed point beyond which resizing is absolutely needed.
– A hash function maps the universe of keys to array indices or slots within the table.
– Conventional implementations of hash functions are based on the integer universe assumption.
– A perfect hash function is an injective function that maps each element to a unique value.
– Perfect hash functions can be created if all the keys are known ahead of time.
– Hash functions can be evaluated empirically using statistical tests.
– Hashing schemes based on the integer universe assumption include hashing by division, multiplication, universal hashing, dynamic perfect hashing, and static perfect hashing.
– Hashing by division is the commonly used scheme.
– Hashing by multiplication uses a real-valued constant to compute the hash function.
– The golden ratio is suggested by Donald Knuth for hashing by multiplication.
– The choice of hash function depends on the desired distribution and table size.
– Uniform distribution of hash values is a fundamental requirement.
– Non-uniform distribution increases collisions and their resolution cost.
– Uniformity can be evaluated using statistical tests like Pearsons chi-squared test.
– The distribution needs to be uniform for table sizes that occur in the application.
– Dynamic resizing with exact doubling and halving requires uniformity for power of two table sizes.

Collision Resolution Techniques
– Hashing uses a hash function to transform a search key into an array index.
– The ideal case is that no two search keys hash to the same array index.
– Collision resolution is the second part of the algorithm.
– Two common methods for collision resolution are separate chaining and open addressing.
– Collision resolution is necessary because it is impossible to guarantee no collisions for unseen data.
– Separate chaining involves building a linked list for each search array index.
– Collided items are chained together through a single linked list.
– Chained-Hash-Insert, Chained-Hash-Search, and Chained-Hash-Delete are operations in separate chaining.
– If elements are inserted in a total order, it speeds up unsuccessful searches.
– Other data structures like self-balancing binary search trees can be used for separate chaining.
– Open addressing stores every entry record in the bucket array itself.
– Hash resolution is performed through probing.
– Linear probing, quadratic probing, and double hashing are probe sequences used in open addressing.
– Open addressing may be slower than separate chaining when the load factor approaches 1.
– Linear probing can lead to better CPU cache utilization due to locality of references.
– Separate chaining with linked lists may not be cache-conscious due to spatial locality.
– Cache-conscious variants use a more cache-friendly dynamic array.
– Linear probing in open addressing can utilize CPU cache better due to locality of references.
– Caching and locality of reference can improve memory latency and reduce access time.
– Array-based separate chaining is more performant than the standard linked list method under heavy load.
– Coalesced hashing is a hybrid of separate chaining and open addressing.
– Cuckoo hashing guarantees worst-case lookup complexity and constant amortized time for insertions.
– Hopscotch hashing combines elements of cuckoo hashing, linear probing, and chaining.
– Coalesced hashing is suited for fixed memory allocation.
– Hopscotch hashing provides high throughput in concurrent settings and is suitable for resizable concurrent hash tables.

Hash Table Implementation
– Hash tables use a hash function to map keys to indices in an array.
– They provide constant-time average-case performance for insertion, deletion, and search operations.
– Hash collisions occur when two keys map to the same index.
– Collision resolution techniques include chaining and open addressing.
– Chaining uses linked lists to store multiple items at the same index.
– Chaining resolves collisions by storing multiple items in linked lists at the same index.
– Open addressing resolves collisions by finding alternative positions within the hash table.
– Linear probing is a common open addressing technique that checks the next position in the table.
– Quadratic probing uses a quadratic function to search for the next position.
– Robin Hood hashing favors displacing items with longer probe sequence lengths.
– Dynamic resizing is necessary when the number of entries in a hash table grows.
– Resizing maintains the performance of lookup and insertion operations.
– Items are rehashed into the buckets of the new hash table.
– Resizing may be performed to avoid excessive memory usage.
– Resizing can be computationally expensive, especially with all-at-once rehashing.

Applications of Hash Tables
– Hash tables are commonly used to implement associative arrays and in-memory tables.
– They are used in database indexing, although B-trees are more popular in this context.
– Hash tables can be used to implement caches, speeding up access to data stored in slower media.
– They are also used in set data structures for testing membership of a value.
– Transposition tables are complex hash tables that store information about searched sections.
– Hash tables are widely used in databases and caching systems.
– They provide efficient lookup operations in symbol tables and dictionaries.
– Hash tables are used in spell checkers and spell correction algorithms.
– They are employed in hash-based data structures like bloom filters and hash maps.
– Cryptocurrencies and blockchain technologies utilize hash functions.

Hash Table Implementations in Programming Languages
– Many programming languages provide hash table functionality

Hash table (Wikipedia)

In computing, a hash table, also known as a hash map, is a data structure that implements an associative array, also called a dictionary, which is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

Hash table
TypeUnordered associative array
Invented1953
Time complexity in big O notation
Operation Average Worst case
Search Θ(1) O(n)
Insert Θ(1) O(n)
Delete Θ(1) O(n)
Space complexity
Space Θ(n) O(n)
A small phone book as a hash table

Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.

In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key–value pairs, at amortized constant average cost per operation.

Hashing is an example of a space-time tradeoff. If memory is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a binary search or linear search can be used to retrieve the element.

In many situations, hash tables turn out to be on average more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

« Back to Glossary Index

Submit your RFP

We can't wait to read about your project. Use the form below to submit your RFP!

Gabrielle Buff
Gabrielle Buff

Just left us a 5 star review

Great customer service and was able to walk us through the various options available to us in a way that made sense. Would definitely recommend!

Stoute Web Solutions has been a valuable resource for our business. Their attention to detail, expertise, and willingness to help at a moment's notice make them an essential support system for us.

Paul and the team are very professional, courteous, and efficient. They always respond immediately even to my minute concerns. Also, their SEO consultation is superb. These are good people!

Paul Stoute & his team are top notch! You will not find a more honest, hard working group whose focus is the success of your business. If you’re ready to work with the best to create the best for your business, go Stoute Web Solutions; you’ll definitely be glad you did!

Wonderful people that understand our needs and make it happen!

Paul is the absolute best! Always there with solutions in high pressure situations. A steady hand; always there when needed; I would recommend Paul to anyone!

facebook
Vince Fogliani
recommends

The team over at Stoute web solutions set my business up with a fantastic new website, could not be happier

facebook
Steve Sacre
recommends

If You are looking for Website design & creativity look no further. Paul & his team are the epitome of excellence.Don't take my word just refer to my website "stevestours.net"that Stoute Web Solutions created.This should convince anyone that You have finally found Your perfect fit

facebook
Jamie Hill
recommends

Paul and the team at Stoute Web are amazing. They are super fast to answer questions. Super easy to work with, and knows their stuff. 10,000 stars.

Paul and the team from Stoute Web solutions are awesome to work with. They're super intuitive on what best suits your needs and the end product is even better. We will be using them exclusively for our web design and hosting.

facebook
Dean Eardley
recommends

Beautifully functional websites from professional, knowledgeable team.

Along with hosting most of my url's Paul's business has helped me with website development, graphic design and even a really cool back end database app! I highly recommend him as your 360 solution to making your business more visible in today's social media driven marketplace.

I hate dealing with domain/site hosts. After terrible service for over a decade from Dreamhost, I was desperate to find a new one. I was lucky enough to win...

Paul Stoute has been extremely helpful in helping me choose the best package to suite my needs. Any time I had a technical issue he was there to help me through it. Superb customer service at a great value. I would recommend his services to anyone that wants a hassle free and quality experience for their website needs.

Paul is the BEST! I am a current customer and happy to say he has never let me down. Always responds quickly and if he cant fix the issue right away, if available, he provides you a temporary work around while researching the correct fix! Thanks for being an honest and great company!!

Paul Stoute is absolutely wonderful. Paul always responds to my calls and emails right away. He is truly the backbone of my business. From my fantastic website to popping right up on Google when people search for me and designing my business cards, Paul has been there every step of the way. I would recommend this company to anyone.

I can't say enough great things about Green Tie Hosting. Paul was wonderful in helping me get my website up and running quickly. I have stayed with Green...