Overview and Properties of Hash Functions
– A hash function takes a key as input; the key identifies a datum or record in a data storage and retrieval application.
– The output is a hash code used to index a hash table.
– A good hash function should be fast to compute and minimize collisions.
– Hash values should be distributed uniformly over the output range, because clustered values cause more collisions and slower lookups.
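The path from key to hash code to table index can be illustrated with a minimal sketch. It assumes the 32-bit FNV-1a algorithm as the hash function, which the text above does not name; the helper `table_index` and the table size are likewise hypothetical.

```python
# Sketch: key -> hash code -> table index, assuming 32-bit FNV-1a.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(key: bytes) -> int:
    """Return the 32-bit FNV-1a hash code of a byte string."""
    h = FNV_OFFSET_BASIS_32
    for byte in key:
        h ^= byte                            # mix in one input byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # multiply, keep the low 32 bits
    return h


def table_index(key: str, table_size: int) -> int:
    """Reduce the hash code to a slot number (hypothetical helper)."""
    return fnv1a_32(key.encode("utf-8")) % table_size


if __name__ == "__main__":
    print(hex(fnv1a_32(b"a")))       # 0xe40c292c
    print(table_index("apple", 16))  # some slot in 0..15
```

The loop performs one XOR and one multiplication per input byte, which is the kind of low-cost computation the "fast to compute" property refers to.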
Hash Tables and Collision Resolution
– Hash functions are used with hash tables to store and retrieve data items.
– Collision resolution is required when a hash code indexes a slot that is already occupied.
– The collision resolution procedure depends on the structure of the hash table.
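One common table structure, chosen here purely as an illustration, is separate chaining, where each slot holds a small list of entries. The class name `ChainedHashTable` and its methods are hypothetical, and the sketch relies on Python's built-in `hash` rather than any particular hash function from the text.

```python
class ChainedHashTable:
    """Minimal hash table that resolves collisions by chaining (a sketch)."""

    def __init__(self, num_buckets: int = 8):
        # Each bucket is a list of (key, value) pairs.
        self._buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        return hash(key) % len(self._buckets)

    def put(self, key, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))       # new key: extend the chain

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


if __name__ == "__main__":
    table = ChainedHashTable(num_buckets=8)
    table.put(1, "one")
    table.put(9, "nine")  # in CPython, 1 and 9 both map to bucket 1
    print(table.get(1), table.get(9))
```

With open addressing, by contrast, a collision would trigger a probe for another free slot, which is why the resolution procedure depends on the table's structure.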
Specialized Uses of Hash Functions
– Hash functions are used to build caches for large data sets.
– They are used in the Bloom filter and geometric hashing.
– Hash tables are used to implement associative arrays and dynamic sets.
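As a rough illustration of the Bloom filter mentioned above, the following sketch derives several bit positions per item from salted SHA-256 digests. The class name, the default sizes, and the salting scheme are assumptions made for this example, not a reference design.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("apple")
    print(seen.might_contain("apple"))  # True
```

An item that was added is always reported as possibly present; an item that was never added is usually, but not always, reported as absent.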
Testing, Measurement, and Efficiency of Hash Functions
– The uniformity of hash values can be evaluated using the chi-squared test.
– Hash functions should balance search time and data storage space.
– Computational complexity varies with different methods.
– A hash function that is fast to compute may be preferable to one that needs more computation but produces slightly fewer collisions.
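The uniformity check can be sketched with the standard Pearson chi-squared statistic over bucket counts. The function name `chi_squared_statistic` is hypothetical, and this is one plausible formulation; turning the statistic into a significance level would additionally require the chi-squared distribution with `num_buckets - 1` degrees of freedom, which is not shown here.

```python
from collections import Counter


def chi_squared_statistic(hash_values, num_buckets: int) -> float:
    """Pearson chi-squared statistic of bucket occupancy versus a uniform spread."""
    hash_values = list(hash_values)
    if not hash_values:
        raise ValueError("need at least one hash value")
    counts = Counter(h % num_buckets for h in hash_values)
    expected = len(hash_values) / num_buckets  # uniform expectation per bucket
    return sum(
        (counts.get(bucket, 0) - expected) ** 2 / expected
        for bucket in range(num_buckets)
    )


if __name__ == "__main__":
    print(chi_squared_statistic(range(100), 10))  # 0.0: perfectly even
    print(chi_squared_statistic([0] * 100, 10))   # 900.0: everything in one bucket
```

A statistic near zero indicates an even spread across buckets, while large values indicate clustering.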
Applicability and Variations of Hash Functions
– Hash functions, in their cryptographic variants, are applicable in integrity checks, key derivation, message authentication codes (MACs), and password storage.
– Deterministic hash functions always generate the same hash value for a given input.
– Hash functions can have a fixed or variable range of hash values.
– Dynamic hash functions minimize record relocation.
– Data normalization is important for accurate comparison results.
– There are variations of hash functions such as algebraic coding, unique permutation hashing, and customized hash functions.
– Hash functions for variable-length data should consider all characters of the string.
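Two of the points above, data normalization and using every character of a variable-length key, can be combined in a small sketch. The normalization rule (trim whitespace, case-fold) and the polynomial hash with base 31 are assumptions chosen for illustration; a real application would pick rules that match its own notion of key equality.

```python
def normalize(key: str) -> str:
    """Map keys that should compare equal to one canonical form (assumed rule)."""
    return key.strip().casefold()


def polynomial_hash(text: str, table_size: int, base: int = 31) -> int:
    """Hash a string so that every character and its position affect the result."""
    h = 0
    for ch in text:
        h = (h * base + ord(ch)) % table_size
    return h


if __name__ == "__main__":
    size = 1_000_003
    # Normalized variants of the same key land in the same slot.
    print(polynomial_hash(normalize("  Hello "), size)
          == polynomial_hash(normalize("hello"), size))  # True
    # Character order matters, so "ab" and "ba" hash differently here.
    print(polynomial_hash("ab", size), polynomial_hash("ba", size))  # 3105 3135
```

Because the function is deterministic, repeated calls with the same normalized key always return the same value.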
A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output. The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.
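The difference between fixed-size and variable-length output can be seen with Python's standard `hashlib` module. SHA-256 and SHAKE-128 are used only as convenient examples; the wrapper function names and the table size are hypothetical.

```python
import hashlib


def fixed_size_digest(data: bytes) -> bytes:
    """SHA-256 always returns 32 bytes, whatever the input length."""
    return hashlib.sha256(data).digest()


def variable_length_digest(data: bytes, length: int) -> bytes:
    """SHAKE-128 is an extendable-output function: the caller picks the length."""
    return hashlib.shake_128(data).digest(length)


def slot_for(data: bytes, table_size: int) -> int:
    """Use a hash value to index a fixed-size table (hypothetical helper)."""
    return int.from_bytes(fixed_size_digest(data), "big") % table_size


if __name__ == "__main__":
    print(len(fixed_size_digest(b"a")))              # 32
    print(len(fixed_size_digest(b"a" * 10_000)))     # 32
    print(len(variable_length_digest(b"a", 5)))      # 5
    print(slot_for(b"a", 64))                        # a slot in 0..63
```

A cryptographic digest is far more expensive than a table lookup typically needs; it is used here only because it ships with the standard library.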
Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a computationally and storage space-efficient form of data access that avoids the non-constant access time of ordered and unordered lists and structured trees, and the often exponential storage requirements of direct access of state spaces of large or variable-length keys.
Use of hash functions relies on statistical properties of key and function interaction: worst-case behaviour is intolerably bad but rare, and average-case behaviour can be nearly optimal (minimal collision).
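The interaction between keys and function can be made concrete with a small experiment. The sketch below applies the same trivial hash (the identity reduced modulo the table size, an assumption for this example) to two different key sets and reports the most crowded bucket; `longest_chain` is a hypothetical helper.

```python
def longest_chain(keys, hash_fn, num_buckets: int) -> int:
    """Return the number of keys in the most crowded bucket."""
    counts = [0] * num_buckets
    for key in keys:
        counts[hash_fn(key) % num_buckets] += 1
    return max(counts)


def identity(key: int) -> int:
    return key


if __name__ == "__main__":
    # Consecutive keys spread evenly: 1000 keys over 100 buckets.
    print(longest_chain(range(1000), identity, 100))            # 10
    # Keys that are all multiples of the table size collide completely.
    print(longest_chain(range(0, 100_000, 100), identity, 100)) # 1000
```

The function is unchanged between the two runs; only the key set differs, which is why the quality of a hash function is judged against the statistical properties of the keys it is expected to receive.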
Hash functions are related to (and often confused with) checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash function differs from these concepts mainly in terms of data integrity.