Introduction and Overview of Extendible Hashing
– Extendible Hashing is a fast access method for dynamic files.
– It is a hash-based indexing technique.
– It allows for efficient insertion and retrieval of data.
– The hash function returns a string of bits.
– The first i bits of each string are used as an index into the directory (hash table), which therefore has 2^i entries.
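The bit selection described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names and the fixed hash width of 32 bits are assumptions made for the example. The second function shows the least-significant-bit variant for comparison.

```python
def index_from_leading_bits(hash_value: int, depth: int, width: int = 32) -> int:
    """Directory slot chosen by the first `depth` bits of a `width`-bit hash.

    `width` is an assumed fixed hash size; the notes do not specify one.
    """
    if not 0 <= depth <= width:
        raise ValueError("depth must be between 0 and width")
    hash_value &= (1 << width) - 1          # keep only `width` bits
    return hash_value >> (width - depth)    # drop everything but the leading bits


def index_from_trailing_bits(hash_value: int, depth: int) -> int:
    """Directory slot chosen by the last `depth` bits of the hash."""
    return hash_value & ((1 << depth) - 1)
```

With a 4-bit hash of 0b1011 and a depth of 2, the leading-bit variant selects slot 0b10 and the trailing-bit variant selects slot 0b11.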
Key Insertion, Bucket Overflow, and Directory Organization
– Keys are inserted into the hash table based on their hashed values.
– If a bucket becomes full, it needs to be split.
– The local depth of a bucket is the number of hash bits that bucket actually uses to distinguish its keys.
– If the local depth is equal to the global depth, the directory needs to be doubled.
– If the local depth is less than the global depth, the bucket can be split without doubling the directory.
– The directory contains pointers to buckets.
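– The global depth is the number of hash bits the directory uses to index its entries, so the directory has 2^(global depth) entries.
– The local depth is the number of hash bits shared by all keys in a given bucket; it never exceeds the global depth.
– After a bucket split, the local depth is incremented, and the newly significant bit determines which of the two resulting buckets each entry moves to.
– The directory is doubled only when the bucket that overflows has a local depth equal to the global depth.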
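The relationship between local and global depth can be made concrete with two small helpers. Both function names are hypothetical and exist only for this illustration. The first encodes the split rule from the list above; the second counts how many directory entries point to the same bucket.

```python
def split_action(local_depth: int, global_depth: int) -> str:
    """Describe what an overflowing bucket requires, per the rules above."""
    if local_depth > global_depth:
        raise ValueError("local depth cannot exceed global depth")
    if local_depth == global_depth:
        # Only one directory entry points here, so there is no spare entry
        # for the new bucket: the directory has to grow first.
        return "double directory, then split"
    return "split only"


def entries_pointing_to_bucket(local_depth: int, global_depth: int) -> int:
    """Number of directory entries that share one bucket."""
    return 1 << (global_depth - local_depth)
```

For instance, a bucket with local depth 1 in a directory of global depth 3 is referenced by 4 entries, so it can be split (twice, in fact) before the directory has to double.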
Example Implementation in Python
– The extendible hashing algorithm can be implemented in Python.
– The code indexes the directory with the least significant bits of the hash, so doubling the directory amounts to appending a copy of it.
– The directory is represented as a list of pages.
– Each page has a map of key-value pairs and a local depth.
– The get_page() function retrieves the page based on the hashed key.
– The put() function inserts a key-value pair into the appropriate page.
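A minimal sketch along the lines described above is shown here. It is not the original code. The names `Page`, `get_page()` and `put()` come from the notes; `ExtendibleHash`, `get()`, `_split()`, `PAGE_SIZE` and `MAX_DEPTH` are assumptions introduced for the example, as is the use of Python's built-in `hash()`.

```python
PAGE_SIZE = 4   # assumed bucket capacity, kept small so splits happen early
MAX_DEPTH = 20  # assumed cap on depth so colliding hashes cannot loop forever


class Page:
    """A bucket: a map of key-value pairs plus its local depth."""

    def __init__(self, local_depth: int = 0) -> None:
        self.map = {}
        self.local_depth = local_depth


class ExtendibleHash:
    def __init__(self) -> None:
        self.global_depth = 0
        self.directory = [Page()]  # always 2**global_depth entries

    def get_page(self, key) -> Page:
        # The low `global_depth` bits of the hash select the directory entry.
        return self.directory[hash(key) & ((1 << self.global_depth) - 1)]

    def get(self, key, default=None):
        return self.get_page(key).map.get(key, default)

    def put(self, key, value) -> None:
        page = self.get_page(key)
        page.map[key] = value
        # Split until the page fits. If every entry lands on the same side,
        # the page holding `key` is still over capacity, so split it again.
        while len(page.map) > PAGE_SIZE and page.local_depth < MAX_DEPTH:
            self._split(page)
            page = self.get_page(key)

    def _split(self, page: Page) -> None:
        if page.local_depth == self.global_depth:
            # With low-bit indexing, entries i and i + old_size share a page,
            # so doubling is just appending a copy of the directory.
            self.directory = self.directory * 2
            self.global_depth += 1
        high_bit = 1 << page.local_depth
        low = Page(page.local_depth + 1)
        high = Page(page.local_depth + 1)
        # Redistribute entries on the newly significant bit.
        for k, v in page.map.items():
            (high if hash(k) & high_bit else low).map[k] = v
        # Repoint every directory entry that referred to the old page.
        for i, p in enumerate(self.directory):
            if p is page:
                self.directory[i] = high if i & high_bit else low
```

A short usage example:

```python
table = ExtendibleHash()
for i in range(100):
    table.put(i, str(i))
print(table.get(42))         # "42"
print(len(table.directory))  # a power of two
```

Scanning the whole directory in `_split()` keeps the sketch short; a production version would step through only the entries that can reference the split page. When `MAX_DEPTH` is reached, this sketch simply lets the page grow past `PAGE_SIZE` instead of chaining overflow pages.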
Advantages and Limitations of Extendible Hashing
– Extendible Hashing allows for efficient insertion and retrieval of data.
– It handles dynamic file sizes effectively.
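– Given a well-distributed hash function, it spreads keys evenly across buckets.
– However, the depth cannot exceed the number of bits in the hash value (the bit size of an integer in a typical implementation).
– Doubling the directory or splitting a bucket does not help when all entries of the full bucket agree on the next hash bit: they land in the same bucket again, and the split must be repeated. Keys with identical hash values can never be separated this way.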
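The last two limitations can be illustrated with a small helper that asks how many hash bits are needed before no bucket exceeds its capacity. The function name and its parameters are hypothetical, and low-bit indexing is assumed.

```python
from collections import Counter


def bits_needed_to_separate(hashes, capacity, max_bits=64):
    """Smallest depth at which no bucket holds more than `capacity` hashes.

    Returns None if even `max_bits` bits are not enough, which happens when
    more than `capacity` hash values are identical.
    """
    if not hashes:
        return 0
    for depth in range(max_bits + 1):
        mask = (1 << depth) - 1
        # Group hashes by their low `depth` bits, as the directory would.
        bucket_sizes = Counter(h & mask for h in hashes)
        if max(bucket_sizes.values()) <= capacity:
            return depth
    return None
```

Five hashes that are all multiples of 16, such as `[0, 16, 32, 48, 64]`, need a depth of 5 with a capacity of 4, meaning a 32-entry directory for five keys. Five identical hashes can never be separated, so the function returns `None`.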
Performance and Comparison with Other Hashing Techniques
– Extendible hashing provides efficient search, insert, and delete operations.
– The directory structure allows for a balanced distribution of records.
– A lookup needs at most two disk accesses, one for the directory entry and one for the bucket, and only one if the directory is held in memory.
– The performance remains stable even with a large number of records.
– The space overhead of the directory structure is relatively small.
– Extendible hashing is more flexible than static hashing.
– It handles dynamic changes in the number of records effectively.
– Compared to linear hashing, extendible hashing avoids overflow chains but has to maintain a directory, which linear hashing does not need.
– Extendible hashing performs well in scenarios with frequent updates.
– Other hashing techniques may be more suitable for specific use cases.
Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation (done one bucket at a time, as needed). This means that time-sensitive applications are less affected by table growth than by standard full-table rehashes.
Extendible hashing was described by Ronald Fagin in 1979. Practically all modern filesystems use either extendible hashing or B-trees. In particular, the Global File System, ZFS, and the SpadFS filesystem use extendible hashing.