Extendible hashing

Introduction and Overview of Extendible Hashing
– Extendible Hashing is a fast access method for dynamic files.
– It is a hash-based indexing technique.
– It allows for efficient insertion and retrieval of data.
– The hash function returns a string of bits.
– The first i bits of each string serve as the index into the directory (the hash table), where i is the current global depth.
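
As a rough illustration of how a few bits of the hash select a directory slot, the sketch below extracts either the leading or the trailing i bits of a hash value. The 8-bit width and the helper names `first_bits` and `last_bits` are assumptions made for this example, not part of any particular implementation.

```python
HASH_BITS = 8  # assumed width of the hash bit string, for illustration only


def first_bits(h: int, i: int) -> int:
    """Directory index formed by the i most significant bits of the hash."""
    return h >> (HASH_BITS - i)


def last_bits(h: int, i: int) -> int:
    """Directory index formed by the i least significant bits of the hash."""
    return h & ((1 << i) - 1)


h = 0b10110010
print(first_bits(h, 3))  # 5, i.e. 0b101
print(last_bits(h, 3))   # 2, i.e. 0b010
```

Either convention works as long as it is applied consistently; with i bits in use, the directory has 2^i slots.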

Key Insertion, Bucket Overflow, and Directory Organization
– Keys are inserted into the hash table based on their hashed values.
– If a bucket becomes full, it needs to be split.
– Each bucket has a local depth: the number of hash bits that all keys stored in that bucket have in common.
– If the local depth is equal to the global depth, the directory needs to be doubled.
– If the local depth is less than the global depth, the bucket can be split without doubling the directory.
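– The directory contains pointers to buckets; several directory entries may point to the same bucket.
– The global depth is the number of hash bits used to index the directory, so the directory has 2^(global depth) entries.
– The local depth is the number of hash bits that distinguish a bucket; it never exceeds the global depth.
– When a bucket splits, its local depth is incremented and the newly examined bit decides which of the two resulting buckets each entry goes to.
– The directory is doubled only when the bucket that overflows has a local depth equal to the global depth.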
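
The split rules above can be sketched in a few lines. This is a minimal illustration, assuming buckets are plain dicts keyed directly by hash value and that bits are consumed from the least significant end; `must_double_directory` and `split_bucket` are hypothetical helper names.

```python
def must_double_directory(local_depth: int, global_depth: int) -> bool:
    """The directory doubles only if the overflowing bucket already uses every directory bit."""
    return local_depth == global_depth


def split_bucket(entries: dict, local_depth: int):
    """Redistribute entries by the next unused hash bit and return the new local depth."""
    bit = 1 << local_depth  # the bit just above those the bucket already shares
    low = {h: v for h, v in entries.items() if not h & bit}
    high = {h: v for h, v in entries.items() if h & bit}
    return low, high, local_depth + 1


# All four hashes share the lowest bit (1), so this bucket has local depth 1.
bucket = {0b001: "a", 0b011: "b", 0b101: "c", 0b111: "d"}
low, high, depth = split_bucket(bucket, local_depth=1)
print(low)    # {1: 'a', 5: 'c'}
print(high)   # {3: 'b', 7: 'd'}
print(depth)  # 2
```

If the global depth were 1 before this split, `must_double_directory(1, 1)` would be true and the directory would grow from 2 to 4 entries first.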

Example Implementation in Python
– The extendible hashing algorithm can be implemented in Python.
– The code indexes the directory with the least significant bits of the hash, so doubling the directory amounts to appending a copy of it.
– The directory is represented as a list of pages.
– Each page has a map of key-value pairs and a local depth.
– The get_page() function retrieves the page based on the hashed key.
– The put() function inserts a key-value pair into the appropriate page.
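
A minimal sketch along the lines described above might look as follows. It is not the original listing: the class names, the `PAGE_SIZE` capacity, the `MAX_DEPTH` safeguard, and the `_split` helper are assumptions chosen for this example, and Python's built-in `hash()` stands in for the hash function.

```python
PAGE_SIZE = 4    # assumed bucket capacity
MAX_DEPTH = 20   # assumed cap so that colliding hashes cannot grow the directory forever


class Page:
    def __init__(self, local_depth: int = 0) -> None:
        self.map = {}                 # key-value pairs stored in this page
        self.local_depth = local_depth


class ExtendibleHash:
    def __init__(self) -> None:
        self.global_depth = 0
        self.directory = [Page()]     # always 2 ** global_depth entries

    def get_page(self, key) -> Page:
        # Index the directory with the global_depth least significant hash bits.
        return self.directory[hash(key) & ((1 << self.global_depth) - 1)]

    def get(self, key, default=None):
        return self.get_page(key).map.get(key, default)

    def put(self, key, value) -> None:
        page = self.get_page(key)
        page.map[key] = value
        # Split until the page holding `key` fits, or the depth cap is reached.
        while len(page.map) > PAGE_SIZE and page.local_depth < MAX_DEPTH:
            self._split(page)
            page = self.get_page(key)

    def _split(self, page: Page) -> None:
        if page.local_depth == self.global_depth:
            self.directory *= 2       # the appended copy points at the same pages
            self.global_depth += 1
        high_bit = 1 << page.local_depth
        p0 = Page(page.local_depth + 1)
        p1 = Page(page.local_depth + 1)
        for k, v in page.map.items():
            (p1 if hash(k) & high_bit else p0).map[k] = v
        # Repoint every directory slot that referenced the old page.
        for i, slot in enumerate(self.directory):
            if slot is page:
                self.directory[i] = p1 if i & high_bit else p0


table = ExtendibleHash()
for n in range(20):
    table.put(n, n * n)
print(table.get(7))  # 49
```

Scanning the whole directory in `_split` keeps the sketch short; a tighter version could step through only the slots that share the old page's low bits.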

Advantages and Limitations of Extendible Hashing
– Extendible Hashing allows for efficient insertion and retrieval of data.
– It handles dynamic file sizes effectively.
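– Keys are distributed evenly across buckets, provided the hash function is close to uniform.
– However, the depth cannot exceed the number of bits in the hash value (the bit size of an integer in a typical implementation).
– Splitting a bucket or doubling the directory does not guarantee that entries are separated: if all of them agree on the next hash bit, they land in the same bucket again and a further split is needed.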
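
To illustrate the last limitation: a split separates entries only if their hashes differ in the bit that the split examines. The hypothetical helper below, `shared_low_bits`, counts how many low-order bits a set of hash values has in common, which is the number of splits that would leave all of them together (assuming bits are consumed from the least significant end).

```python
def shared_low_bits(hashes, max_bits: int = 64) -> int:
    """Count the low-order bit positions on which all hashes agree."""
    n = 0
    while n < max_bits and len({(h >> n) & 1 for h in hashes}) == 1:
        n += 1
    return n


print(shared_low_bits([0b0000, 0b0100, 0b1000, 0b1100]))  # 2
print(shared_low_bits([5, 5, 5], max_bits=8))             # 8
```

In the first case two splits achieve nothing and only the third one separates the entries. In the second case the hashes are identical, so no depth within the hash width can separate them.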

Performance and Comparison with Other Hashing Techniques
– Extendible hashing provides efficient search, insert, and delete operations.
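– With a reasonably uniform hash function, the directory structure keeps records evenly distributed across buckets.
– A lookup typically needs at most two disk accesses: one for the directory entry and one for the bucket.
– The performance remains stable even with a large number of records.
– The space overhead of the directory is small when hashes are well distributed, but because the directory doubles each time it grows, skewed hashes can inflate it.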
– Extendible hashing is more flexible than static hashing.
– It handles dynamic changes in the number of records effectively.
– Unlike linear hashing, extendible hashing keeps a directory and always splits the bucket that actually overflowed.
– Extendible hashing performs well in scenarios with frequent updates.
– Other hashing techniques may be more suitable for specific use cases.

Extendible hashing (Wikipedia)

Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation (done one bucket at a time, as needed). This means that time-sensitive applications are less affected by table growth than by standard full-table rehashes.

Extendible hashing was described by Ronald Fagin in 1979. Practically all modern filesystems use either extendible hashing or B-trees. In particular, the Global File System, ZFS, and the SpadFS filesystem use extendible hashing.
