History and Properties of Distributed Hash Tables
– DHT research was motivated by peer-to-peer systems like Freenet, Gnutella, BitTorrent, and Napster.
– Napster used a central index server, making it vulnerable to attacks and lawsuits.
– Gnutella used a query flooding model, which was less efficient than Napster.
– Freenet employed key-based routing, clustering similar files on the same set of nodes.
– DHTs combined the decentralization of Freenet and Gnutella with the efficiency of Napster.
– Autonomy and decentralization are key characteristics of DHTs.
– DHTs should be fault-tolerant, handling continuous node arrivals, departures, and failures.
– Scalability is important, allowing efficient operation with thousands or millions of nodes.
– DHTs minimize coordination between nodes, reducing the work required for membership changes.
– Some DHT designs focus on security and anonymity, although less common than in other P2P systems.
Structure and Keyspace Partitioning in Distributed Hash Tables
– DHTs consist of an abstract keyspace, such as a set of 160-bit strings.
– Keyspace partitioning schemes distribute ownership of the keyspace among nodes.
– Overlay networks connect the nodes and enable key lookup.
– To store a file, its filename is hashed to generate a key, which is sent to a participating node.
– Retrieving a file involves hashing the filename to find the node responsible for the key.
– Most DHTs use consistent hashing or rendezvous hashing to map keys to nodes.
– Consistent hashing and rendezvous hashing minimize remapping of keys during node changes.
– In consistent hashing, the circular keyspace is divided into contiguous segments owned by nodes.
– Rendezvous hashing assigns keys based on the highest random weight.
– Both algorithms ensure that adding or removing a node affects only adjacent keys.
Applications of Distributed Hash Tables
– DHTs serve as infrastructure for various services like anycast, distributed file systems, and domain name services.
– Notable networks using DHTs include BitTorrent’s distributed tracker, the Kad network, and Freenet.
– DHT technology is adopted by BitTorrent and the Coral Content Distribution Network.
– DHTs enable peer-to-peer file sharing and content distribution systems.
– DHTs can be used for instant messaging, multicast, and building search engines like YaCy.
Overlay Networks and Algorithms in Distributed Hash Tables
– Each node maintains a set of links to other nodes (neighbors or routing table).
– These links form the overlay network.
– Nodes pick neighbors based on the network’s topology.
– All DHT topologies ensure that for any key k, each node either owns k or has a link to a node closer to k.
– Key-based routing is used to forward messages to the owner of any key k.
– Aside from routing, algorithms exploit the overlay network structure for message dissemination.
– Structella implements flooding and random walks on a Pastry overlay.
– DQ-DHT implements a dynamic querying search algorithm over a Chord network.
– These algorithms are used for overlay multicast, range queries, and collecting statistics.
– They leverage the structure of the overlay network to optimize message distribution.
Security and Techniques in Distributed Hash Tables
– DHTs are inherently more resilient against hostile attackers compared to centralized systems.
– Byzantine fault tolerance can defend against the Sybil attack in DHTs.
– Whanau is a DHT designed to be resistant to Sybil attacks.
– Incorporating social trust relationships into the system design can circumvent the Sybil attack.
– Effective defenses against Sybil attacks are an ongoing research topic in security conferences.
– DHT security techniques surveyed by Urdaneta, Pierre, and van Steen.
– Novel architectures for P2P applications: the Continuous-Discrete Approach.
– Dipsea: A Modular Distributed Hash Table.
– Structured overlay for heterogeneous environments.
– Self-Chord: A Bio-Inspired P2P Framework for Self-Organizing Distributed Systems.
This article needs additional citations for verification. (September 2020) |
A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table. Key–value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys. Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Kad network, the Storm botnet, the Tox instant messenger, Freenet, the YaCy search engine, and the InterPlanetary File System.