Distributed Hash Tables
Updated on 2021-04-15

Earlier ideas on distributed look-ups include
- central coordinator
- query flooding, where each request carries a Time-To-Live to bound its spread
- hierarchical lookups, as in DNS
Distributed Hash Tables
Algorithmic requirement: every node can find the node responsible for any key.
Trade-off between per-node state size, the volume of maintenance traffic, and the number of hops per lookup.
Rendezvous (or highest random weight) hashing
Each key induces a pseudo-randomly sorted list of the nodes and is assigned to the first node on that list.
Concretely,
- hash every (key, node) combination with a random hash function,
- assign the key to the node with the largest hash value (see the sketch below).
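A minimal Python sketch, assuming SHA-256 over the concatenated key and node identifier as the random hash function:

```python
import hashlib

def hrw_owner(key: str, nodes: list[str]) -> str:
    """Rendezvous hashing: hash every (key, node) pair and
    pick the node whose hash is largest."""
    def weight(node: str) -> int:
        return int.from_bytes(
            hashlib.sha256(f"{key}:{node}".encode()).digest(), "big"
        )
    return max(nodes, key=weight)  # O(N): every node's weight is evaluated

# Dropping a failed node from `nodes` reassigns only that node's keys,
# each to its own (generally different) second-highest node.
print(hrw_owner("user:42", ["a", "b", "c"]))
```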
Advantages
The load of a failing server is evenly distributed across the remaining nodes, because each key potentially has a different second-choice node.
Each node only has to know the list of node identifiers and the hash function, hence it requires a smaller state size than consistent hashing.
Disadvantages
Adding new nodes is hard. (I did not really get why this is the case; presumably it is because a new node immediately becomes the highest-weight choice for some keys whose data still lives on the old winners, so that data must be migrated or lookups must fall back to the previous choice.)
If there are N nodes, a lookup takes O(N), because all N (key, node) hashes have to be evaluated.
Consistent hashing
Consistent hashing remaps only about #keys/#nodes keys when a node arrives or leaves, based on a logical circular (ring) topology.
A node is hashed to a random value in the hash space, and in practice to multiple points on the ring, a.k.a. virtual nodes. Virtual nodes help balance the load distribution. To have good load balance, represent each node by log(N) virtual nodes. (Why log(N)? With a single point per node, the largest arc between adjacent nodes is about a log(N) factor longer than the average; giving each node on the order of log(N) virtual points averages its share over many arcs, so every node owns O(1/N) of the ring with high probability.)
To create R replicas, store each key-value pair at the coordinator node (chosen via hashing) and its R-1 successor nodes on the ring; the coordinator is in charge of replication.
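A small sketch of a ring with virtual nodes and successor-based replication; the class name, the 8-virtual-node default, and `preference_list` (a Dynamo-flavored name) are assumptions for illustration:

```python
import bisect
import hashlib

class Ring:
    """Consistent-hashing ring with virtual nodes and successor replication."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node is placed at `vnodes` pseudo-random ring positions.
        self.points = sorted(
            (self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def preference_list(self, key: str, r: int = 3):
        """The r distinct nodes replicating `key`; the first is the coordinator."""
        i = bisect.bisect_left(self.points, (self._h(key), ""))
        chosen = []
        for j in range(len(self.points)):
            node = self.points[(i + j) % len(self.points)][1]
            if node not in chosen:     # skip further virtual nodes of chosen ones
                chosen.append(node)
            if len(chosen) == r:
                break
        return chosen

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.preference_list("user:42"))  # coordinator followed by two successors
```

Removing a node only shifts its arcs to the next nodes clockwise, which is why only about #keys/#nodes keys move.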
One extreme is for every node to know every other node: lookups take O(1) hops at the cost of full routing state. (Why sqrt(N)? I am not sure; presumably it refers to designs like Kelips that keep O(1)-hop lookups with only O(sqrt(N)) state per node.) Instead, one can use finger tables. The i-th entry of node n's finger table stores the first node that succeeds or equals n + 2^i. Finger tables require only O(log N) hops, because each hop at least halves the remaining distance to the key, and O(log N) state.
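A toy Chord-style simulation of finger-table lookup, assuming a 16-bit identifier space and a global view of the ring (a real node would learn its fingers incrementally via the join protocol):

```python
import hashlib
import random

M = 16                 # bits in the identifier space (an assumption for the toy)
SPACE = 1 << M

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % SPACE

def between(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return a < x <= b if a < b else x > a or x <= b

def between_open(x, a, b):
    """True if x lies strictly inside the circular interval (a, b)."""
    return a < x < b if a < b else x > a or x < b

class ChordSim:
    def __init__(self, ids):
        self.nodes = sorted(ids)

    def successor(self, x):
        """First node at or after point x on the ring."""
        return next((n for n in self.nodes if n >= x), self.nodes[0])

    def fingers(self, n):
        """Entry i is the first node succeeding or equal to n + 2^i."""
        return [self.successor((n + (1 << i)) % SPACE) for i in range(M)]

    def lookup(self, start, key_id):
        """Greedy Chord routing; returns the hop path to the key's owner."""
        n, path = start, [start]
        while not between(key_id, n, self.successor((n + 1) % SPACE)):
            for f in reversed(self.fingers(n)):      # farthest finger first
                if between_open(f, n, key_id):       # closest finger before the key
                    n = f
                    break
            else:
                break                                # successor already owns the key
            path.append(n)
        path.append(self.successor(key_id))
        return path

random.seed(1)
ring = ChordSim(random.sample(range(SPACE), 64))
path = ring.lookup(ring.nodes[0], h("some-key"))
print(len(path), path)   # hop count grows like O(log N)
```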
Topologies other than the ring include:
- tree-like structures (Pastry, Tapestry, Kademlia)
- hypercubes (CAN): maintain pointers to the neighbors that differ in one bit position. There is only one possible neighbor in each direction; route to the receiver by fixing one differing bit per hop.
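A tiny sketch of the hypercube routing rule; the bit-fixing order is arbitrary, but each direction has exactly one neighbor:

```python
def hypercube_route(src: int, dst: int, dims: int):
    """Greedy hypercube routing: each hop fixes one bit in which the
    current node and the destination differ."""
    path, cur = [src], src
    for bit in range(dims):
        if (cur ^ dst) >> bit & 1:     # this bit differs
            cur ^= 1 << bit            # hop to the unique neighbor in that direction
            path.append(cur)
    return path

print(hypercube_route(0b0000, 0b1011, 4))  # [0, 1, 3, 11]
```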
The ring geometry allows the greatest routing flexibility, and hence achieves the best resilience and proximity performance. (Cf. "The Impact of DHT Routing Geometry on Resilience and Proximity".)
Many services (like Google) scale to huge numbers of nodes without the O(log N) techniques, using direct routing instead, in which every node knows the full routing table.
Amazon Dynamo
Dynamo offers an always-writable data store (even during network partitions), based on the design principles of incremental scalability, decentralization, and heterogeneity (a mix of slow and fast systems).
Dynamo uses consistent hashing.
It guarantees only weaker consistency: not all updates may arrive at all replicas, so data is versioned using vector clocks. If a node is unreachable, its replica is sent to another node in the ring, together with metadata recording the originally intended destination (hinted handoff); the stand-in node periodically checks whether the original target is alive again. Conflicting versions have to be resolved by the application during reads.
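A minimal vector-clock sketch; the dict-of-counters representation and the function names are illustrative assumptions, not Dynamo's actual interface:

```python
def vc_merge(a: dict, b: dict) -> dict:
    """Merged clock: element-wise maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_descends(a: dict, b: dict) -> bool:
    """True if version `a` has seen every event that `b` has."""
    return all(a.get(k, 0) >= n for k, n in b.items())

def vc_conflict(a: dict, b: dict) -> bool:
    """Neither version descends from the other: concurrent writes."""
    return not vc_descends(a, b) and not vc_descends(b, a)

# Two writes accepted by different replicas during a partition:
v1 = {"replica-a": 2, "replica-b": 1}
v2 = {"replica-a": 1, "replica-b": 2}
assert vc_conflict(v1, v2)   # the reading application must reconcile these
```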
References
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-852j-distributed-algorithms-fall-2009/
http://www.cs.cmu.edu/~dga/15-744/S07/lectures/16-dht.pdf
https://www.cs.rutgers.edu/~pxk/417/index.html
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-033-computer-system-engineering-spring-2018/