Replication in databases (Leaderless Replication)
We discussed single-leader replication in Part 1 and multi-leader replication in Part 3. In this article, we’ll focus on the third and final replication algorithm: leaderless replication.
The principle of the leaderless replication algorithm is that any replica of the database can accept writes, abandoning the concept of a leader altogether. In some implementations, the client sends its writes directly to multiple replicas in parallel, while in others a coordinator node does this on behalf of the client. Unlike the leader in a leader-based database, this coordinator does not enforce a particular ordering of writes.
It’s worth remembering that some of the earliest replicated data systems were leaderless. Riak, Cassandra, and Voldemort are open-source databases with leaderless replication models inspired by Dynamo, Amazon’s in-house database (not to be confused with DynamoDB), and they are therefore also called Dynamo-style databases.
Let’s understand more about leaderless replication by working through some important cases.
Case 1: A client sends a write to multiple replicas in parallel, but what if one or more nodes are unavailable?
- In leader-based replication, failover would have happened, but here it is handled in a different way.
- Imagine we have three replicas, of which only two are available. The write reaches those two, the client receives two successful acknowledgments, and the write is considered successful. The client simply ignores the fact that one of the replicas missed the write. This is how leaderless replication handles a node outage (a sketch follows below).
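To make this concrete, here is a minimal sketch of how a client (or coordinator) might fan a write out to all replicas in parallel and count acknowledgments. The `Replica` class and the `w` threshold are hypothetical stand-ins for illustration, not the API of any real Dynamo-style store.

```python
from concurrent.futures import ThreadPoolExecutor

class Replica:
    """A hypothetical in-memory replica; real stores talk over the network."""
    def __init__(self, name):
        self.name = name
        self.store = {}        # key -> (value, version)
        self.available = True

    def write(self, key, value, version):
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.store[key] = (value, version)
        return True

def leaderless_write(replicas, key, value, version, w):
    """Send the write to every replica in parallel; succeed once w replicas ack."""
    acks = 0
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(r.write, key, value, version) for r in replicas]
        for future in futures:
            try:
                if future.result():
                    acks += 1
            except ConnectionError:
                pass   # an unavailable replica simply misses this write
    return acks >= w

replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
replicas[2].available = False                     # one node is down
print(leaderless_write(replicas, "cart", ["milk"], version=5, w=2))  # True
```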
Case 2: Now suppose the third replica becomes available again and clients start reading from it. This node missed the recent writes and would return stale values, so let’s see how Dynamo-style databases handle this.
- Just as with a write request, we send the read request to multiple nodes in parallel. Each response comes with a version number, and the value with the highest version number is taken as the latest value, as sketched below.
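Reusing the hypothetical `Replica` class from the write sketch above, the read side might look like the following: query every node, then keep the value carrying the highest version number.

```python
def leaderless_read(replicas, key, r):
    """Query all replicas and return the value with the newest version."""
    responses = []
    for replica in replicas:
        if not replica.available:
            continue                       # unreachable node: no response
        if key in replica.store:
            value, version = replica.store[key]
            responses.append((replica, value, version))
    if len(responses) < r:
        raise RuntimeError("fewer than r replicas answered")
    # The response with the highest version number is the latest value.
    return max(responses, key=lambda resp: resp[2])

node, value, version = leaderless_read(replicas, "cart", r=2)
print(value, version)   # ['milk'] 5
```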
Case 3: Even if the client gets the most recent value, how do we ensure that all replicas eventually catch up with the latest writes?
- In Dynamo-style data stores, two mechanisms facilitate this. Let’s understand them in depth.
- Read repair — We saw earlier that each replica sends its response with a version number. Continuing the case above, suppose replicas 1 and 2 respond with version 5, while replica 3 responds with version 4. The client detects the stale value on replica 3 and writes the newer value back to it (see the sketch after this list). Note that this approach works well only for values that are frequently read.
- Anti-entropy process — Some data stores have a background process that constantly looks for differences in the data between replicas and copies any missing data from one replica to another. Note that, unlike the replication log in leader-based replication, this process does not copy writes in any particular order. It is especially important for rarely read data, which read repair alone would leave stale.
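As a rough illustration, read repair can be bolted onto the read path from the previous sketch (same hypothetical `Replica` class): after picking the winning version, the client writes it back to any replica that answered with an older one. The anti-entropy process, by contrast, runs as a separate background job and is not shown here.

```python
def read_with_repair(replicas, key, r):
    """Read as before, then push the newest value to any stale replica."""
    responses = []
    for replica in replicas:
        if replica.available and key in replica.store:
            value, version = replica.store[key]
            responses.append((replica, value, version))
    if len(responses) < r:
        raise RuntimeError("fewer than r replicas answered")
    _, latest_value, latest_version = max(responses, key=lambda resp: resp[2])
    for replica, _, version in responses:
        if version < latest_version:
            # Read repair: overwrite the stale copy with the newest value.
            replica.write(key, latest_value, latest_version)
    return latest_value
```

In the running example, replicas 1 and 2 answer with version 5 and replica 3 with version 4, so the client writes the version-5 value back to replica 3 before returning.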
Quorums:
In the case above, we saw that two out of three replicas acknowledged the write, and we considered it successful. So how many nodes must accept a write for it to count as successful? Let’s define a few terms:
n — the total number of replicas in the database
w — the number of replicas that must confirm a write for it to succeed
r — the number of replicas queried for each read
Reads and writes that obey the w and r values such that w + r > n are called quorum reads and writes.
This ensures that we read the most up-to-date value from the database. In Dynamo-style databases, the values of n, w, and r are typically configurable; a common practice is to make n an odd number and set w = r = (n + 1)/2. However, this can vary based on what fits the workload.
The quorum condition allows the system to tolerate unavailable nodes as follows:
- If w < n, we can still process writes if a node is unavailable.
- If r < n, we can still process reads if a node is unavailable.
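A quick sketch of the arithmetic behind this: the write set and the read set must share at least w + r - n nodes, so w + r > n guarantees that at least one of the nodes you read from has accepted the latest write.

```python
def is_strict_quorum(n, w, r):
    """w + r > n guarantees the write set and the read set overlap."""
    return w + r > n

for n, w, r in [(3, 2, 2), (5, 3, 3), (3, 1, 1)]:
    overlap = max(w + r - n, 0)   # minimum nodes common to both sets
    print(f"n={n}, w={w}, r={r}: quorum={is_strict_quorum(n, w, r)}, "
          f"guaranteed overlap={overlap}")
```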
Limitations of Quorum Consistency:
1. When the quorum condition is satisfied (w + r > n):
The quorum condition returns the most recent value only if the w write nodes and the r read nodes overlap. In other words, if the nodes that accepted the write and the nodes we read from don’t overlap, we might get stale data. Here are some scenarios where this can happen:
- If a sloppy quorum is used (discussed below), the w writes may end up on different nodes than the r reads, and there is no longer a guaranteed overlap.
- If two writes occur concurrently, it is not clear which happened first; the only safe solution is to merge the concurrent writes, using any of the conflict-resolution methods discussed in Part 3.
- If a write succeeded on some replicas but failed on others, and overall succeeded on fewer than w replicas, it is not rolled back on replicas where it succeeded. This means that if a write was reported as failed, subsequent reads may or may not return the value from the write.
- If a node carrying a new value fails and its data is restored from a replica carrying an old value, the number of replicas storing the new value may fall below w, breaking the quorum condition.
One thing that is important to understand here is that w and r allow you to adjust the probability of reading the most recent value, but they are not an absolute guarantee. Also, since leaderless replication does not maintain a replication log, it is difficult to monitor staleness.
2. When the quorum condition is not satisfied (w + r ≤ n):
Writes and reads are still sent to all n nodes, but a smaller number of successful responses is required for the operation to succeed. With smaller w and r, you are more likely to read stale values.
With appropriately configured quorums, databases can tolerate the failure of individual nodes without the need for failover. Thus, by compromising somewhat on durability, you achieve “lower latency” and “higher availability”, and you can continue processing reads and writes until the number of reachable replicas falls below w or r, at which point the database becomes unavailable for writing or reading.
SLOPPY QUORUM:
Imagine a scenario: you ordered something from Amazon, and the delivery guy comes to your home to deliver the order, but you are not there. In that case, he hands the parcel over to the security guard, and when you are back, the security guard gives it to you.
Now let’s map the analogy. The client (the delivery guy) is trying to reach a node for a write (the delivery), but because of a network interruption, the designated replicas are unavailable. The client can, however, connect to some other nodes that are not among the designated set of n home nodes for the value. In this case, the client writes to whatever w nodes are reachable, and when the designated nodes come back online, the writes are copied over to them. This copying of writes is called a “hinted handoff” (see the sketch below).
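Here is a loose sketch of the idea, again with hypothetical classes rather than any real store’s API: a write that cannot reach one of its designated home nodes is parked on a spare node together with a hint naming the home node, and handed back once that node recovers.

```python
class Node:
    """Hypothetical node; real stores exchange hints over the network."""
    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> (value, version)
        self.hints = []    # parked writes: (home_node, key, value, version)
        self.available = True

def sloppy_write(home_nodes, fallback_nodes, key, value, version, w):
    """Write to the designated nodes; park misses on fallbacks as hints."""
    acks = 0
    fallbacks = iter(fallback_nodes)
    for node in home_nodes:
        if node.available:
            node.store[key] = (value, version)
            acks += 1
        else:
            # Sloppy quorum: a node outside the designated set accepts the
            # write, remembering which home node it is standing in for.
            next(fallbacks).hints.append((node, key, value, version))
            acks += 1
    return acks >= w

def hinted_handoff(fallback):
    """Once home nodes recover, hand their parked writes back to them."""
    still_parked = []
    for home, key, value, version in fallback.hints:
        if home.available:
            home.store[key] = (value, version)
        else:
            still_parked.append((home, key, value, version))
    fallback.hints = still_parked

home = [Node("n1"), Node("n2"), Node("n3")]
spare = Node("n4")
home[2].available = False
sloppy_write(home, [spare], "cart", ["milk"], version=1, w=3)  # n4 parks a hint
home[2].available = True
hinted_handoff(spare)          # the parked write finally reaches n3
print(home[2].store)           # {'cart': (['milk'], 1)}
```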
So a sloppy quorum isn’t really a quorum at all; it is only an assurance of “durability” and “write availability”. Sloppy quorums are optional in all common Dynamo-style implementations.
MANAGING CONCURRENT WRITES:
The consistency, conflict, and durability concerns of leaderless replication don’t end here. As with multi-leader replication, conflict resolution is a major step towards achieving consistency in the database. Even “last write wins” has drawbacks: inconsistent or skewed clocks can silently discard data, and the haphazard ordering of writes may leave the database in an inconsistent state. Merging concurrently written values, called siblings, is therefore a major task in leaderless systems.

Maintaining a version number for each write, and correspondingly one for each replica (version vectors), helps a lot in tracking concurrent writes. In simple terms, with every write request the client sends to a replica, it also includes the last version number it received from that replica. The server overwrites all values at or below that version, while any value with a higher version is treated as a concurrent write and kept alongside the new one. There is a whole lot more to managing conflicts and concurrent writes, and you can read further about it.
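A minimal sketch of the single-replica version scheme just described, with all names hypothetical: the server keeps a version counter per key, a write that carries the last version the client saw supersedes exactly the values the client had seen, and anything newer survives as a concurrent sibling for the client to merge later.

```python
class VersionedStore:
    """Hypothetical single-node store that keeps siblings of concurrent writes."""
    def __init__(self):
        self.version = 0
        self.values = {}   # version -> value (the current siblings)

    def read(self):
        # Return the current version plus every sibling the client must merge.
        return self.version, list(self.values.values())

    def write(self, value, last_seen_version):
        self.version += 1
        # Everything the client had already seen is superseded by this write;
        # newer values it had NOT seen are kept as concurrent siblings.
        self.values = {v: val for v, val in self.values.items()
                       if v > last_seen_version}
        self.values[self.version] = value
        return self.version

store = VersionedStore()
store.write(["milk"], last_seen_version=0)   # client A's first write
store.write(["eggs"], last_seen_version=0)   # client B, unaware of A: concurrent
print(store.read())                          # (2, [['milk'], ['eggs']])
version, siblings = store.read()
merged = sorted({item for sib in siblings for item in sib})
store.write(merged, last_seen_version=version)   # a merge resolves the siblings
print(store.read())                          # (3, [['eggs', 'milk']])
```

A full version vector extends this by keeping one such counter per replica, so that concurrent writes can be detected across replicas and not just on a single node.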
This was all about leaderless replication, and it brings our replication series to an end. You can also read about the other types of replication on my page.
Part 1: Single Leader Replication
Part 2: Replication logs and lags
Part 3: Multi-Leader Replication
Thanks for reading :)