Replication in Databases (Single Leader Replication)
Replication in databases is an important mechanism by which a system maintains copies of the same data on multiple machines. There are several reasons to replicate data:
- Reduce latency
- Increase availability (even if one or more nodes fail)
- Increase read throughput
Now, if the data you are replicating does not change over time, replication is quite simple: you just copy the data once. But that's not the case in most complex, real-time applications, where millions of users are doing concurrent reads and writes.
Thus, to replicate changing data across multiple nodes or datacenters, several algorithms are used. The most common ones are:
- Single-leader
- Multi-leader
- Leaderless
Apart from these three algorithms, there is an orthogonal choice of how writes are propagated to the replicas (in relational databases this is usually configurable, while in other systems it may be hardcoded):
- Synchronous: the client gets "write successful" only after the write has been confirmed by all the followers.
- Asynchronous: the client gets "write successful" as soon as the leader has written the data locally, even though replication to the followers may not have happened yet.
- Semi-synchronous: a mixture of the two, where one leader-follower pair is synchronous and the remaining followers are asynchronous; "write successful" is sent once the data has been replicated to that one synchronous follower.
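To make the difference concrete, here is a minimal Python sketch of the three acknowledgement modes. It is purely illustrative: Leader and Follower are invented classes, not any real database's API, and the followers live in the same process, so "replication" here is just a method call.

```python
# Toy sketch of synchronous, asynchronous, and semi-synchronous acknowledgement.
# Leader and Follower are hypothetical in-memory stand-ins for real database nodes.

class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        # In a real system this is a network call that can be slow or fail.
        self.data[key] = value


class Leader:
    def __init__(self, followers, mode):
        self.data = {}
        self.followers = followers
        self.mode = mode  # "sync", "async", or "semi-sync"

    def write(self, key, value):
        self.data[key] = value  # the leader always writes to its local storage first

        if self.mode == "sync":
            # Replicate to every follower before acknowledging the client.
            for f in self.followers:
                f.apply(key, value)
        elif self.mode == "semi-sync":
            # Wait for one designated synchronous follower; the others would be
            # replicated asynchronously (that background work is omitted here).
            self.followers[0].apply(key, value)
        # "async": acknowledge immediately; background replication is omitted here.

        return "write successful"


leader = Leader([Follower(), Follower()], mode="semi-sync")
print(leader.write("user:42", "Alice"))  # -> write successful
```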
In this article, we'll focus on single-leader replication and discuss the other two algorithms in separate posts.
SINGLE LEADER (Master-slave replication)
As we can see from the diagram, there is a leader replica (which accepts all writes) and there are followers (also called slaves, read replicas, or hot standbys). The idea is simple:
- One of the nodes is designated as the leader, and every write (insert or update query) must go through it. The leader writes the new data to its local storage.
- Whenever the leader writes new data to its local storage, it also sends the change to its followers as part of a "replication log" or "change stream".
- The followers (secondary replicas), in turn, take the log from the leader and update their local storage by applying the writes in the same order as they were processed by the leader.
- For reads, a client can query either the leader or any of the followers.
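These steps can be modelled in a few lines of Python. This is a toy, single-process sketch (no networking, no durability), and the Leader, Follower, and catch_up names are made up for this post:

```python
# Toy model of the single-leader write path: the leader appends every write to
# a replication log, and followers apply the log entries in the same order.

class Leader:
    def __init__(self):
        self.storage = {}
        self.log = []  # replication log / change stream: (seq, key, value)

    def write(self, key, value):
        self.storage[key] = value                     # 1. write to local storage
        self.log.append((len(self.log), key, value))  # 2. append to the log

    def read(self, key):
        return self.storage.get(key)


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.storage = {}
        self.applied_upto = 0  # how far into the leader's log we have applied

    def catch_up(self):
        # 3. apply new log entries in exactly the leader's order
        for seq, key, value in self.leader.log[self.applied_upto:]:
            self.storage[key] = value
            self.applied_upto = seq + 1

    def read(self, key):
        return self.storage.get(key)


leader = Leader()
follower = Follower(leader)
leader.write("x", 1)
follower.catch_up()
print(leader.read("x"), follower.read("x"))  # 4. reads can go to either -> 1 1
```

In a real system the log is shipped over the network, so followers lag slightly behind the leader; that lag is exactly what the synchronous/asynchronous distinction above is about.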
This method of replication is found in both relational and non-relational databases:
Relational: Postgres (version 9.0 and above), MySQL, Oracle Data Guard, etc.
Non-relational: MongoDB, RethinkDB, Espresso.
Even message brokers, like Kafka and RabbitMQ, use these methods of replication.
But it isn’t as simple as it looks, various incidences follow it, and let’s have a look over all these :
1. Setting Up New Replicas:
From time to time we need to set up new followers, either to increase the number of replicas or to replace failed nodes. The usual approach (sketched below): take a consistent snapshot of the leader's database, copy it to the new follower node, and then request all the data changes that happened after the snapshot was taken, so the follower can catch up.
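Continuing the toy Leader and Follower classes from the sketch above (still a single-process illustration, with take_snapshot and bootstrap_follower invented for this post), setting up a new replica might look roughly like this:

```python
# Rough sketch of bootstrapping a new follower: copy a consistent snapshot,
# remember the log position the snapshot corresponds to, then catch up.

def take_snapshot(leader):
    """Return a copy of the leader's data and the log position it covers."""
    return dict(leader.storage), len(leader.log)


def bootstrap_follower(leader):
    snapshot, position = take_snapshot(leader)  # 1. take a consistent snapshot
    follower = Follower(leader)
    follower.storage = snapshot                 # 2. load it on the new node
    follower.applied_upto = position            # 3. note where the snapshot ends
    follower.catch_up()                         # 4. request changes made since then
    return follower


new_follower = bootstrap_follower(leader)       # reuses the leader defined above
```

Real databases associate the snapshot with an exact position in the replication log (PostgreSQL calls it the log sequence number, MySQL the binlog coordinates) so the new follower knows precisely where to resume.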
2. Handling Node Outages:
A node can go down at any time, whether due to a crash, a reboot to install a kernel security patch, or anything else. We typically segregate node outages into two categories:
- Follower failure (catch-up recovery): Each follower keeps a log, on its local disk, of the data changes it has received from the leader. When a crashed follower restarts, it looks up the last transaction it processed in that log and requests from the leader all the changes that happened after that point (the same catch-up idea as in the sketches above, resuming from a position persisted on disk).
- Leader failure (failover): When the leader fails, things are much more chaotic. Generally some kind of heartbeat is used to detect whether the leader is still alive, and if it is declared dead, two things happen: a new leader is chosen (through an election or by a pre-elected "controller node"), and the system is reconfigured so that clients send their writes to the new leader and followers start consuming changes from the new leader. A rough sketch of this flow follows.
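The sketch below is deliberately naive: real systems run proper elections or lean on a coordination service such as ZooKeeper, whereas here is_alive() is a placeholder heartbeat check, the promotion rule simply picks the follower with the largest applied_upto position from the toy classes above, and reconfiguration is only hinted at in a comment.

```python
# Naive failover sketch: declare the leader dead after a few missed heartbeats,
# then promote the follower with the most up-to-date replication position.

import time

def monitor_and_failover(leader, followers, is_alive, max_missed=3, interval=1.0):
    missed = 0
    while missed < max_missed:
        if is_alive(leader):       # heartbeat answered: the leader is healthy
            return leader
        missed += 1                # another missed heartbeat
        time.sleep(interval)

    # Leader presumed dead: choose the follower that has applied the most of
    # the old leader's log, since it has lost the fewest writes.
    new_leader = max(followers, key=lambda f: f.applied_upto)
    followers.remove(new_leader)
    # Reconfiguration (not shown): clients must now send writes to new_leader,
    # and the remaining followers must start consuming its change stream.
    return new_leader
```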
3. Leader failover in different scenarios:
Failover is a much larger problem, and looking at it more closely shows how fraught it is:
- With asynchronous replication, the new leader may not have received all of the old leader's writes before being promoted.
- Simply discarding those unreplicated writes violates the durability clients were promised and can leave the data inconsistent.
- It may also happen that two nodes simultaneously believe they are the leader, a situation known as "split brain". Both might then accept writes, so conflict resolution becomes much more important in such scenarios; one common safeguard is sketched below.
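One widely used safeguard against split brain (and against accepting writes from a deposed leader in general) is a fencing token: each time a new leader is chosen, a leadership epoch is incremented, and replicas reject writes carrying an older epoch. This is a generic sketch of that idea, not the mechanism of any particular database; FencedReplica and accept_write are invented names.

```python
# Generic fencing-token sketch: replicas remember the highest leadership epoch
# they have seen and reject writes from any leader with a stale (lower) epoch.

class FencedReplica:
    def __init__(self):
        self.data = {}
        self.highest_epoch = 0

    def accept_write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            return False                 # write from a stale leader: reject it
        self.highest_epoch = epoch       # remember the newest epoch seen
        self.data[key] = value
        return True


replica = FencedReplica()
print(replica.accept_write(1, "x", "from the old leader"))       # True
print(replica.accept_write(2, "x", "from the new leader"))       # True
print(replica.accept_write(1, "x", "old leader, post-failover")) # False, rejected
```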
These were some of the topics pertaining to replication in general and single-leader replication in particular. Each algorithm has its pros and cons; things quickly become dynamic and complex, and there are usually trade-offs among the durability, consistency, availability, and latency of a distributed system.
You can also read more about the other types of replication:
Part 2: Replication logs and lags
Part 3: Multi-Leader Replication
Part 4: Leaderless Replication
Thanks for reading : )