Replication Logs and Lags
In Part 1, we discussed how replication logs are sent from the leader to the followers over the network, so that the followers can catch up with the leader’s latest writes and stay up to date. In this article, we’ll see how replication logs are implemented, what problems are associated with them, and their solutions. So let’s begin:
There are different types of logs through which replication data is sent. Let’s discuss the most commonly used ones.
1. Statement-based log replication: The leader logs every write statement (INSERT, UPDATE, or DELETE) and forwards it to the followers, who in turn run these statements sequentially.
Points to keep in mind when implementing statement-based replication:
- Any statement that calls a non-deterministic function (e.g. NOW() or RAND()) may generate a different value on each replica.
- If auto-incrementing columns are used, it’s very important to make sure that the statements are executed in exactly the same order on every replica.
- Triggers, stored procedures, or any other statements with side effects may produce different results on different replicas.
All three cases above may cause the data on the leader and the followers to diverge, making the system inconsistent. Databases like MySQL (before v5.1) and VoltDB have used this approach while working around some of the drawbacks above, for example by having the leader replace a non-deterministic function call with a fixed return value before logging the statement.
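To make the non-determinism problem concrete, here is a minimal Python sketch (all names hypothetical): two replicas replay the same statement containing NOW(), but each node evaluates the function on its own clock.

```python
# Hypothetical sketch: two nodes replay the statement
# "INSERT INTO posts (created_at) VALUES (NOW())", each evaluating
# NOW() locally at replay time.
def replay_on(node_clock):
    # node_clock stands in for the node's local evaluation of NOW()
    return {"created_at": node_clock()}

leader_row = replay_on(lambda: "2024-01-01T10:00:00.000")
follower_row = replay_on(lambda: "2024-01-01T10:00:00.137")  # replayed 137 ms later

print(leader_row == follower_row)  # False: the replicas now disagree
```

A common workaround is for the leader to evaluate NOW() once and ship the resulting constant in place of the function call.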
2. Write-ahead log (WAL) shipping: This is a log-based replication where the log is an append-only sequence of all writes to the database. It is the same low-level log that storage engines such as LSM-trees and B-trees already maintain internally. When the follower receives this log, it builds a copy of the exact same data structures found on the leader.
Points to keep in mind when implementing WAL shipping:
- It couples the replicated data tightly to the storage engine, since the log describes the data at a very low level, e.g. which bytes were changed in which disk blocks.
- If the leader and a follower run different database versions, their WAL formats are typically incompatible, so they cannot run together; this makes zero-downtime version upgrades difficult.
It is mostly used as a disaster-recovery solution and has low set-up costs and easy maintenance. PostgreSQL and Oracle are examples of databases with such implementations.
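The low-level, byte-oriented nature of WAL records can be sketched as follows (a simplified model, not any real database’s format): the leader ships records describing byte changes in disk pages, and the follower applies them verbatim to rebuild identical structures.

```python
# Simplified, hypothetical WAL records: "write these bytes at this offset
# in this page". Real formats are engine-specific and far more detailed.
leader_wal = []  # append-only sequence of low-level records

def leader_write(page: int, offset: int, data: bytes):
    leader_wal.append({"page": page, "offset": offset, "data": data})

def follower_apply(wal, pages):
    # The follower replays every record byte for byte, so it ends up
    # with the leader's exact on-disk structures.
    for rec in wal:
        buf = pages.setdefault(rec["page"], bytearray(16))
        buf[rec["offset"]:rec["offset"] + len(rec["data"])] = rec["data"]
    return pages

leader_write(0, 4, b"ab")
leader_write(1, 0, b"xyz")
follower_pages = follower_apply(leader_wal, {})
print(bytes(follower_pages[0][4:6]))  # b'ab'
```

Because the records reference pages and offsets, a follower with a different on-disk layout (e.g. a different engine version) cannot apply them, which is exactly the coupling problem above.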
3. Row-based log replication: For relational databases, this is a sequence of records describing writes to database tables at the granularity of a row. For inserted rows, a record contains all the new column values; for deleted and updated rows, it contains enough information to uniquely identify them. Since row-based logs are decoupled from the storage engine’s internals, they are backward compatible, allowing the leader and followers to run different versions efficiently. They are also easier for external applications to parse, which is useful for analytics, building custom indexes, and caching.
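A row-based log, by contrast, describes changes logically. The sketch below (hypothetical record format) shows how insert records carry full rows, while update and delete records carry a key that uniquely identifies the row:

```python
# Hypothetical logical (row-based) log: records identify a table and a
# row, with no reference to storage-engine internals.
row_log = [
    {"op": "insert", "table": "users", "row": {"id": 7, "bio": "hello"}},
    {"op": "update", "table": "users", "key": {"id": 7}, "set": {"bio": "hi"}},
    {"op": "delete", "table": "users", "key": {"id": 7}},
]

def apply(log, tables):
    for rec in log:
        rows = tables.setdefault(rec["table"], {})
        if rec["op"] == "insert":
            rows[rec["row"]["id"]] = dict(rec["row"])
        elif rec["op"] == "update":
            rows[rec["key"]["id"]].update(rec["set"])
        elif rec["op"] == "delete":
            del rows[rec["key"]["id"]]
    return tables

tables = apply(row_log, {})
print(tables["users"])  # {} - the row was inserted, updated, then deleted
```

Any consumer that understands this record shape can follow along, which is why such logs also feed caches and custom indexes.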
4. Trigger-based log replication: A trigger lets one register custom code that gets executed when data changes. It’s typically useful when one wants to replicate only a subset of the data, replicate from one kind of database to another, or apply custom conflict-resolution logic. The trigger logs the change into a separate table, from where it can be read by an external process. Stored procedures can also be used for trigger-based replication.
Points to keep in mind when implementing trigger-based log replication:
- It has greater overheads, since an external process must read the logged changes and then replicate them.
- It is more prone to bugs and limitations than the database’s built-in replication, since its logic is custom-written.
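A rough sketch of the trigger-based flow (all names hypothetical): the trigger copies each change into a separate change-log table, and an external process later reads that table and applies the changes to the replica.

```python
# Hypothetical sketch: "changes" plays the role of the separate table
# that the trigger writes into on every data change.
users, changes = {}, []

def update_user(user_id, bio):
    users[user_id] = bio
    # The trigger fires on the data change and records it for later reading.
    changes.append({"table": "users", "id": user_id, "bio": bio})

def external_replicator(change_log, replica):
    # A separate process reads the logged changes and applies them
    # elsewhere; filtering or conflict resolution would live here.
    for c in change_log:
        replica[c["id"]] = c["bio"]
    return replica

update_user(7, "hello")
replica = external_replicator(changes, {})
print(replica)  # {7: 'hello'}
```

The extra hop through the change table and the external process is where the overhead mentioned above comes from.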
Now that we have discussed replication logs, let’s see the cases where there is lag between these replications and how it affects the dynamics of the distributed system.
Replication Lag:
Let’s first discuss a few scenarios:
Case 1: Suppose your replication is synchronous and you keep adding followers to your “read-scaling architecture”. Imagine the delay this can cause for the client waiting for a response: if you updated your profile bio on LinkedIn, would you accept seeing the “updated successfully” message only after a few seconds, or a few minutes? And what if one node breaks? Alas! Your entire system can come down, because a synchronous write cannot complete until every follower acknowledges it.
Case 2: Now imagine your replication is asynchronous, relying on “eventual consistency”. You may still see your outdated bio: your write went to the leader, but your read was served by a lagging replica that hasn’t yet caught up with the leader.
Both cases create a bad user experience and call into question the dependability of the entire system. Let’s see what can be done to avoid them.
1. Reading your own writes:
The idea here is “read-your-writes” or “read-after-write” consistency: a guarantee that when a user reloads the page, they see their own changes. Note that it doesn’t promise that other users will instantly see those changes.
The following techniques can be used to implement read-your-writes:
- When reading something that the user has modified, read it from the leader.
- The client can remember the timestamp of its most recent write; the system then ensures that the replica serving the read has applied updates at least up to that timestamp.
You may think that if most things in the application are editable, then reading from the leader won’t be effective, thereby negating the purpose of the leader-follower architecture. In that case, we can keep track of timestamps and apply custom logic for how long reads go to the leader: for example, for one minute after the user’s last write, serve their reads from the leader, and after that route requests to the replicas.
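The timestamp-based routing described above could be sketched like this (a toy in-process model; a real system would track the last-write timestamp per client and the applied position per replica):

```python
import time

# Hypothetical read-your-writes router: serve a read from the replica
# only if it has replicated at least up to the client's last write;
# otherwise fall back to the leader.
class Router:
    def __init__(self):
        self.leader_data = {}
        self.replica_data = {}
        self.replica_applied_ts = 0.0  # how far the replica has replicated
        self.last_write_ts = 0.0       # remembered per client in practice

    def write(self, key, value):
        self.leader_data[key] = value
        self.last_write_ts = time.monotonic()

    def read(self, key):
        if self.replica_applied_ts >= self.last_write_ts:
            return self.replica_data.get(key)   # replica is fresh enough
        return self.leader_data.get(key)        # route to the leader

r = Router()
r.write("bio", "new bio")
print(r.read("bio"))  # 'new bio' - served by the leader, replica lags
```

Here the replica hasn’t caught up to the client’s last write, so the read falls back to the leader; once `replica_applied_ts` advances past `last_write_ts`, reads are served by the replica again.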
But as we have seen, there are always trade-offs; let’s look at the plausible pitfalls.
- When replicas are distributed across multiple data centers, any request that needs to be served by the leader must be routed to the data center containing the leader, thus adding complexity.
- Read-after-write consistency must also hold across devices, when the same user accesses the same page from different devices; this requires the last-write timestamp to be tracked centrally rather than on a single device.
2. MONOTONIC READS
Case: Imagine you see x likes on your post, but just after reloading the page, the count suddenly decreases. You would wonder why, and often blame the network, but let’s dive into the root cause.
This happens because, after the page reload, the request may have been routed to a different replica that was lagging further behind the leader, thus showing older data.
Monotonic reads serve as a guarantee that anomalies like the one above don’t occur and we don’t go back in time. Note that this is a stronger guarantee than “eventual consistency”, but weaker than strong consistency. It ensures that a user never reads older data after having read newer data, typically by making sure each user always reads from the same replica (chosen, for instance, by a hash of the user ID), re-routing to another replica only if that one fails.
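Pinning each user to one replica can be done by hashing the user ID, as in this sketch (replica names are made up):

```python
import hashlib

# Minimal sketch: hash the user ID to pick a replica, so successive
# reads by the same user always land on the same node and can never
# jump to a more-lagged replica.
REPLICAS = ["replica-0", "replica-1", "replica-2"]  # hypothetical names

def replica_for(user_id: str, replicas=REPLICAS) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return replicas[int(digest, 16) % len(replicas)]

first = replica_for("alice")
# Reloading the page picks the same replica, so reads don't go back in time.
print(replica_for("alice") == first)  # True
```

If the pinned replica fails, the request is re-routed, and the weaker monotonic-reads guarantee can momentarily be violated; stronger schemes track a per-user read position instead.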
3. CONSISTENT PREFIX READS
Case: Imagine you insert two rows into your database, and while querying, you see the second record but not the first.
This can happen because the replica serving the request applied the writes in a different order. Consistent prefix reads guarantee that if a sequence of writes happens in a certain order, then anyone reading those writes will see them appear in the same order.
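One way to provide consistent prefix reads is to tag each write with a sequence number and have replicas apply writes strictly in that order, as in this sketch (hypothetical structure):

```python
# Hypothetical sketch of consistent prefix reads: each write carries a
# sequence number, and a replica applies writes strictly in that order,
# so a reader can never observe write 2 without write 1.
def apply_in_order(incoming, applied):
    # Sort the batch by sequence number and apply only consecutive
    # writes; anything after a gap waits for a later batch.
    next_seq = len(applied) + 1
    for w in sorted(incoming, key=lambda w: w["seq"]):
        if w["seq"] == next_seq:
            applied.append(w["value"])
            next_seq += 1
    return applied

# The two inserts arrived over the network out of order (seq 2 first).
out_of_order = [{"seq": 2, "value": "second row"}, {"seq": 1, "value": "first row"}]
print(apply_in_order(out_of_order, []))  # ['first row', 'second row']
```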
All the above methods, reading from the leader, routing reads by timestamp, monotonic reads, and consistent prefix reads, help improve the user experience and hide replication lag, thus helping us build today’s robust systems.
We’ll be discussing more on replication in further posts. Hope you liked the article, constructive comments are welcomed, and thanks for reading :)
You can also read more about the other type of replications :
Part 1: Single Leader Replication
Part 3: Multi-Leader Replication
Part 4: Leaderless Replication