DDIA | Replication | Replication logs

In the previous post, we learned about the replication technique of Leaders and Followers. How do they work under the hood? They use several techniques – 

  1. Statement based replication
  2. Write ahead log
  3. Logical (row based) replication
  4. Trigger based replication

1. Statement based replication

For every INSERT, UPDATE, DELETE operation, leader sends the statement log to followers and each follower parses and executes that SQL statement as if it had been received from a client.

Problems – 

  • non deterministic functions like time(), rand(), if used for any value, will have different values for same row in different replicas
  • if auto-incrementing column is present then the replication on followers should be sequenced in the same manner to avoid faulty records.
  • Statements that have side effects (e.g., triggers, stored procedures, user-defined functions) may result in different side effects occurring on each replica, unless the side effects are absolutely deterministic.

It is possible to work around those issues—for example, the leader can replace any nondeterministic function calls with a fixed return value when the statement is logged so that the followers all get the same value. However, because there are so many edge cases, other replication methods are now generally preferred.

Was used in MySQL till version 5.1

2. Write Ahead Log (WAL)

In one of our previous posts on data structures which power databases, we learned about the logs, and how writes are appended to end of these logs. (eg. B-Trees, SST/LSM Trees).

The log is an append-only sequence of bytes containing all writes to the database. We can use the exact same log to build a replica on another node: besides writing the log to disk, the leader also sends it across the network to its followers. When the follower processes this log, it builds a copy of the exact same data structures as found on the leader.

Used in PostgreSQL and Oracle.

The main disadvantage is that the log describes the data on a very low level: a WAL contains details of which bytes were changed in which disk blocks. This makes replication closely coupled to the storage engine.

 

3. Logical log replication

This one uses different log formats for replication and for the storage engine, which allows the replication log to be decoupled from the storage engine internals. This kind of replication log is called a logical log, to distinguish it from the storage engine’s (physical) data representation.

A logical log for a relational database is usually a sequence of records describing writes to database tables at the granularity of a row:

  • For an inserted row, the log contains the new values of all columns.
  • For a deleted row, the log contains enough information to uniquely identify the row that was deleted.
  • For an updated row, the log contains enough information to uniquely identify the updated row, and the new values of all columns (or at least the new values of all columns that changed).

A transaction that modifies several rows generates several such log records, followed by a record indicating that the transaction was committed.  Since a logical log is decoupled from the storage engine internals, it can more easily be kept backward compatible, allowing the leader and the follower to run different versions of the database software, or even different storage engines.

Used in MySQL.

4. Trigger based replication

This one allows application to modify databases using triggers and stored procedure.

A trigger lets you register custom application code that is automatically executed when a data change (write transaction) occurs in a database system. The trigger has the opportunity to log this change into a separate table, from which it can be read by an external process. That external process can then apply any necessary application logic and replicate the data change to another system. 

Trigger-based replication typically has greater overheads than other replication methods, and is more prone to bugs and limitations than the database’s built-in replication. However, it can nevertheless be useful due to its flexibility.

Thanks for stopping by! Hope this gives you a brief overview in to different techniques that power leaders and followers replication approach under the hood. Eager to hear your thoughts and chat, please leave comments below and we can discuss. 


One response to “DDIA | Replication | Replication logs”