TIL (Today I Learned) - short notes/articles from my day to day learning

Introduction

Recently I was working with a legacy pub sub architecture with MongoDB which was utilizing mongo change streams for pushing real time data changes. Now I had to find something similar that could be incorporated in PostgreSQL, a quick google search will show you that PostgreSQL or MySQL doesn’t have something exactly similar to mongo streams. Now, something like this can also be implemented on application level but in that architecture, the stream is centralized to our application and not very efficient if we are talking in terms of accessing it from multiple different services.

CDC

CDC or Change Data Capture is a software architecture that converts changes in a datastore into a stream of CDC events.

In my further readings, I came across a simple & native solution to implementing a CDC in postgres, although this can also be extended to other SQL variants as well.

WAL - Write Ahead Log

Most of all the databases utilizes write ahead logs which basically logs out an operation before actually performing it on the relation. The primary objective of a WAL is to aid in data recovery on failures, but this is often utilized for generated CDC stream and postgres community have this use case in mind and hence they provide services such as https://www.postgresql.org/docs/current/logicaldecoding-explanation.html .

Logical Decoding

The contents of WAL are written in a special format that Postgres understands which are not exactly human readable, Logical Decoding is the process of extracting the contents of WAL into a specified format. The conversion of format is done with what we call an Output Plugin. The output plugin changes that record from the WAL format to the plugin’s format (e.g. a JSON object). This output is then fed to the consumer from a Replication Slot which is responsible for channeling these changes to the consumer. One database can have multiple slots exist independent to each other.

Implementation

Let’s see logical decoding in action, using wal2json plugin with JSON as selected output format. Let’s consider structure of the sample table is as follows:

id INT
message VARCHAR
room_id INT
user_id INT
// INSERT operation into message
{
  "change":[
    {
      "kind":"insert",
      "schema":"public",
      "table":"message",
      "columnnames":[
        "id",
        "room_id",
        "message",
        "user_id"
      ],
      "columntypes":[
        "integer",
        "integer",
        "character varying"
      ],
      "columnvalues":[
        1,
		1,
        "initial message"
      ]
    }
  ]
}

Similar CDC events can be found for other operations as well. This stream can now be fed into a messaging queue from which other consumer can subscribe to it.

References