Part of the Map of Streaming Data Systems series

What is Debezium?

Debezium is an open-source change data capture (CDC) platform that captures changes made in a database and writes them to Kafka topics. It records the row-level changes that happen within a table and produces an event stream that other applications can consume.

Debezium is built on top of Apache Kafka. It provides Kafka Connect-compatible connectors for MySQL, PostgreSQL, Oracle, Microsoft SQL Server, MongoDB, Apache Cassandra, Vitess, and Db2.

All data changes are captured and stored in Kafka topics, allowing your applications to consume them in real time. If a consuming application fails, the change events remain in the Kafka topic and can be consumed once the application recovers.

Features

Rather than relying on polling or dual writes, each connector ingests changes from its source database using CDC (Change Data Capture) techniques, typically by reading the database's transaction log.

Some of the features of Debezium include:

  • Ensuring that the data changes are all captured in a consistent manner.
  • Very low latency for data change events. The delay is usually on the order of milliseconds.
  • No changes to the data model are required to capture the data changes.
  • Deletes are also captured, and each event preserves the previous state of the record along with its metadata.
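
To make the point about deletes and old record state concrete, here is a minimal Python sketch of a consumer that keeps an in-memory copy of a table from a stream of change events. It assumes the events have already been read from Kafka and decoded into dictionaries, that each event carries Debezium's `op` code (`c` for create, `u` for update, `d` for delete, `r` for a snapshot read) together with `before` and `after` row images, and that the table has an `id` primary key. The `apply_change` helper and the sample events are hypothetical and exist only for illustration.

```python
from typing import Any, Optional

Row = dict[str, Any]


def apply_change(replica: dict[int, Row], event: dict[str, Any]) -> None:
    """Apply one decoded change event to an in-memory copy of the table."""
    op = event["op"]
    before: Optional[Row] = event.get("before")
    after: Optional[Row] = event.get("after")

    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates carry the new row in "after".
        replica[after["id"]] = after
    elif op == "d":
        # Deletes carry the last known row in "before"; "after" is empty.
        replica.pop(before["id"], None)
    else:
        # Other op codes are out of scope for this sketch.
        raise ValueError(f"unexpected op code: {op!r}")


# Hypothetical events for a table with an "id" primary key.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "a2@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]

replica: dict[int, Row] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

After the four events are applied, the copy holds only row 1 with its updated email, because the delete event removed row 2. A polling-based approach would have had no record of that deleted row's final state.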

Debezium Example

Before we dive into a hands-on demo, let’s take a look at an example of how Debezium works.

Upon running this update statement:

UPDATE users SET email = 'jane@example.com' WHERE id = 123;

Debezium will produce an event like this to a Kafka topic named after the table (by default, the topic name combines a configured prefix, the database name, and the table name):

{
    "op": "u",
    "source": {
        "table": "users",
        ...
    },
    "ts_ms": 1616428166123,
    "before":{
        "id":123,
        "name": "Jane Doe",
        "email": "jane@domain.com",
        "created_at": "Mon, 15 Mar 2021 12:34:56 GMT",
        "updated_at": "Mon, 15 Mar 2021 12:34:56 GMT"
    },
    "after":{
        "id":123,
        "name": "Jane Doe",
        "email": "jane@example.com",
        "created_at": "Mon, 15 Mar 2021 12:34:56 GMT",
        "updated_at": "Mon, 22 Mar 2021 15:43:21 GMT"
    }
}

The change data capture event contains metadata about the table and the state of the entire row before and after the update.
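
Because the event carries both row images, a consumer can work out exactly which columns changed. The following Python sketch shows one way to do that, assuming the event arrives as a JSON string. The `changed_fields` helper is hypothetical, and the sample is a trimmed copy of the update event above with the elided `source` fields and the unchanged `created_at` column left out.

```python
import json
from typing import Any


def changed_fields(event: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {column: (old, new)} for columns whose value differs."""
    before = event.get("before") or {}
    after = event.get("after") or {}
    return {
        column: (before.get(column), after.get(column))
        for column in sorted(before.keys() | after.keys())
        if before.get(column) != after.get(column)
    }


# A trimmed copy of the update event, as it might arrive from Kafka.
raw_event = """
{
    "op": "u",
    "source": {"table": "users"},
    "ts_ms": 1616428166123,
    "before": {"id": 123, "name": "Jane Doe", "email": "jane@domain.com",
               "updated_at": "Mon, 15 Mar 2021 12:34:56 GMT"},
    "after": {"id": 123, "name": "Jane Doe", "email": "jane@example.com",
              "updated_at": "Mon, 22 Mar 2021 15:43:21 GMT"}
}
"""

event = json.loads(raw_event)
for column, (old, new) in changed_fields(event).items():
    print(f'{event["source"]["table"]}.{column}: {old!r} -> {new!r}')
```

For this event, the helper reports `email` and `updated_at` as the only changed columns, with their old and new values.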

Debezium Demo

To see Debezium capturing database changes in practice, let’s take a look at the following hands-on demo:

Materialize Debezium ecommerce demo

Before you get started, make sure that you have Docker and Docker Compose installed (see Installing Docker).

As shown in the diagram above, the demo has the following components:

  • A loadgen mock service continually generates orders.
  • The orders are stored in a MySQL database.
  • As the database writes occur, Debezium streams the changes out of MySQL to Kafka.
  • The resulting Kafka topic is ingested into Materialize.
  • Materialize performs some aggregations on the orders.
  • A Metabase instance is used to visualize the data.
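
The step where Debezium streams changes from MySQL to Kafka is usually set up by registering a connector with Kafka Connect. The Python sketch below builds such a registration payload. It is not the demo's actual configuration. The property names follow the Debezium 2.x MySQL connector and differ in older releases (for example, `topic.prefix` replaced `database.server.name`). Every hostname, credential, database name, and table name is a placeholder.

```python
import json

# All values below are placeholders for illustration, not the demo's settings.
connector = {
    "name": "orders-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "change-me",
        # Must be unique among all clients replicating from this MySQL server.
        "database.server.id": "184054",
        # Prefix used when naming the Kafka topics for captured tables.
        "topic.prefix": "dbserver1",
        "database.include.list": "shop",
        "table.include.list": "shop.orders",
        # Kafka topic where the connector records the database schema history.
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.shop",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

A payload like this would typically be sent as a POST request to the `/connectors` endpoint of the Kafka Connect REST API. With these placeholder values, changes to the `orders` table would be published to a topic named `dbserver1.shop.orders`.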

For more details, see the Join Kafka with a Database using Debezium and Materialize article or check out the video demo:

Video: Join Kafka with a Database using Materialize and Debezium