Static ScyllaDB CDC with Gluesync: Setup & Configuration Configuration

Source data from ScyllaDB

Prerequisites

To have Gluesync working on your ScyllaDB instance you will need to have:

  • Valid user credentials with permission to read from source tables.

Setup via Web UI

  • Hostname / IP Address: DNS record or IP Address of your server;

  • Port: Optional, defaults to 9042;

  • Keyspace: Name of your target keyspace;

  • Username: Username with read/write access to the tables;

  • Password: Password belonging to the given username;

  • Max connections count: Maximum number of connections the pool can instantiate.

  • Datacenter: Name of the datacenter;

Specific configuration

This agent has no specific configuration properties.

Setup via Rest APIs

The following example shows how it is possible to call the CoreHub’s Rest API via curl to setup the connection for this Agent.

Connect the agent

curl --location --request PUT 'http://core-hub-ip-address:1717/pipelines/{pipelineId}/agents/{agentId}/config/credentials' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data '{
        "hostCredentials": {
        "connectionName": "myAgentNickName",
        "host": "host-address",
        "port": 9042,
        "databaseName": "db_name",
        "username": "",
        "password": "",
        "maxConnectionsCount": 100,
        "enableTls": true,
        "certificatePath": "/myPath/cert.pem"
      }
}'

Setting up for CDC

Follow the steps below to enable the change data capture in your ScyllaDB cluster and let Gluesync capture its transaction logs.

ScyllaDB CDC

In ScyllaDB, change data capture works at the table level and it can be enabled at the table creation or once a table has already been created. The first step to create a CDC-enabled table is to connect to the ScyllaDB cluster and create a keyspace.

Once connected to the cluster, the keyspace can be created with the following command:

CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

This will create a keyspace with the name "demo". The keyspace is a fundamental organization unit and it is essentially a container that groups related tables together. A CDC-enabled table can be created in this keyspace with the following CQL statement:

CREATE TABLE IF NOT EXISTS demo.articles (
    uuid UUID PRIMARY KEY,   -- UUID for unique and distributed partitioning
    id INT,
    article_name TEXT,
    description TEXT,
    price FLOAT,
    currency TEXT,
    availability INT,
    image_url TEXT
) WITH cdc = {'enabled': true};

Issuing this command will create an additional log table in the same keyspace with the same replication strategy and with the following name: articles_scylla_cdc_log. This table is used by ScyllaDB to track the changes occurring on the base table. Each key column in the base table has a corresponding column in the log table with the same name and type. In addition to these columns, the log table contains additional metadata columns. More information on the log table is available on the ScyllaDB documentation page.

Log Rows

Log rows are special rows stored in the log table which reflect the changes, such as insert, update and delete happening on the base table. One modification statement for the base table may cause multiple entries to appear in the log table which depend on the following things:

  • the CDC options you use (e.g. whether pre-image is enabled or not);

  • the write action you perform: insert, update, row deletion, range deletion, or partition deletion;

  • types of affected columns (notably collections require complex handling).

Working with Before & After images

Before and after images are a feature that allows Gluesync to track the changes that have occurred in your tables. Luckily, in ScyllaDB, it is possible to configure this feature through a set of options that can be specified with the CQL statement either at the table creation or by issuing an alter operation on the table. In order for Gluesync to work correctly it is mandatory to set the following options for the CDC-enabled tables.

Here is an example for the previous table:

ALTER table demo.articles WITH cdc = {'enabled': true, 'preimage': 'full', 'postimage': true, 'delta': 'keys'};

When these options are specified additional preimage and postimage row are recorded in the log table and Gluesync can correctly parse the log rows generated after a change on the base table.