Gluesync - RDBMS and NoSQL data replication

Gluesync is a software product for real-time event-based data replication from RDBMS to NoSQL databases and viceversa, plus NoSQL to NoSQL databases too.

This means that you will be able to replicate data to and from relational, non-relational and between non-relational databases in real-time using native technologies officially supported and maintained by each database vendor, deployable in any cloud, virtual or containeraized environment and on-prem deployments, even on bare-metal servers.

Suppport for relational-to-relational database replication is set to become available with our upcoming major release.

You can read more about Gluesync’s native approach looking at our blog article The Gluesync Journey.

In this documentation you can find the installation and configuration steps necessary to setup Gluesync into your infrastructure and connect a RDBMS instance with a NoSQL database and viceversa, or a establish a replication between NoSQL and NoSQL databases. But, before jumping into the details, let’s talk about few core concepts.

Core concept behind Gluesync’s architecture

One of the main considerations we took when designing the Gluesync architecture is its native ability to be upscaled and deployed with ease, just like you used to do with your container-based applications: pull-config-deploy-enjoy. That’s the motto. You gain full control of what happens under the hood without keeping you into playing with GUIs that wouldn’t have allowed you to harness the full potential of this data replication suite.

Gluesync is being shipped through containers, if you haven’t already read about containers you can have a look here at this link that points you directly to the official docker’s homepage for a reference about a common container environment.

This doesn’t mean that if you don’t have the possibility to run a containeraized environment into your infrastructure you couldn’t run Gluesync, on the contrary! You can ask our team to provide you the package for your specific destination platform in order to run it even on-prem in bare metal servers.

Here in the following diagram is represented an architectural overview of a Gluesync environment.

a diagram illustrating the architectural overview of Gluesync

Design

The design concept that have been adopted has basically been taking in consideration the purpose of each core functionality provided by the suite: it provides ability to replicate data from a relational database to a non-relational database and vice versa. This two aformentioned functionalities are called "ways" or "directions".

So, rather than having a monolithic single piece of general-purpose software, we have decoupled its functinalities into an auto-consistent and highly-resilient and specialized service per each "direction". Capable of replicating, logging, alerting and being monitored by itself without the need of a master central authority that could only have increased complexity and introduced a single point of failure into the overall architecture.

Being that said the result and outcome for our users is the ability to decide what to deploy per each use case: you have the control over the decision to deploy only the module to replicate data from MS SQL Server to MongoDB or just the vice versa due to your specific use case. In that way you’re going to have the fine graned control over permissions, security and performances that you deserve from a product made for real-world production use cases.

Understanding CDC vs GDC

When talking about sourcing changes from a relational database there are just a few ways to accomplish the task of auditing writes, updates and deletions performed at field-row level. The most common approach, but also the most challenging, is reading from the database transaction logs that luckily nowadays are wrapped around an API layer called CDC - Change Data Capture - which provides a way for application developers and DBAs to read throught it and understand the entire history of the changes that have been made from a certain time frame.

We used the term "challenging" because every database vendor have implemented its own way to expose these logs and building a tool capable of being compatible with all of them it is indeed a challenge by itself and sometimes specific vendors or database versions (especially older ones) don’t provide either CDC or low level APIs to grab transaction logs from it.

In order to provide a wider compatibility on capturing real-time change streams from the vast majority of relational databases out there in the field we decided to develop a fine-tuned subset of UDFs (user defined functions) that together with a set of triggers helps Gluesync to enlarge the compatibility-base with relational databases while maintaining a safe, fast and secure approach with which entire lifecycle is entirely managed by its engine itself. We called this feature GDC - Gluesync Data Capture - used for certain kind of database brands | versions | editions that do not currently (or at all) the native CDC tecnique or for those who we initially decided to make compatible first throught that feature and then to provide CDC out-of-the-box in the upcoming future.

Naming conventions

As naming conventions to nickname each type of Gluesync direction we adopted the following nomenclature per each replication component:

from relational (RDBMS) to non-relational databases (NoSQL) has been labeled SQL to NoSQL
from non-relational (NoSQL) to relational databases (RDBMS) has been labeled NoSQL to SQL
from non-relational (NoSQL) to non-relational databases (NoSQL) has been labeled NoSQL to NoSQL

NoSQL to NoSQL data replication is a newly added feature in this version.

What you’re going to need

Before proceeding, for each Gluesync instance that you’re willing to deploy, please check if you have the following information:

RDBMS connection details, like:
- Username
- Password
- Connection string (IP address / port)
- Tables names
NoSQL database connection details, like:
- Destination, either bucket or database
- Connection string (IP address / port)
- Username
- Password

If you’re involved on the implementation of Gluesync or just tryng it out via our trial program, we higly suggest you to have a visual query editor tool in order to bootstrap multiple datasources connection, easily import / edit / display data. The tested tools from the Gluesync product and QA teams are:

As MOLO17 we do not provide any support on these specific tools neither we advertise them, you are free to use the toolset that you prefer the most in order to connect and perform queries against your database(s).

…and also, do not miss our section dedicated to tutorials and use cases of Gluesync.

Compatibility matrix

Non-relational databases (NoSQL)

vendor / edition / version	Gluesync compatibility	Technology used
Aerospike	✅ from Gluesync v1.3.4 starting from version 5.X (CE, EE)	compatible as a target through official SDK, ability to read CDC stream planned for Q4 2023
Azure CosmosDB	⏱ support coming soon	-
Cassandra	⏱ support coming soon	-
Couchbase	✅ from Gluesync v1.0 starting from version 5.5 (CE, EE, Capella)	Native CDC via Eventing service, also available via Sync Gateway, works via official SDK
CouchDB	⏱ support coming soon	-
DynamoDB	✅ from Gluesync v1.4	Native CDC via DynamoDB Streams, works via official SDK
MongoDB	✅ from Gluesync v1.3 starting from version 3.6 (Atlas, EE, CE)	Native CDC via Change Streams, works via official SDK
RavenDB	✅ from Gluesync v1.4 starting (CE, EE)	compatible as a target through official SDK, ability to read CDC stream planned for Q4 2023
Redis	⏱ support coming soon	-
ScyllaDB	⏱ support coming by Q4 2023	-

vendor / edition / version

Gluesync compatibility

Technology used

Aerospike

✅ from Gluesync v1.3.4 starting from version 5.X (CE, EE)

compatible as a target through official SDK, ability to read CDC stream planned for Q4 2023

Azure CosmosDB

⏱ support coming soon

Cassandra

⏱ support coming soon

Couchbase

✅ from Gluesync v1.0 starting from version 5.5 (CE, EE, Capella)

Native CDC via Eventing service, also available via Sync Gateway, works via official SDK

CouchDB

⏱ support coming soon

DynamoDB

✅ from Gluesync v1.4

Native CDC via DynamoDB Streams, works via official SDK

MongoDB

✅ from Gluesync v1.3 starting from version 3.6 (Atlas, EE, CE)

Native CDC via Change Streams, works via official SDK

RavenDB

✅ from Gluesync v1.4 starting (CE, EE)

compatible as a target through official SDK, ability to read CDC stream planned for Q4 2023

Redis

⏱ support coming soon

ScyllaDB

⏱ support coming by Q4 2023

Big data store

vendor / edition / version	Gluesync compatibility	Technology used
Amazon AWS S3	✅ from Gluesync v1.3.4	write-to performed through official SDK
Apache HBase	✅ from Gluesync v1.4	Native CDC using HBase Java SDK
Azure data lake	⏱ write-to support coming soon	writes will be performed through official SDK
Databriks	⏱ write-to support coming soon	writes will be performed through official SDK
Google Cloud Storage	✅ from Gluesync v1.4	write-to performed through official SDK
Snowflake	⏱ write-to support coming soon	writes will be performed through official SDK

vendor / edition / version

Gluesync compatibility

Technology used

Amazon AWS S3

✅ from Gluesync v1.3.4

write-to performed through official SDK

Apache HBase

✅ from Gluesync v1.4

Native CDC using HBase Java SDK

Azure data lake

⏱ write-to support coming soon

writes will be performed through official SDK

Databriks

⏱ write-to support coming soon

writes will be performed through official SDK

Google Cloud Storage

✅ from Gluesync v1.4

write-to performed through official SDK

Snowflake

⏱ write-to support coming soon

writes will be performed through official SDK

Real-time streaming

vendor / edition / version	Gluesync compatibility	Technology used
Apache Kafka	✅ from v1.4 as a target	Real-time messaging to Kafka topics through official SDK, supporting also transactions.
Apache Pulsar	⏱ write-to support coming soon	writes will be performed through official SDK
Solace PubSub+ and PubSub+ Cloud	✅ from v1.4 as a target	Message publishing (Persisted / direct) via official SDK via SMF
RabbitMQ	⏱ write-to support coming soon	writes will be performed through official SDK

vendor / edition / version

Gluesync compatibility

Technology used

Apache Kafka

✅ from v1.4 as a target

Real-time messaging to Kafka topics through official SDK, supporting also transactions.

Apache Pulsar

⏱ write-to support coming soon

writes will be performed through official SDK

Solace PubSub+ and PubSub+ Cloud

✅ from v1.4 as a target

Message publishing (Persisted / direct) via official SDK via SMF

RabbitMQ

⏱ write-to support coming soon

writes will be performed through official SDK

Relational databases (RDBMS)

vendor / edition / version	Gluesync compatibility	Technology used
DB2 LUW	✅ from Gluesync v1.4.1 on DB2 LUW (tested from version 11.5)	On LUW via Gluesync Data Capture (GDC)
IBM i series (AS/400)	✅ from Gluesync v1.4.25 from AS400 7.1	Native CDC via Journal APIs
Informix	⏱ CDC support coming soon	-
MariaDB	✅ from Gluesync v1.3.3, tested from version 10.0 and above	Via Gluesync Data Capture (GDC)
Microsoft SQL Server and Microsoft SQL Azure	✅ from Gluesync v1.0, all editions	Native CDC via Change Tracking, from version 2016 or via Gluesync Data Capture (GDC) for older versions
MySQL	✅ from Gluesync v1.3.3, tested from version 8.0 and above	Via Gluesync Data Capture (GDC)
Oracle Database	✅ from Gluesync v1.2, all editions	Native CDC via Xstream APIs starting from 11.2g (11.2.0.4) or via Gluesync Data Capture (GDC) for older versions
PostgreSQL	✅ from Gluesync v1.3.3, tested from version 9.0 and above	Via Gluesync Data Capture (GDC)
SAP Hana	⏱ CDC support coming soon	-
SingleStore	⏱ write-to support planned, CDC support coming soon	-
Sybase SQL (Adaptive Server Enterprise, ASE, SAP Sybase)	✅ from Gluesync v1.3.3, tested from version ASE 15.7 and above	Via Gluesync Data Capture (GDC)
Sybase SQL Anywhere	✅ from Gluesync v1.3.3, to be tested	Via Gluesync Data Capture (GDC)
YugabyteDB	⏱ CDC support coming soon	-

vendor / edition / version

Gluesync compatibility

Technology used

DB2 LUW

✅ from Gluesync v1.4.1 on DB2 LUW (tested from version 11.5)

On LUW via Gluesync Data Capture (GDC)

IBM i series (AS/400)

✅ from Gluesync v1.4.25 from AS400 7.1

Native CDC via Journal APIs

Informix

⏱ CDC support coming soon

MariaDB

✅ from Gluesync v1.3.3, tested from version 10.0 and above

Via Gluesync Data Capture (GDC)

Microsoft SQL Server and Microsoft SQL Azure

✅ from Gluesync v1.0, all editions

Native CDC via Change Tracking, from version 2016 or via Gluesync Data Capture (GDC) for older versions

MySQL

✅ from Gluesync v1.3.3, tested from version 8.0 and above

Via Gluesync Data Capture (GDC)

Oracle Database

✅ from Gluesync v1.2, all editions

Native CDC via Xstream APIs starting from 11.2g (11.2.0.4) or via Gluesync Data Capture (GDC) for older versions

PostgreSQL

✅ from Gluesync v1.3.3, tested from version 9.0 and above

Via Gluesync Data Capture (GDC)

SAP Hana

⏱ CDC support coming soon

SingleStore

⏱ write-to support planned, CDC support coming soon

Sybase SQL (Adaptive Server Enterprise, ASE, SAP Sybase)

✅ from Gluesync v1.3.3, tested from version ASE 15.7 and above

Via Gluesync Data Capture (GDC)

Sybase SQL Anywhere

✅ from Gluesync v1.3.3, to be tested

Via Gluesync Data Capture (GDC)

YugabyteDB

⏱ CDC support coming soon

Tested means that actually each version that ranges from the specific tag mentioned and above are currently under the integration tests suite and being tested together with performance benchmarks that are performed per each commit-basis in order to ensure no regression and the best quality outcomes. Other versions older that those included in our test suites might work but are not currently battle-tested for a production use case. If you would like to consider testing a specific database version which appears to not have been currently made compatible you are more then welcome to join our beta program, in that case consider to drop us a line at this email address telling us that you would like to be part of the beta program for a specific db version tag.

Minimum system requirements

a containerized environment is suggested (docker / K8s / OpenShift, …);
4 vCPU and 4GB of RAM;
1 GB free disk space, used for logging redaction.