Gluesync - What’s new in 1.5

DEPRECATION NOTICE: Gluesync 1.5 is reaching its end of life. It will no longer receive new features; only security updates and support for Data modeling features remain. MOLO17 will continue to provide security updates for Gluesync 1.5, and support for users relying on Data modeling features, until the end of 2025. If you’re not using Data modeling features, please upgrade to Gluesync 2.0 to continue receiving product support, features and security updates: support for Gluesync 1.5 users who do not rely on Data modeling ends after June 2025. All support for Gluesync 1.5 ends at the end of 2025.

Users not relying on data modeling features will not be supported after June 2025. Consider upgrading to Gluesync 2.0 as soon as possible.

Entities vs Virtual entities

In 1.4 we introduced the concept of Virtual entities, which enabled Gluesync to provide a new set of features such as Data modeling.

With 1.5 we are deprecating our legacy Entities and switching over to a fully virtual approach. This decouples the real tables and schemas from the data modeling Gluesync is asked to apply over the data, giving us a single entry point for data modeling (DM) tasks and room to further improve this feature in upcoming versions.

Support for multiple schemas within the same instance

From now on, entities come with Schema and Table definitions. This means Gluesync no longer relies on the given entity name, which we found error-prone and fragile.

With this new set of parameters, Gluesync understands the given context and can access different schemas within the same instance. This removes the previous constraints of having a single schema per instance or hard-coded schema names tied to the entity names.

We have updated all of our product documentation pages for SQL to NoSQL as well as NoSQL to SQL to reflect the new set of required fields in the config.json file.

Examples and detailed guides will accompany this major change, covering both the porting of legacy configurations and fresh setups.

Declare type, scope, schema and table names

With the advent of fully virtual entities, you’re now required to declare some key fields that give Gluesync a little more context, in exchange for a much more powerful replication.

Full table or field mapping

For entities configured for full table copy or with field mapping (either field skip or renaming), you now need to specify the schema and table fields within each source entity, as in the following example:

"sourceEntities": {
    ...
    "myEntity": {
        "schema": "MYSCHEMA",
        "table": "OPEN_SEA"
    },
    ...
}

The name myEntity is used here on purpose: entity names are now all virtual, and they can differ from the names of the real table and schema they refer to.
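As a minimal sketch of this decoupling, the following Python snippet reads a sourceEntities fragment and resolves each virtual entity name to the real schema and table it points at. The `orders` entity and the `SALES` schema are made up for illustration; only the `sourceEntities`, `schema` and `table` keys come from the example above, and this is not how Gluesync itself loads its configuration.

```python
import json

# Hypothetical config fragment: the "orders" entity and the SALES schema are
# illustrative; only the key names come from the documented example.
CONFIG = """
{
  "sourceEntities": {
    "myEntity": {"schema": "MYSCHEMA", "table": "OPEN_SEA"},
    "orders":   {"schema": "SALES",    "table": "ORDERS"}
  }
}
"""


def resolve_tables(config_text):
    """Map each virtual entity name to its (schema, table) pair."""
    entities = json.loads(config_text)["sourceEntities"]
    return {name: (e["schema"], e["table"]) for name, e in entities.items()}


tables = resolve_tables(CONFIG)
for name, (schema, table) in tables.items():
    print(f"{name} -> {schema}.{table}")
```

Note that the two entities live in different schemas of the same instance, and that neither entity name has to match its table name.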

Advanced data modeling

For entities configured for Advanced data modeling, you now need to specify type, plus a new optional field called scope, which defaults to "" (empty string), within each source entity, as in the following example:

"sourceEntities": {
    ...
    "myEntity": {
        "dataModeling": [...],
        "type": "opensea",
        "scope": ""
    },
    ...
}

New values:

type: will then be used as the usual key from inside your JSON documents, and can be customized as per your requirements;
scope (optional): will be used as source/target scope in NoSQL databases supporting this kind of data tenancy, like Couchbase for example.

schema and table keys are not supported here since they are given by your data modeling definition and understood by the DM engine.

SQL query data modeling

"sourceEntities": {
    ...
    "myEntity": {
        "query": "SELECT * FROM MYSCHEMA.OPEN_SEA",
        "type": "opensea",
        "scope": "myschema"
    },
    ...
}

New values:

type: will then be used as the usual key from inside your JSON documents, and can be customized as per your requirements;
scope (optional): will be used as source/target scope in NoSQL databases supporting this kind of data tenancy, like Couchbase for example.

schema and table keys are not supported here since they are given by your data modeling definition and understood by the DM engine.
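The rules for the three entity styles can be summarised in a short Python sketch. This is only an illustration of the rules as described on this page, not Gluesync’s own validation logic; the function names and the returned labels are hypothetical.

```python
def classify_entity(entity):
    """Return the mapping style of a source entity, or raise ValueError.

    Illustrative only: mirrors the rules described in these release notes.
    """
    has_location = "schema" in entity or "table" in entity
    if "query" in entity or "dataModeling" in entity:
        # Data modeling entities derive schema/table from their definition.
        if has_location:
            raise ValueError("schema/table are not supported with data modeling")
        if "type" not in entity:
            raise ValueError("type is required with data modeling")
        return "sql-query" if "query" in entity else "advanced-data-modeling"
    if "schema" in entity and "table" in entity:
        return "full-table"
    raise ValueError("declare schema and table, or a data modeling definition")


def effective_scope(entity):
    """scope is optional and defaults to the empty string."""
    return entity.get("scope", "")


full_table = {"schema": "MYSCHEMA", "table": "OPEN_SEA"}
sql_query = {
    "query": "SELECT * FROM MYSCHEMA.OPEN_SEA",
    "type": "opensea",
    "scope": "myschema",
}
advanced = {"dataModeling": [], "type": "opensea"}

for entity in (full_table, sql_query, advanced):
    print(classify_entity(entity), repr(effective_scope(entity)))
```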

Support for numeric schema names

Previously it was not possible to declare a schema whose name is numeric: for example, instead of a schema named MYSCHEMA (a string), your database could have a schema named 1234. With an update to our SQL query parsing engine, Gluesync can now work with these numeric schema names as well. Just provide them as a quoted string, for example "1234".
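The quotes matter because JSON distinguishes the string "1234" from the number 1234. The small Python sketch below shows the difference when a config fragment is parsed; how Gluesync reacts to the unquoted form is not stated here, so treat this only as an illustration of the JSON types involved.

```python
import json

# Quoted: the schema name is parsed as a string, as recommended above.
quoted = json.loads('{"schema": "1234", "table": "OPEN_SEA"}')

# Unquoted: the same digits are parsed as a JSON number instead.
unquoted = json.loads('{"schema": 1234, "table": "OPEN_SEA"}')

print(type(quoted["schema"]).__name__)
print(type(unquoted["schema"]).__name__)
```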

Support for collations and case-sensitive databases

We are highlighting the following because it is important to keep in mind.

Please note that, starting from this version, Gluesync is case sensitive: the schema name "mySchemaName" is now different from "MYSCHEMANAME", just as the table name "OPEN_SEA" is now different from "openSea".

This feature enables you to work with databases that use different collation (case sensitivity) configurations.
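When porting an existing configuration, it can help to look for names that used to match only because case was ignored. The Python sketch below is one hypothetical way to do that: it assumes you supply the list of names as they really appear in your database, and the helper name is made up for this example.

```python
def find_case_mismatches(configured, actual):
    """Return (configured, actual) pairs that match only when case is ignored.

    If `actual` contains names that differ only by case, the last one wins
    in this simple sketch.
    """
    by_folded = {name.casefold(): name for name in actual}
    mismatches = []
    for name in configured:
        match = by_folded.get(name.casefold())
        if match is not None and match != name:
            mismatches.append((name, match))
    return mismatches


configured_names = ["mySchemaName", "OPEN_SEA"]
database_names = ["MYSCHEMANAME", "OPEN_SEA"]
mismatches = find_case_mismatches(configured_names, database_names)
print(mismatches)
```

Each reported pair is a configured name that should be updated to the exact spelling used by the database.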

Empowered data modeling

Data modeling was initially limited to SQL to NoSQL data pipelines; we have now extended it to NoSQL to SQL replication flows. This feature is initially available for incoming replications from Couchbase NoSQL data change feeds, and we aim to extend it to other sources in future releases.

Support for custom scopes, tables and types

Data modeling also benefits from the ability to define custom scopes, table names and data types.

Support for duplicated tables

Virtual entities are gaining momentum in the user community, and we heard of use cases where several replications needed to draw on the same set of tables, especially when data modeling is involved. We extended support to duplicated table entries, so you can declare identical or similar queries over the same data set simply by giving each declared entity a different name.
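Because entities are keys of a JSON object, two entities over the same table must have distinct names; many JSON parsers silently keep only the last value of a repeated key. The Python sketch below parses a fragment with two entities over the same table and rejects repeated names. The entity names, the `opensea_recent` type and the `ID > 100` filter are made up for illustration.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails on repeated keys instead of keeping the last."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key}")
        seen[key] = value
    return seen


# Hypothetical fragment: two differently named entities over the same table.
CONFIG = """
{
  "sourceEntities": {
    "openSeaAll": {
      "query": "SELECT * FROM MYSCHEMA.OPEN_SEA",
      "type": "opensea"
    },
    "openSeaRecent": {
      "query": "SELECT * FROM MYSCHEMA.OPEN_SEA WHERE ID > 100",
      "type": "opensea_recent"
    }
  }
}
"""

entities = json.loads(CONFIG, object_pairs_hook=reject_duplicate_keys)["sourceEntities"]
print(sorted(entities))
```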

Improved performance when performing snapshots (data refresh)

This feature brings a 10x boost in performance when performing initial snapshot activities.

Full table migration now supports a new way of fetching pages of data from the source database, giving a 10x performance improvement compared to the legacy version. The new method is now the default; the legacy one is still supported and available by setting the flag usePaginatedMigration to true.

Improvements to Couchbase NoSQL source and target connectors

Full support of Scopes in Couchbase

Alongside the fully multi-tenant approach to schema containers from the relational world, we are bringing full support for Scopes in Couchbase NoSQL and extending the same to other supported targets.

Previously, Couchbase users were forced to have one single Scope per instance or, even worse, were limited to just the _default Scope. From this version, Gluesync makes use of the full set of capabilities this Couchbase feature offers, automatically creating missing scopes and collections when needed. Your replicated data has never been so well organized.

No more Metadata bucket 🎉

We finally got rid of the Metadata bucket: you now only need to define the source/target bucket that Gluesync will use to read or store data, saving space, resources and time. Instead of the previously mandatory Metadata bucket, Gluesync now uses its own Scope and Collections to store and keep track of its internal state. Enjoy!

Sync delete operation from Couchbase to the target database

Couchbase users were asking for it and we delivered: with 1.5 you can now sync physical document deletions from Couchbase to any target database. Previously only logical deletions were supported.

This feature requires a new object key called entitiesKeys. Please check the respective documentation section to learn how to use it.

New system metrics

A new set of metrics has been introduced in Gluesync. It provides detailed monitoring of the environment where Gluesync is running, plus details on resource usage such as RAM, CPUs and network bandwidth.

This set of metrics is exposed as a JSON output available at the following REST endpoint: /stats. Stats will be printed in a prettified format at the bootstrap of the instance. The stats will look like this:

Stats: {
  "os": {
    "family": "Debian GNU/Linux",
    "manufacturer": "GNU/Linux",
    "version": "11",
    "codeName": "bullseye",
    "buildNumber": "5.14.0-284.11.1.el9_2.x86_64",
    "systemUptime": 156277,
    "systemBootTime": 1696413696
  },
  "cpu": {
    "cpuVendor": "GenuineIntel",
    "cpuName": "Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz",
    "cpuFamily": "6",
    "cpuModel": "58",
    "physicalProcessorCount": 8,
    "logicalProcessorCount": 8,
    "processorCpuLoad": [
      0.64,
      0.43434343434343436,
      0.21782178217821782,
      0.12,
      0.4,
      0.19,
      0.04040404040404041,
      0.28431372549019607
    ],
    "systemCpuLoad": 0.6934673366834171,
    "interrupts": 221742579,
    "systemCpuTicks": [
      6622420,
      16530,
      1897680,
      1236170170,
      40330,
      687320,
      620270,
      0
    ]
  },
  "ram": {
    "total": 15712,
    "available": 11475,
    "pageSize": 0
  },
  "networkIfs": [
    {
      "name": "eth0",
      "iPv4address": [
        "172.17.0.2"
      ],
      "speedInMegabytes": 10000,
      "megabytesReceived": 10,
      "megabytesSent": 0
    }
  ]
}
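As a minimal sketch of how such a payload could be consumed, the Python snippet below derives a few summary figures from it. The sample is a trimmed copy of the output above (the per-core list is shortened), and treating processorCpuLoad values as 0 to 1 fractions per logical core is an assumption based on the sample, not something this page states. Fetching the payload from /stats is left out because the host and port depend on your deployment.

```python
import json

# Trimmed copy of the sample payload above; only the fields used here,
# with the per-core load list shortened to four entries.
SAMPLE = """
{
  "cpu": {
    "processorCpuLoad": [0.64, 0.12, 0.4, 0.19],
    "systemCpuLoad": 0.6934673366834171
  },
  "ram": {"total": 15712, "available": 11475, "pageSize": 0}
}
"""


def summarize(stats):
    """Derive a few headline figures from a stats payload."""
    ram = stats["ram"]
    loads = stats["cpu"]["processorCpuLoad"]
    return {
        # Fraction of RAM in use, independent of the unit used for the totals.
        "ram_used_fraction": (ram["total"] - ram["available"]) / ram["total"],
        # Assumption: each entry is the load of one logical core, from 0 to 1.
        "avg_core_load": sum(loads) / len(loads),
        "busiest_core": max(range(len(loads)), key=loads.__getitem__),
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```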

Metrics logged as console output

The metrics described above are also available in the console log output (and in the default log file).

You can find them by looking for INFO - Stats log entries. To save space in the logs, they are no longer printed in a prettified format after the bootstrap of Gluesync, and appear in the following format:

INFO - Stats: {"os":{"family":"Debian GNU/Linux","manufacturer":"GNU/Linux","version":"11","codeName":"bullseye","buildNumber":"5.14.0-284.11.1.el9_2.x86_64","systemUptime":156284,"systemBootTime":1696413696},"cpu":{"cpuVendor":"GenuineIntel","cpuName":"Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz","cpuFamily":"6","cpuModel":"58","physicalProcessorCount":8,"logicalProcessorCount":8,"processorCpuLoad":[0.4318181818181818,0.5205479452054794,0.23,0.3894736842105263,0.3564356435643564,0.32558139534883723,0.5138888888888888,0.40217391304347827],"systemCpuLoad":0.2975420439844761,"interrupts":221960632,"systemCpuTicks":[6647080,16530,1901780,1236194510,40330,687730,622400,0]},"ram":{"total":15712,"available":8963,"pageSize":0},"networkIfs":[{"name":"eth0","iPv4address":["172.17.0.2"],"speedInMegabytes":10000,"megabytesReceived":39,"megabytesSent":44}]}
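Since each entry is a single line with a JSON payload, it can be picked out of a log with a few lines of Python. The sketch below assumes the payload runs to the end of the line, as in the example above; the timestamp prefix in the sample is hypothetical and the payload is shortened.

```python
import json

MARKER = "INFO - Stats: "


def parse_stats_line(line):
    """Return the stats dict from a log line, or None if it is not a Stats entry."""
    _, found, payload = line.partition(MARKER)
    if not found:
        return None
    return json.loads(payload)


# Shortened sample lines; the timestamp prefix is a made-up example.
log_lines = [
    "2023-10-05 10:00:00 INFO - Something unrelated happened",
    '2023-10-05 10:00:01 INFO - Stats: {"ram":{"total":15712,"available":8963,"pageSize":0}}',
]

parsed = [parse_stats_line(line) for line in log_lines]
for stats in parsed:
    if stats is not None:
        print(stats["ram"]["available"])
```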

Following the same approach, the legacy metrics previously available only from the /status endpoint are now also written to the console log in the same way.

For a full deep dive on monitoring in Gluesync please check out our docs page: Monitoring Gluesync.