Gluesync Architecture

Gluesync’s architecture is built on a flexible, agent-based system centered around the Core Hub, with extensibility through SDKs and modules. This design enables real-time data integration across diverse platforms while ensuring scalability, reliability, and security.

Architecture Overview

Gluesync 2 Architecture

The architecture consists of five main components:

  • Source Agents: Handle data extraction and change capture

  • Core Hub: Orchestrates data flow and system operations

  • Target Agents: Manage data loading and transformation

  • SDKs: Enable developer integration with Core Hub

  • Modules: Extend platform functionality through SDK-based components

Core Components

Core Hub

The Core Hub is the central orchestrator, featuring:

  • High Availability

    • Core Hub #1 and #2 for redundancy

    • Automatic failover support

    • Load balancing capabilities

  • Data Processing

    • Aggregation engine

    • Denormalization support

    • Transaction coordination

  • Management

    • Administration interface

    • Monitoring dashboard

    • Security controls

Source Agents

Agent types and capabilities:

  • NoSQL CDC

    • Natively integrated NoSQL CDC

    • Real-time change capture

    • Event-based replication

  • RDBMS

    • Transaction-log based CDC

    • SQL database support

    • Real-time monitoring

  • Data Lakes

    • AWS S3 & S3-like support

    • Dell ECS integration

    • Azure blob storage support

Target Agents

Agent types and features:

  • NoSQL

    • Native integration

    • Load balancing

    • Multi-target support

  • RDBMS

    • JDBC-oriented connections

    • Transaction pool management

    • Bulk loading capabilities

  • Data Lakes & Streaming

    • S3 and blob storage support

    • Kafka integration

    • Solace messaging support

Communication Flow

Data Flow Diagram

  • WebSocket connections enable real-time bidirectional communication

  • Core Hub processes and transforms data in-flight

  • Multi-target support allows for parallel data distribution
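The flow above can be modelled in a few lines of Python. This is an illustrative sketch only: it assumes in-memory `asyncio` queues in place of the WebSocket links, and the names `core_hub`, `target_agent`, and `run_demo` are hypothetical, not part of any Gluesync API.

```python
import asyncio


async def target_agent(name, inbox, delivered):
    """Hypothetical target agent: consumes events from its own inbox, in order."""
    while True:
        event = await inbox.get()
        if event is None:  # sentinel: the hub has no more events
            break
        delivered[name].append(event)


async def core_hub(events, transform, target_names):
    """Transform each event in flight, then fan it out to every target."""
    inboxes = {name: asyncio.Queue() for name in target_names}
    delivered = {name: [] for name in target_names}
    # One concurrent consumer task per target.
    workers = [
        asyncio.create_task(target_agent(name, inboxes[name], delivered))
        for name in target_names
    ]
    for event in events:
        outgoing = transform(event)
        for inbox in inboxes.values():
            await inbox.put(outgoing)
    for inbox in inboxes.values():
        await inbox.put(None)
    await asyncio.gather(*workers)
    return delivered


def run_demo():
    events = [{"id": 1, "name": "ada"}, {"id": 2, "name": "lin"}]

    def uppercase_name(event):
        return {**event, "name": event["name"].upper()}

    return asyncio.run(core_hub(events, uppercase_name, ["nosql", "kafka"]))
```

Each target has its own inbox, so a slow target does not reorder or block delivery to the others; every target still sees the events in the order the hub emitted them.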

Source agent caching layer

Overview

Starting from Gluesync 2.1.9, supported source agents include an embedded, persistent caching layer. Instead of streaming changes directly from the source database to the Core Hub, agents first write inbound changes to a local, append-only queue built on top of Chronicle Queue.

At a high level, this caching layer:

  • Decouples the source database from the Core Hub consumer

  • Reduces load on the source system by minimizing long-lived cursors and connections

  • Stores changes durably on disk, allowing for safe pause/resume of pipelines

  • Provides a replayable history window (configurable retention) for recovery scenarios

Chronicle Queue is a high-performance, append-only log implementation designed for low-latency systems. It stores data in memory-mapped files on disk, organizing entries into sequential segments. This makes it a good fit for Gluesync’s caching layer, where agents continuously append new change events and readers consume them in order without incurring heavy garbage collection or complex locking.

How it fits in the architecture

From an architectural perspective, the Chronicle Queue–based cache sits inside the source agent process:

  • The source agent ingests changes from the database (e.g. journal, transaction logs, CDC API) and appends them to the local queue.

  • A separate internal worker within the agent reads from the queue and forwards events to the Core Hub over the usual WebSocket connection.

  • If the Core Hub or network is temporarily unavailable, the agent continues to cache new changes locally until the connection is restored, then drains the backlog.

This design keeps the Core Hub stateless with respect to source-side buffering while giving operators a predictable and tunable buffer at the edge of each source system.
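The capture, cache, and forward steps described above can be sketched as follows. This is a minimal Python stand-in, not the agent's actual implementation: it uses a plain JSON-lines file instead of Chronicle Queue, and `LocalChangeQueue`, `drain`, and the `send` callback are hypothetical names introduced for illustration.

```python
import json
import os
import tempfile


class LocalChangeQueue:
    """Toy append-only change log with a persisted reader position."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "changes.log")
        self.offset_path = os.path.join(directory, "reader.offset")

    def append(self, event):
        # Capture side: append one JSON document per line and force it to disk.
        with open(self.log_path, "ab") as log:
            log.write(json.dumps(event).encode("utf-8") + b"\n")
            log.flush()
            os.fsync(log.fileno())

    def _load_offset(self):
        try:
            with open(self.offset_path, encoding="utf-8") as f:
                return int(f.read())
        except FileNotFoundError:
            return 0  # nothing has been forwarded yet

    def drain(self, send):
        """Forward unread events in order; stop at the first failed send."""
        if not os.path.exists(self.log_path):
            return 0
        forwarded = 0
        with open(self.log_path, "rb") as log:
            log.seek(self._load_offset())
            for line in iter(log.readline, b""):
                if not send(json.loads(line)):
                    break  # hub unreachable: keep the backlog for later
                # Record progress only after the hub accepted the event.
                with open(self.offset_path, "w", encoding="utf-8") as f:
                    f.write(str(log.tell()))
                forwarded += 1
        return forwarded


def demo_outage_and_recovery():
    received = []
    hub_online = False

    def send(event):
        if hub_online:
            received.append(event["seq"])
        return hub_online

    with tempfile.TemporaryDirectory() as tmp:
        queue = LocalChangeQueue(tmp)
        for seq in range(3):
            queue.append({"seq": seq, "op": "insert"})
        during_outage = queue.drain(send)  # hub down: nothing leaves the cache
        hub_online = True
        # A fresh instance simulates an agent restart reading the same files.
        after_recovery = LocalChangeQueue(tmp).drain(send)
    return during_outage, after_recovery, received
```

Because the reader position is stored separately from the log, the writer never waits for the hub, and a restarted reader resumes from the last acknowledged event rather than from the start.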

Benefits

Key benefits of the source agent caching layer include:

  • Lower source footprint – fewer active connections and reduced dependency on database-side caching artifacts.

  • Operational resilience – short outages or maintenance windows on the Core Hub side do not force resynchronizations, as changes remain in the agent cache.

  • Configurable retention – cache retention can be tuned to balance disk usage and recovery window length.

  • Simplified scaling – agents handle local buffering, allowing Core Hub instances to scale independently for processing and routing (coming soon).
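The trade-off between retention and disk usage can be estimated with simple arithmetic. The sketch below is a rough sizing aid under stated assumptions: it ignores per-entry overhead and file preallocation, and the parameter names are illustrative, not Gluesync configuration keys.

```python
SECONDS_PER_HOUR = 3600


def cache_disk_bytes(events_per_second, avg_event_bytes, retention_hours):
    """Approximate disk needed to hold `retention_hours` of changes."""
    return events_per_second * avg_event_bytes * retention_hours * SECONDS_PER_HOUR


def recovery_window_hours(disk_budget_bytes, events_per_second, avg_event_bytes):
    """Approximate history window that fits in a given disk budget."""
    bytes_per_hour = events_per_second * avg_event_bytes * SECONDS_PER_HOUR
    return disk_budget_bytes / bytes_per_hour


# Example: 2,000 events/s at 512 bytes each, retained for 24 hours.
needed = cache_disk_bytes(2000, 512, 24)
print(needed)  # 88473600000 bytes, roughly 88.5 GB
```

Run in reverse, the same figures show that a budget of about 88.5 GB covers a 24-hour recovery window at that change rate.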

Advanced Features

Data Processing

  • Aggregation Engine

    • Real-time data aggregation

    • Custom aggregation rules

    • Performance optimization
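As an illustration of real-time aggregation over a change stream, the sketch below maintains per-group running totals incrementally instead of rescanning the source. It shows the general technique only; the event shape (`op`, `before`, `after`) and the class name `RunningTotals` are assumptions, not Gluesync's event format or API.

```python
from collections import defaultdict


class RunningTotals:
    """Keeps a sum per group up to date as insert/update/delete events arrive."""

    def __init__(self, group_field, value_field):
        self.group_field = group_field
        self.value_field = value_field
        self.totals = defaultdict(int)

    def apply(self, change):
        op = change["op"]
        if op in ("update", "delete"):
            # Retract the contribution of the old row image.
            before = change["before"]
            self.totals[before[self.group_field]] -= before[self.value_field]
        if op in ("insert", "update"):
            # Add the contribution of the new row image.
            after = change["after"]
            self.totals[after[self.group_field]] += after[self.value_field]


totals = RunningTotals(group_field="region", value_field="amount")
for change in [
    {"op": "insert", "after": {"region": "eu", "amount": 10}},
    {"op": "insert", "after": {"region": "eu", "amount": 5}},
    {"op": "update", "before": {"region": "eu", "amount": 5},
     "after": {"region": "eu", "amount": 7}},
    {"op": "delete", "before": {"region": "eu", "amount": 10}},
]:
    totals.apply(change)
```

Treating an update as a retraction followed by an addition keeps the totals correct even when a row moves from one group to another.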

Denormalization

  • Smart Denormalization

    • Configurable strategies

    • Automatic schema mapping

    • Performance tuning
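Denormalization typically means folding related rows from several tables into a single nested document for a NoSQL target. The following is a minimal sketch of that idea, assuming a simple one-to-many relationship; the function `denormalize` and its parameters are hypothetical and do not reflect how Gluesync strategies are configured.

```python
from collections import defaultdict


def denormalize(parents, children, parent_key, foreign_key, embed_as):
    """Embed child rows into their parent row as a nested list."""
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key: it is implied by the nesting.
        row = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(row)
    return [
        {**parent, embed_as: grouped.get(parent[parent_key], [])}
        for parent in parents
    ]


customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 30},
    {"order_id": 11, "customer_id": 1, "total": 12},
]
documents = denormalize(customers, orders, "id", "customer_id", "orders")
```

Here the first customer document carries both orders under `orders`, while the second carries an empty list.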

Multi-Source Support

  • Event-based Integration

    • Multiple source connections

    • Parallel processing

    • Consistent ordering
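One common way to combine parallel processing with consistent ordering is to route every event for a given key to the same worker lane, so that order is preserved per key while different keys proceed in parallel. The sketch below illustrates that general technique; it is not a description of Gluesync's internals, and `partition_for` and `route` are hypothetical names.

```python
import zlib


def partition_for(key, workers):
    """Deterministically map a key to one of `workers` lanes."""
    return zlib.crc32(str(key).encode("utf-8")) % workers


def route(events, workers):
    """Split a stream into lanes; events sharing a key stay in arrival order."""
    lanes = [[] for _ in range(workers)]
    for event in events:
        lanes[partition_for(event["key"], workers)].append(event)
    return lanes


events = [
    {"key": key, "seq": seq}
    for seq, key in enumerate(["a", "b", "a", "c", "b", "a"])
]
lanes = route(events, workers=3)
```

Each lane can then be handled by its own worker: ordering is guaranteed within a key, not across keys, which is usually what row-level replication needs.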

Deployment Options

Flexible Deployment Models

Deployment models:

  • On-Premises

    • Description: Full deployment within your infrastructure

    • Best for: High security requirements

  • Cloud

    • Description: Deployment on AWS, Azure, or GCP

    • Best for: Scalability and flexibility

  • Hybrid

    • Description: Mix of on-premises and cloud components

    • Best for: Balanced approach

Security Architecture

Security Measures

  • Network Security

    • TLS encryption

    • Secure WebSocket connections

    • Network isolation options

  • Authentication

    • API key authentication

    • Role-based access control

    • Session management

  • Data Protection

    • In-transit encryption

    • Secure credential storage

    • Audit logging

SDKs and Developer Integration

Open Source SDKs

Gluesync provides open source SDKs that enable developers to build custom integrations and extensions:

SDKs and features:

  • Python SDK

    • Core Hub handshake protocol implementation

    • Authentication and authorization

    • Event handling and processing

    • Available at GitLab

  • Node.js SDK

    • JavaScript-based Core Hub integration

    • Real-time event processing

    • Promise-based API design

    • Available at GitLab

  • Coming Soon

    • Java SDK

    • Kotlin SDK

Developer Benefits

  • Open Platform

    • Build custom agents and modules

    • Extend Core Hub functionality

    • Access Gluesync APIs securely

  • Integration Support

    • Comprehensive documentation

    • Reference implementations

    • Community-supported examples

Modules

Platform Extensions

Modules are platform extensions built on top of Gluesync SDKs that enhance system capabilities:

Modules and functionality:

  • Chronos

    • Advanced scheduling capabilities

    • Time-based job orchestration

    • Recurring task management

    • Available at GitLab

  • Bootstrapper

    • System initialization

    • Configuration management

    • Deployment automation

    • Available at GitLab

  • Conductor

    • Automated agent deployment

    • Resource allocation policies

    • Container lifecycle orchestration

    • See Conductor documentation

  • Whisperer

    • Automated database operations

    • Multi-database connectivity

    • Load testing and PillowFight tooling

    • Available at GitLab

  • Automator

    • Web-based Bootstrapper execution

    • Graphical configuration management

    • Cross-platform packaged executable

    • See Automator documentation

  • Convert DBMoto Metadata XML

    • Syniti metadata.xml conversion

    • Bootstrapper template generation

    • Migration workflow guidance

    • See Conversion guide

Module Architecture

  • Modules connect to Core Hub through SDK interfaces

  • Custom modules can extend platform capabilities

  • Open architecture enables third-party development

Monitoring & Administration

Built-in Monitoring

  • Real-time metrics collection

  • Performance monitoring

  • Resource utilization tracking

  • Alert management

Administration Tools

  • Web-based admin interface

  • REST API access

  • Configuration management

  • System health monitoring

For detailed deployment instructions, see our Deployment Guide for Docker Compose and Kubernetes.