Snapshot Tasks

Snapshot tasks are a crucial part of Gluesync’s data synchronization, enabling you to migrate or initialize large datasets efficiently. This chapter explains how to maximize snapshot performance by choosing between INSERT and UPSERT operations, tuning snapshot writing concurrency, and using logical partitioning on large source tables.

Overview

During a snapshot task, Gluesync copies the entire dataset from the source to the target. The way records are written—either as INSERTs or UPSERTs—directly impacts performance and data integrity.

INSERT vs UPSERT: Performance and use cases

Figure: UI options for INSERT and UPSERT during snapshot tasks

TRUNCATE before snapshot

Gluesync 2.1 introduces the ability to perform a TRUNCATE operation on the target table before executing a snapshot task. This is particularly useful when you want to ensure a clean slate before loading fresh data.

Key features:

  • Clean Data Loading: Removes all existing data from the target table before the snapshot begins

  • Per-snapshot control: Choose whether to enable TRUNCATE on a per-snapshot basis

  • Agent-level safety: Can be disabled at the target agent level for security

How to use

  1. Navigate to the entity’s play tab

  2. Select "Run Snapshot"

  3. Select your preferred write mode (INSERT or UPSERT). In INSERT mode, TRUNCATE before snapshot is enabled by default.

  4. Click "Start" to start the entity

Agent-level configuration

For security reasons, TRUNCATE before snapshot can be disabled at the target agent level:

  1. Go to the target agent’s configuration

  2. Navigate to the "Advanced Settings" tab

  3. Locate the "Disable TRUNCATE on Snapshot" parameter

  4. Set it to true to disable TRUNCATE operations (the default is false)

When set to true, this setting prevents any TRUNCATE operation during snapshots for all entities that use this target agent, regardless of individual snapshot settings.
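
The precedence between the per-snapshot option and the agent-level switch can be sketched as a small decision function. This is only an illustration of the rules described above, not Gluesync code: the function name and its parameters are hypothetical, and the default for UPSERT mode (not stated in this chapter) is assumed to be "no TRUNCATE".

```python
from typing import Optional


def truncate_will_run(write_mode: str,
                      snapshot_truncate: Optional[bool] = None,
                      agent_disable_truncate: bool = False) -> bool:
    """Return True if the target table would be truncated before the snapshot.

    Hypothetical helper: names and signature are illustrative only.
    """
    if agent_disable_truncate:
        # The agent-level "Disable TRUNCATE on Snapshot" switch wins over
        # any per-snapshot choice.
        return False
    if snapshot_truncate is None:
        # No explicit per-snapshot choice: INSERT mode enables TRUNCATE by
        # default. The UPSERT default is an assumption made for this sketch.
        return write_mode == "INSERT"
    return snapshot_truncate


# INSERT with defaults truncates; the agent-level switch always blocks it.
default_insert = truncate_will_run("INSERT")
blocked_by_agent = truncate_will_run("INSERT", snapshot_truncate=True,
                                     agent_disable_truncate=True)
```

Here default_insert is True and blocked_by_agent is False, which mirrors the "regardless of individual snapshot settings" rule.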

Write modes, with description and best practice:

  • INSERT: Faster, recommended when the target table is empty. Each record is written as a new row. No existing data is checked or merged. Use this for initial loads or when you are certain there is no overlap with existing data.

  • UPSERT: Slower, but safer when the target table may already contain data. Each incoming record is either inserted as new or merged with an existing row if a match is found. Use this when you need to preserve and update existing records.

For best performance, use INSERT when possible. Use UPSERT only if you expect existing data in the target.
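
To make the difference between the two write modes concrete, the following sketch reproduces their semantics against an in-memory SQLite database. It does not show how Gluesync writes to a target: the customers table, the load helper, and the sample rows are all assumptions for illustration, and the UPSERT statement relies on SQLite's ON CONFLICT clause (SQLite 3.24 or later).

```python
import sqlite3


def load(conn, rows, mode):
    """Write rows into the hypothetical 'customers' table using the given mode."""
    if mode == "INSERT":
        # Plain insert: no check against existing rows, fails on a key clash.
        sql = "INSERT INTO customers (id, name) VALUES (?, ?)"
    elif mode == "UPSERT":
        # Insert new keys, update the row when the key already exists.
        sql = ("INSERT INTO customers (id, name) VALUES (?, ?) "
               "ON CONFLICT(id) DO UPDATE SET name = excluded.name")
    else:
        raise ValueError(f"unknown write mode: {mode}")
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# Initial load into an empty target: INSERT is sufficient.
load(conn, [(1, "Ada"), (2, "Grace")], "INSERT")

# Re-running the same INSERT against a populated target clashes on the key.
try:
    load(conn, [(1, "Ada"), (2, "Grace")], "INSERT")
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

# UPSERT merges: id 2 is updated in place, id 3 is added.
load(conn, [(2, "Grace Hopper"), (3, "Edsger")], "UPSERT")
rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

The extra key lookup that UPSERT performs for every record is the reason it is the slower of the two modes on a target that is known to be empty.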

Enhancing write performance with snapshot writing concurrency

Starting from Gluesync v2.0.10, you can dramatically improve snapshot performance by configuring the Snapshot writing concurrency parameter. This controls the number of parallel threads used to write data during a snapshot task.

  • Default: 1 (no parallelism)

  • Maximum: 8 (high parallelism)

Increasing concurrency can complete snapshots up to 4x faster, according to recent benchmarks and customer feedback.

Increasing concurrency can put more load on your target database. Monitor system resources and adjust the setting according to your environment’s capacity.
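
The idea behind the parameter can be illustrated with a generic sketch of parallel batch writes. This is not Gluesync's implementation: write_snapshot, write_batch, and batch_size are hypothetical names, and only the 1 to 8 range comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_CONCURRENCY = 1  # default: no parallelism
MAX_CONCURRENCY = 8  # documented maximum


def clamp_concurrency(requested):
    """Keep the requested thread count inside the documented 1..8 range."""
    return max(MIN_CONCURRENCY, min(MAX_CONCURRENCY, requested))


def chunked(rows, size):
    """Yield consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_snapshot(rows, write_batch, concurrency=1, batch_size=1000):
    """Write all rows in batches using up to `concurrency` worker threads.

    `write_batch` is a caller-supplied function (an assumption of this
    sketch) that writes one batch and returns the number of rows written.
    """
    workers = clamp_concurrency(concurrency)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, chunked(rows, batch_size)))


# Stand-in writer: `len` just counts the rows of each batch.
written = write_snapshot(list(range(10_000)), len, concurrency=16, batch_size=500)
```

Each worker holds its own batch in flight against the target, which is why a higher value raises both throughput and the load on the target database.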

How to configure

You can set the Snapshot writing concurrency in the entity configuration wizard:

Figure: Snapshot writing concurrency setting

  • For large datasets or high-performance targets, set a higher value (up to 8).

  • For limited resources or when targeting smaller databases, use a lower value.

Scheduling snapshot tasks

Snapshot tasks can be executed on-demand or scheduled to run automatically at specific times using the Chronos Scheduler. Scheduling snapshots is particularly useful for:

  • Regular data refreshes from source to target

  • Off-hours data migrations to minimize impact on production systems

  • Periodic full data synchronization to ensure data consistency

To set up scheduled snapshots, navigate to the Chronos Scheduler module, create a new schedule, and select "Run Snapshot" as the action type. For detailed instructions, see the Chronos Scheduler documentation.

Boosting read performance using logical partitioning for large tables

For very large source tables, you can improve snapshot performance by enabling logical partitioning on the read side.

Logical partitioning splits the snapshot read into multiple logical ranges over a single source column and processes those ranges in parallel. This helps reduce total snapshot duration and makes better use of available resources when snapshotting large fact or history tables.
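
As a rough illustration of range splitting over a single column, the sketch below divides an integer key interval into contiguous ranges that could each be read by a separate worker. How Gluesync actually chooses the boundaries is not described here, so the even split, the integer key, and the partition_ranges helper are assumptions.

```python
def partition_ranges(min_key, max_key, partitions):
    """Split the inclusive interval [min_key, max_key] into contiguous,
    non-overlapping inclusive ranges of near-equal size.

    Hypothetical helper for illustration; assumes an integer key column.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")
    total = max_key - min_key + 1
    partitions = min(partitions, total)  # never create empty ranges
    base, extra = divmod(total, partitions)
    ranges = []
    start = min_key
    for i in range(partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Each range maps to one bounded read, for example:
#   SELECT ... FROM big_table WHERE id BETWEEN <start> AND <end>
ranges = partition_ranges(1, 10, 4)
```

For keys 1 to 10 and four partitions this yields (1, 3), (4, 6), (7, 8), (9, 10). An even split like this only balances the work when the key values are spread fairly uniformly across the column.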

Figure: Snapshot logical partitioning setting

To learn how to configure and tune this feature, see Logical partitioning for snapshot tasks.

Real-world scenarios and customer insights

  • INSERT with high concurrency is ideal for initial migrations to an empty table—customers report significantly reduced snapshot times.

  • UPSERT is preferred for ongoing syncs where the target may already have data, especially when data integrity is critical.

  • Customers have found that adjusting concurrency helps balance speed and system stability, especially on cloud-managed databases.

Summary

To maximize snapshot task performance:

  • Prefer INSERT for empty targets, UPSERT for merging/updating.

  • Use the Snapshot writing concurrency parameter (v2.0.10+) to leverage multi-threaded writes.

  • Use source table partitioning and, on large tables, logical partitioning to boost read performance.

  • Monitor your system and adjust concurrency and partitioning settings for optimal throughput and reliability.