Snapshot Tasks
Snapshot tasks are a crucial part of Gluesync’s data synchronization, enabling you to migrate or initialize large datasets efficiently. This chapter explains how to maximize snapshot performance by choosing between INSERT and UPSERT operations, tuning snapshot writing concurrency, and using logical partitioning on large source tables.
Overview
During a snapshot task, Gluesync copies the entire dataset from the source to the target. The way records are written—either as INSERTs or UPSERTs—directly impacts performance and data integrity.
TRUNCATE before snapshot
Gluesync 2.1 introduces the ability to perform a TRUNCATE operation on the target table before executing a snapshot task. This is particularly useful when you want to ensure a clean slate before loading fresh data.
Key features:
-
Clean Data Loading: Removes all existing data from the target table before the snapshot begins
-
Per-snapshot control: Choose whether to enable TRUNCATE on a per-snapshot basis
-
Agent-level safety: Can be disabled at the target agent level for security
How to use
-
Navigate to the entity’s play tab
-
Select "Run Snapshot"
-
Select your preferred write mode (INSERT/UPSERT): for INSERT mode, TRUNCATE before snapshot is enabled by default;
-
Click "Start" to start the entity
Agent-level configuration
For security reasons, TRUNCATE before snapshot can be disabled at the target agent level:
-
Go to the target agent’s configuration
-
Navigate to the "Advanced Settings" tab
-
Locate the "Disable TRUNCATE on Snapshot" parameter
-
Set to
trueto disable TRUNCATE operations (default isfalse)
| When enabled, this setting prevents any TRUNCATE operations during snapshots for all entities using this target agent, regardless of individual snapshot settings. |
| Mode | Description & Best Practice |
|---|---|
INSERT |
Faster, recommended when the target table is empty. Each record is written as a new row. No existing data is checked or merged. Use this for initial loads or when you are certain there is no overlap with existing data. |
UPSERT |
Slower, but safer when the target table may already contain data. Each incoming record is either inserted as new or merged with an existing row if a match is found. Use this when you need to preserve and update existing records. |
| For best performance, use INSERT when possible. Use UPSERT only if you expect existing data in the target. |
Enhancing write performance with snapshot writing concurrency
Starting from Gluesync v2.0.10, you can dramatically improve snapshot performance by configuring the Snapshot writing concurrency parameter. This controls the number of parallel threads used to write data during a snapshot task.
-
Default: 1 (no parallelism)
-
Maximum: 8 (high parallelism)
By increasing concurrency, you can achieve up to 4x faster snapshot completion times, as demonstrated in recent benchmarks and customer feedback.
| Increasing concurrency can put more load on your target database. Monitor system resources and adjust the setting according to your environment’s capacity. |
Scheduling snapshot tasks
Snapshot tasks can be executed on-demand or scheduled to run automatically at specific times using the Chronos Scheduler. Scheduling snapshots is particularly useful for:
-
Regular data refreshes from source to target
-
Off-hours data migrations to minimize impact on production systems
-
Periodic full data synchronization to ensure data consistency
| To set up scheduled snapshots, navigate to the Chronos Scheduler module, create a new schedule, and select "Run Snapshot" as the action type. For detailed instructions, see the Chronos Scheduler documentation. |
Boosting read performance using logical partitioning for large tables
For very large source tables, you can improve snapshot performance by enabling logical partitioning on the read side.
Logical partitioning splits the snapshot read into multiple logical ranges over a single source column and processes those ranges in parallel. This helps reduce total snapshot duration and makes better use of available resources when snapshotting large fact or history tables.
To learn how to configure and tune this feature, see Logical partitioning for snapshot tasks.
Real-world scenarios and customer insights
-
INSERT with high concurrency is ideal for initial migrations to an empty table—customers report significantly reduced snapshot times.
-
UPSERT is preferred for ongoing syncs where the target may already have data, especially when data integrity is critical.
-
Customers have found that adjusting concurrency helps balance speed and system stability, especially on cloud-managed databases.
Summary
To maximize snapshot task performance:
-
Prefer INSERT for empty targets, UPSERT for merging/updating.
-
Use the Snapshot writing concurrency parameter (v2.0.10+) to leverage multi-threaded writes.
-
Use source table partitioning and, on large tables, logical partitioning to boost read performance.
-
Monitor your system and adjust concurrency and partitioning settings for optimal throughput and reliability.