Snapshot Tasks

Snapshot tasks are a crucial part of Gluesync’s data synchronization, enabling you to migrate or initialize large datasets efficiently. This chapter explains how to maximize snapshot performance by choosing between INSERT and UPSERT operations and leveraging the new snapshot writing concurrency feature introduced in v2.0.10.

Overview

During a snapshot task, Gluesync copies the entire dataset from the source to the target. The way records are written—either as INSERTs or UPSERTs—directly impacts performance and data integrity.

INSERT vs UPSERT: Performance and Use Cases

UI options for INSERT and UPSERT during snapshot tasks

Mode	Description & Best Practice
INSERT	Faster, recommended when the target table is empty. Each record is written as a new row. No existing data is checked or merged. Use this for initial loads or when you are certain there is no overlap with existing data.
UPSERT	Slower, but safer when the target table may already contain data. Each incoming record is either inserted as new or merged with an existing row if a match is found. Use this when you need to preserve and update existing records.

Mode

Description & Best Practice

INSERT

Faster, recommended when the target table is empty. Each record is written as a new row. No existing data is checked or merged. Use this for initial loads or when you are certain there is no overlap with existing data.

UPSERT

Slower, but safer when the target table may already contain data. Each incoming record is either inserted as new or merged with an existing row if a match is found. Use this when you need to preserve and update existing records.

For best performance, use INSERT when possible. Use UPSERT only if you expect existing data in the target.

Enhancing Write Performance with Snapshot Writing Concurrency

Starting from Gluesync v2.0.10, you can dramatically improve snapshot performance by configuring the Snapshot writing concurrency parameter. This controls the number of parallel threads used to write data during a snapshot task.

Default: 1 (no parallelism)
Maximum: 8 (high parallelism)

By increasing concurrency, you can achieve up to 4x faster snapshot completion times, as demonstrated in recent benchmarks and customer feedback.

Increasing concurrency can put more load on your target database. Monitor system resources and adjust the setting according to your environment’s capacity.

How to Configure

You can set the Snapshot writing concurrency in the entity configuration wizard:

For large datasets or high-performance targets, set a higher value (up to 8).
For limited resources or when targeting smaller databases, use a lower value.

Boosting Read Performance with Source Table Partitioning

If your source database table is partitioned, Gluesync can dramatically accelerate snapshot reads by fetching partitions in parallel. This is especially beneficial for large datasets, as it divides the workload into multiple concurrent tasks, reducing the total time required to read all data.

You can enable partitioned reads during the entity setup wizard in the source entity settings:

Source entity settings for partitioned reads

Toggle the Use partitions option to enable partitioned reads.
Specify the Max concurrent partitions per snapshot to control how many partitions are read in parallel (higher values increase speed but also load on the source database).

This feature is currently supported for Oracle databases. Support for additional databases will be added in future releases.

Real-World Scenarios and Customer Insights

INSERT with high concurrency is ideal for initial migrations to an empty table—customers report significantly reduced snapshot times.
UPSERT is preferred for ongoing syncs where the target may already have data, especially when data integrity is critical.
Customers have found that adjusting concurrency helps balance speed and system stability, especially on cloud-managed databases.

Summary

To maximize snapshot task performance:

Prefer INSERT for empty targets, UPSERT for merging/updating.
Use the new Snapshot writing concurrency parameter (v2.0.10+) to leverage multi-threaded writes.
Monitor your system and adjust concurrency for optimal throughput and reliability.

For more details, see the release notes for v2.0.10.