Monitoring

Monitoring through Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It has become the de facto standard for service monitoring.

Gluesync exposes Prometheus-compatible metrics through the /metrics endpoint by default. The metrics currently exposed are the following:

Metric name	Description	Type
gluesync_processed_events_total	The total number of processed events	counter
gluesync_errors_count_total	The total number of errors	counter
gluesync_uptime	Time since start (milliseconds)	counter
gluesync_processed_events	The number of processed events per table	counter
gluesync_errors	The number of errors per table	counter
gluesync_buffer_pressure	The buffer pressure per table as percentage	counter
gluesync_sync_status	The sync status per table. Can be 1 for busy or 0 for idle	gauge
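
As an illustration of how these metrics can be consumed outside of a Prometheus server, the following is a minimal sketch that parses Prometheus text-format output and lists the tables currently reported as busy. The sample scrape text, the `table` label name, and the `parse_metrics` helper are assumptions made for the example; check the actual output of your /metrics endpoint for the real label names.

```python
import re

# One sample line of the Prometheus text format: name, optional {labels}, value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:  # ignore anything that is not a sample line
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical scrape output; the "table" label name is an assumption.
EXAMPLE = """\
# TYPE gluesync_processed_events_total counter
gluesync_processed_events_total 1500
gluesync_errors_count_total 3
gluesync_sync_status{table="orders"} 1
gluesync_sync_status{table="customers"} 0
"""

samples = parse_metrics(EXAMPLE)

# gluesync_sync_status is 1 for busy and 0 for idle.
busy = [labels["table"] for name, labels, value in samples
        if name == "gluesync_sync_status" and value == 1]
print(busy)
```

In a real deployment a Prometheus server would normally scrape the endpoint directly; a hand-written parser like this is mainly useful for quick checks or scripts.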

Status endpoint

Gluesync exposes an HTTP endpoint, on port 80 by default (unless changed by the user), that provides a set of metrics and information about the replication status, its configuration, service health and resource consumption.

The following is an example of its output:

{
  "version": "1.3.3",
  "source": "your source database",
  "target": "your target database",
  "startDateTime": "instance start time",
  "processedEvents": "the number of processed events since its start",
  "errorsCount": "number of errors encountered since its start",
  "entities": {
    "table A": {
        "processedEvents": "the number of processed events, per this entity, since its start",
        "errorsCount": "number of errors encountered, per this entity, since its start",
        "bufferPressure": "buffer pressure, expressed in %, per this entity",
        "state": "current state, per this entity, could be Idle or Busy"
    },
    "table B": {
        "processedEvents": "the number of processed events, per this entity, since its start",
        "errorsCount": "number of errors encountered, per this entity, since its start",
        "bufferPressure": "buffer pressure, expressed in %, per this entity",
        "state": "current state, per this entity, could be Idle or Busy"
    }
  }
}
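
A status document of this shape can be summarized per entity with a few lines of code. The sketch below is only an illustration: the concrete values in the sample payload, the assumption that bufferPressure may arrive either as a number or as a string such as "72%", and the helper names `parse_percent` and `summarize_status` are all hypothetical and not taken from Gluesync itself.

```python
import json


def parse_percent(value):
    """Convert 12, 12.5 or "12.5%" to a float; the exact format is an assumption."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(str(value).strip().rstrip("%").strip())


def summarize_status(payload):
    """Return (entity name, state, buffer pressure) tuples from a status document."""
    status = json.loads(payload)
    rows = []
    for name, entity in status.get("entities", {}).items():
        rows.append((name, entity["state"], parse_percent(entity["bufferPressure"])))
    return rows


# Hypothetical payload with concrete values filled in for illustration.
EXAMPLE = """
{
  "version": "1.3.3",
  "processedEvents": 1500,
  "errorsCount": 3,
  "entities": {
    "table A": {"processedEvents": 1000, "errorsCount": 1,
                "bufferPressure": "72%", "state": "Busy"},
    "table B": {"processedEvents": 500, "errorsCount": 2,
                "bufferPressure": 0, "state": "Idle"}
  }
}
"""

rows = summarize_status(EXAMPLE)
for name, state, pressure in rows:
    print(f"{name}: {state}, buffer pressure {pressure:.0f}%")
```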

Understanding the buffer pressure indicator

Buffer pressure is an indicator, expressed as a percentage, of the current pressure on the replication buffer for each entity being replicated. A higher value indicates that you might be experiencing slow writes against your target database, resulting in longer replication times and higher latencies. If this occurs often, we suggest increasing the resources on your target database (faster disks, …).

Conversely, if you are experiencing latencies while replicating but the bufferPressure indicator shows a low value, in the range of 0% to 5%, this could indicate that your source database is idle, or that the connection between the two is slow or has latencies.
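
The reasoning in this section can be written down as a small decision helper. This is a rough sketch rather than an official rule: the 5% upper bound for "low" pressure comes from the text above, while the 80% threshold for "high" pressure and the `interpret` function itself are assumptions that should be tuned for your workload.

```python
LOW_PRESSURE_MAX = 5.0    # upper end of the "low" range described above
HIGH_PRESSURE_MIN = 80.0  # hypothetical threshold, tune for your workload


def interpret(buffer_pressure, latency_observed):
    """Suggest where to look first, given buffer pressure (%) and observed latency."""
    if buffer_pressure >= HIGH_PRESSURE_MIN:
        # The buffer is filling up: writes to the target are likely the bottleneck.
        return "target: writes may be slow, consider more resources on the target database"
    if latency_observed and buffer_pressure <= LOW_PRESSURE_MAX:
        # The buffer is nearly empty yet replication lags: look upstream.
        return "source or network: source may be idle, or the connection may be slow"
    return "no obvious bottleneck from buffer pressure alone"


print(interpret(92.0, True))
print(interpret(2.0, True))
print(interpret(2.0, False))
```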