Monitoring
Monitoring through Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It has become a de facto standard for service monitoring.
By default, Gluesync exposes Prometheus-compatible metrics through the /metrics endpoint.
The following metrics are currently exposed:
Metric name | Description | Type |
---|---|---|
gluesync_processed_events_total | The total number of processed events | counter |
gluesync_errors_count_total | The total number of errors | counter |
gluesync_uptime | Time since start (milliseconds) | counter |
gluesync_processed_events | The number of processed events per table | counter |
gluesync_errors | The number of errors per table | counter |
gluesync_buffer_pressure | The buffer pressure per table, as a percentage | counter |
gluesync_sync_status | The sync status per table: 1 for busy, 0 for idle | gauge |
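Prometheus itself is normally configured to scrape this endpoint. For a quick ad-hoc check from a script, the following Python sketch downloads /metrics and keeps only the gluesync_* samples. It assumes the endpoint is reachable at http://localhost:80 (adjust host and port to your deployment) and it handles only the basic text exposition format, without escaped quotes inside label values. The label name `table` used in the test data is hypothetical; check your instance's actual output for the label Gluesync attaches to per-table metrics.

```python
import re
import urllib.request

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 123.0 [timestamp]
# Anything after the value (e.g. a timestamp) is ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
# Simple label matcher; escaped quotes inside values are not supported.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_gluesync_metrics(text):
    """Return {(name, sorted_label_pairs): value} for every gluesync_* sample."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").startswith("gluesync_"):
            continue
        labels = tuple(sorted(LABEL_RE.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


def fetch_gluesync_metrics(base_url="http://localhost:80"):
    """Download and parse /metrics; the default host and port are assumptions."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return parse_gluesync_metrics(resp.read().decode("utf-8"))
```

Against a running instance, `fetch_gluesync_metrics()` returns a dictionary such as `{("gluesync_processed_events_total", ()): 42.0, ...}`. For continuous monitoring and alerting, a Prometheus scrape job remains the better choice; the sketch is only meant for one-off inspection.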
Status endpoint
Gluesync exposes an HTTP endpoint, listening on port 80 by default (unless changed by the user), which reports metrics and information about the replication status, its configuration, service health and resource consumption.
Here is an example of its output:
{
  "version": "1.3.3",
  "source": "your source database",
  "target": "your target database",
  "startDateTime": "instance start time",
  "processedEvents": "the number of processed events since its start",
  "errorsCount": "number of errors encountered since its start",
  "entities": {
    "table A": {
      "processedEvents": "the number of processed events, per this entity, since its start",
      "errorsCount": "number of errors encountered, per this entity, since its start",
      "bufferPressure": "buffer pressure, expressed in %, per this entity",
      "state": "current state, per this entity, could be Idle or Busy"
    },
    "table B": {
      "processedEvents": "the number of processed events, per this entity, since its start",
      "errorsCount": "number of errors encountered, per this entity, since its start",
      "bufferPressure": "buffer pressure, expressed in %, per this entity",
      "state": "current state, per this entity, could be Idle or Busy"
    }
  }
}
Understanding the buffer pressure indicator
Buffer pressure is an indicator, expressed as a percentage, of the current pressure on the replication buffer for each replicated entity. A higher value indicates that writes against your target database may be slow, resulting in longer replication times and higher latency. If this occurs often, we suggest increasing the resources of your target database (faster disks, …).
Conversely, if you are experiencing replication latency but the bufferPressure indicator shows a low value, roughly between 0% and 5%, this may indicate that your source database is idle, or that the connection between the two databases is slow or suffers from latency.
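The reasoning above can be sketched as a small Python helper. The 5% upper bound for "low" pressure comes from this section; the 80% threshold for "high" pressure is a hypothetical value chosen for illustration, since no threshold is defined here. Because the advice applies when high pressure "occurs often", `high_pressure_share` estimates how frequently readings are high across a series of samples rather than relying on a single reading.

```python
# Upper bound of the "low" range described in this section.
LOW_PRESSURE_MAX_PCT = 5.0
# Hypothetical threshold: the documentation does not define "high" pressure.
HIGH_PRESSURE_MIN_PCT = 80.0


def interpret_buffer_pressure(pressure_pct, latency_observed):
    """Map one bufferPressure reading to the likely cause described above."""
    if pressure_pct >= HIGH_PRESSURE_MIN_PCT:
        # Buffer is under pressure: writes to the target are likely slow.
        return "slow_target"
    if latency_observed and pressure_pct <= LOW_PRESSURE_MAX_PCT:
        # Latency without pressure: idle source or a slow connection.
        return "idle_source_or_slow_link"
    return "inconclusive"


def high_pressure_share(readings_pct, threshold=HIGH_PRESSURE_MIN_PCT):
    """Fraction of readings at or above threshold (0.0 when there are none)."""
    if not readings_pct:
        return 0.0
    return sum(1 for r in readings_pct if r >= threshold) / len(readings_pct)
```

Both thresholds are parameters rather than fixed rules; tune them to the behaviour you observe on your own source and target databases.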