Monitoring
Monitoring through Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It has become the de facto standard for service monitoring.
Gluesync exposes Prometheus-compatible metrics through the /metrics endpoint by default.
The following metrics are currently exposed:
Metric name | Description | Type |
---|---|---|
gluesync_processed_events_total | The total number of processed events | counter |
gluesync_errors_count_total | The total number of errors | counter |
gluesync_uptime | Time since start (milliseconds) | counter |
gluesync_processed_events | The number of processed events per table | counter |
gluesync_errors | The number of errors per table | counter |
gluesync_buffer_pressure | The buffer pressure per table as a percentage | counter |
gluesync_sync_status | The sync status per table. Can be 1 for busy or 0 for idle | gauge |
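To illustrate how these metrics might be consumed outside of a Prometheus server, the following minimal Python sketch parses a text-format scrape of the /metrics endpoint. The sample payload, the `table` label name, and the helper `parse_metrics` are assumptions made for illustration; the label names actually emitted by Gluesync may differ.

```python
import re

# Hypothetical sample of what a /metrics scrape might return.
# The "table" label name and all values are assumptions for illustration.
SAMPLE = """\
# HELP gluesync_processed_events_total The total number of processed events
# TYPE gluesync_processed_events_total counter
gluesync_processed_events_total 42
gluesync_errors_count_total 0
gluesync_processed_events{table="MYSCHEMA.OPEN_SEA"} 42
gluesync_sync_status{table="MYSCHEMA.OPEN_SEA"} 1
"""

# Metric name, optional {labels}, then a single value token.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and HELP/TYPE lines
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE)

# Tables whose sync status is 1 (busy), per the table above.
busy = [labels["table"] for name, labels, value in samples
        if name == "gluesync_sync_status" and value == 1.0]
print(busy)
```

This parser is deliberately simple: it does not handle optional timestamps or escaped quotes inside label values, so a maintained Prometheus client library is likely a better choice for production use.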
Status endpoint
Gluesync exposes an HTTP endpoint, by default on port 80 (unless changed by the user), that provides a set of metrics and information about the replication status, its configuration, service health, and resource consumption.
Here is an example of its output:
{
"version": "1.3.3",
"source": "your source database",
"target": "your target database",
"startDateTime": "instance start time",
"processedEvents": "the number of processed events since its start",
"errorsCount": "number of errors encountered since its start",
"entities": {
"table A": {
"processedEvents": "the number of processed events, per this entity, since its start",
"errorsCount": "number of errors encountered, per this entity, since its start",
"bufferPressure": "buffer pressure, expressed in %, per this entity",
"state": "current state, per this entity, could be Idle or Busy"
},
"table B": {
"processedEvents": "the number of processed events, per this entity, since its start",
"errorsCount": "number of errors encountered, per this entity, since its start",
"bufferPressure": "buffer pressure, expressed in %, per this entity",
"state": "current state, per this entity, could be Idle or Busy"
}
}
}
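As a rough illustration of how this payload could be consumed, the Python sketch below groups entities by state and lists those that reported errors. The sample values are invented, the numeric counters and "3%"-style strings follow the shape of the log example shown later on this page, and `summarize_status` is a hypothetical helper name.

```python
import json

# Sample payload shaped like the status output; all values are illustrative.
STATUS_JSON = """
{
  "version": "1.3.3",
  "source": "Oracle",
  "target": "Aerospike",
  "processedEvents": 120,
  "errorsCount": 2,
  "entities": {
    "table A": {"processedEvents": 100, "errorsCount": 0,
                "bufferPressure": "3%", "state": "Idle"},
    "table B": {"processedEvents": 20, "errorsCount": 2,
                "bufferPressure": "40%", "state": "Busy"}
  }
}
"""


def summarize_status(status):
    """Group entity names by state and list entities that reported errors."""
    by_state = {}
    with_errors = []
    for name, entity in status.get("entities", {}).items():
        by_state.setdefault(entity["state"], []).append(name)
        if entity["errorsCount"] > 0:
            with_errors.append(name)
    return {"by_state": by_state, "with_errors": with_errors}


summary = summarize_status(json.loads(STATUS_JSON))
print(summary)
```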
Metrics logged as console output
The metrics above are also written to the console log output (and to the default log file). You can find them by searching for INFO - Metrics log entries. To save space in the logs, they are no longer printed in a prettified format after the bootstrap of Gluesync. They appear in the following format:
INFO - Metrics: {"version":"SQL to NoSQL – version 1.5.10-beta4","source":"Oracle","target":"Aerospike","startDateTime":"2023-10-06T05:26:13.914+0000","processedEvents":0,"errorsCount":0,"entities":{"MYSCHEMA.OPEN_SEA":{"processedEvents":0,"errorsCount":0,"bufferPressure":"0%","state":"Busy"}}}
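A log collector could pick these entries up by searching for the marker and decoding the JSON that follows it. The Python sketch below shows one way to do that. The sample line is abbreviated from the example above, `extract_metrics` is a hypothetical helper, and the sketch assumes that nothing follows the JSON object on the same line.

```python
import json

PREFIX = "INFO - Metrics: "  # marker described above


def extract_metrics(log_line):
    """Return the metrics dict from a log line, or None if there is none."""
    start = log_line.find(PREFIX)
    if start == -1:
        return None
    try:
        return json.loads(log_line[start + len(PREFIX):])
    except json.JSONDecodeError:
        # Prettified multi-line entries printed at bootstrap are not handled.
        return None


# Abbreviated from the example log line above.
line = ('INFO - Metrics: {"source":"Oracle","target":"Aerospike",'
        '"processedEvents":0,"errorsCount":0,"entities":{"MYSCHEMA.OPEN_SEA":'
        '{"processedEvents":0,"errorsCount":0,"bufferPressure":"0%",'
        '"state":"Busy"}}}')

metrics = extract_metrics(line)
for name, entity in metrics["entities"].items():
    # bufferPressure is a string such as "0%"; strip the sign to get a number.
    pressure = int(entity["bufferPressure"].rstrip("%"))
    print(name, entity["state"], pressure)
```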
Understanding the buffer pressure indicator
Buffer pressure is an indicator, expressed as a percentage, that shows the current pressure on the replication buffer for each entity being replicated. A high value suggests that writes against your target database may be slow, resulting in longer replication times and higher latencies. If this occurs often, we suggest increasing the resources of your target database (faster disks, …).
Conversely, if you are experiencing latencies while replicating but the bufferPressure indicator shows a low value, in the range of 0% to 5%, this could indicate that your source database is idle, or that the connection between the two is slow or has latencies.
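The reasoning above can be sketched as a small decision helper in Python. The 5% bound comes from the range mentioned above; the 80% threshold for "high" pressure and the function name `diagnose` are assumptions chosen for illustration, not documented Gluesync values.

```python
LOW_PRESSURE_MAX = 5    # upper end of the "0% to 5%" range described above
HIGH_PRESSURE_MIN = 80  # assumed threshold, not a documented value


def diagnose(buffer_pressure, latency_observed):
    """Map a bufferPressure string such as "12%" to a likely cause."""
    pressure = float(buffer_pressure.rstrip("%"))
    if pressure >= HIGH_PRESSURE_MIN:
        return "target database writes may be slow"
    if pressure <= LOW_PRESSURE_MAX and latency_observed:
        return "source may be idle, or the connection may be slow"
    return "no obvious bottleneck from buffer pressure alone"


print(diagnose("92%", latency_observed=True))
print(diagnose("3%", latency_observed=True))
```

A single reading is rarely conclusive, so in practice it may be more useful to apply a check like this to values observed over a period of time.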
Stats
This set of metrics is exposed as JSON output at the following REST endpoint: /stats. Stats are printed in a prettified format at the bootstrap of the instance and look like this:
Stats: {
"os": {
"family": "Debian GNU/Linux",
"manufacturer": "GNU/Linux",
"version": "11",
"codeName": "bullseye",
"buildNumber": "5.14.0-284.11.1.el9_2.x86_64",
"systemUptime": 156277,
"systemBootTime": 1696413696
},
"cpu": {
"cpuVendor": "GenuineIntel",
"cpuName": "Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz",
"cpuFamily": "6",
"cpuModel": "58",
"physicalProcessorCount": 8,
"logicalProcessorCount": 8,
"processorCpuLoad": [
0.64,
0.43434343434343436,
0.21782178217821782,
0.12,
0.4,
0.19,
0.04040404040404041,
0.28431372549019607
],
"systemCpuLoad": 0.6934673366834171,
"interrupts": 221742579,
"systemCpuTicks": [
6622420,
16530,
1897680,
1236170170,
40330,
687320,
620270,
0
]
},
"ram": {
"total": 15712,
"available": 11475,
"pageSize": 0
},
"networkIfs": [
{
"name": "eth0",
"iPv4address": [
"172.17.0.2"
],
"speedInMegabytes": 10000,
"megabytesReceived": 10,
"megabytesSent": 0
}
]
}
As the example output shows, Gluesync provides detailed monitoring of the environment in which it is running, plus details about resource usage such as RAM, CPU, and network bandwidth.
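A few derived figures can be computed from this payload, as in the Python sketch below. The RAM values are taken from the example above, while the per-core load list is shortened for brevity. The sketch assumes that processorCpuLoad holds one fraction between 0 and 1 per logical processor, and `derive` is a hypothetical helper name.

```python
import json
from statistics import mean

# Abbreviated from the Stats example above (only the fields used here).
STATS_JSON = ('{"cpu":{"processorCpuLoad":[0.64,0.4,0.12,0.19]},'
              '"ram":{"total":15712,"available":11475}}')


def derive(stats):
    """Compute RAM usage and simple per-core load figures from a stats dict."""
    ram = stats["ram"]
    used_pct = 100.0 * (ram["total"] - ram["available"]) / ram["total"]
    loads = stats["cpu"]["processorCpuLoad"]
    return {
        "ram_used_pct": round(used_pct, 1),
        "mean_core_load": round(mean(loads), 4),
        "busiest_core": loads.index(max(loads)),  # zero-based index
    }


derived = derive(json.loads(STATS_JSON))
print(derived)
```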
Metrics logged as console output
The stats above are also written to the console log output (and to the default log file). You can find them by searching for INFO - Stats log entries. To save space in the logs, they are no longer printed in a prettified format after the bootstrap of Gluesync. They appear in the following format:
INFO - Stats: {"os":{"family":"Debian GNU/Linux","manufacturer":"GNU/Linux","version":"11","codeName":"bullseye","buildNumber":"5.14.0-284.11.1.el9_2.x86_64","systemUptime":156284,"systemBootTime":1696413696},"cpu":{"cpuVendor":"GenuineIntel","cpuName":"Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz","cpuFamily":"6","cpuModel":"58","physicalProcessorCount":8,"logicalProcessorCount":8,"processorCpuLoad":[0.4318181818181818,0.5205479452054794,0.23,0.3894736842105263,0.3564356435643564,0.32558139534883723,0.5138888888888888,0.40217391304347827],"systemCpuLoad":0.2975420439844761,"interrupts":221960632,"systemCpuTicks":[6647080,16530,1901780,1236194510,40330,687730,622400,0]},"ram":{"total":15712,"available":8963,"pageSize":0},"networkIfs":[{"name":"eth0","iPv4address":["172.17.0.2"],"speedInMegabytes":10000,"megabytesReceived":39,"megabytesSent":44}]}