Monitoring

Monitoring through Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It has become the de facto standard for service monitoring.

Gluesync supports Prometheus and, by default, exposes Prometheus-compatible metrics through the /metrics endpoint. The following metrics are currently exposed:

Metric name	Description	Type
gluesync_processed_events_total	The total number of processed events	counter
gluesync_errors_count_total	The total number of errors	counter
gluesync_uptime	Time since start (milliseconds)	counter
gluesync_processed_events	The number of processed events per table	counter
gluesync_errors	The number of errors per table.	counter
gluesync_buffer_pressure	The buffer pressure per table as a percentage	counter
gluesync_sync_status	The sync status per table. Can be 1 for busy or 0 for idle	gauge
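
For a quick check without a full Prometheus server, the text returned by /metrics can be parsed by hand. The following is a minimal sketch of a parser for the Prometheus text exposition format, not Gluesync's own tooling. The sample payload and the `table` label name are hypothetical, since the labels attached to the per-table metrics are not listed here.

```python
import re

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical payload: the values and the "table" label are assumptions.
SAMPLE_PAYLOAD = """\
# TYPE gluesync_processed_events_total counter
gluesync_processed_events_total 42
# TYPE gluesync_sync_status gauge
gluesync_sync_status{table="MYSCHEMA.OPEN_SEA"} 1
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        name, raw_labels, value = match.groups()
        # Label values are kept as written; escape sequences are not decoded.
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Calling `parse_metrics(SAMPLE_PAYLOAD)` yields one tuple per sample, which is enough to spot-check a counter or a per-table gauge from a script.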

Status endpoint

Gluesync exposes an HTTP endpoint, by default on port 80 (unless changed by the user), that reports metrics and information about the replication status, its configuration, service health, and resource consumption.

Here is an example of its output:

{
  "version": "1.3.3",
  "source": "your source database",
  "target": "your target database",
  "startDateTime": "instance start time",
  "processedEvents": "the number of processed events since its start",
  "errorsCount": "number of errors encountered since its start",
  "entities": {
    "table A": {
        "processedEvents": "the number of processed events, per this entity, since its start",
        "errorsCount": "number of errors encountered, per this entity, since its start",
        "bufferPressure": "buffer pressure, expressed in %, per this entity",
        "state": "current state, per this entity, could be Idle or Busy"
    },
    "table B": {
        "processedEvents": "the number of processed events, per this entity, since its start",
        "errorsCount": "number of errors encountered, per this entity, since its start",
        "bufferPressure": "buffer pressure, expressed in %, per this entity",
        "state": "current state, per this entity, could be Idle or Busy"
    }
  }
}
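
A script can poll this endpoint and reduce the document to the entities that need attention. The sketch below assumes only the field names shown in the example above. The URL is passed in by the caller because the exact path of the status endpoint is not stated here, and the function names are illustrative.

```python
import json
from urllib.request import urlopen


def fetch_status(url, timeout=5.0):
    """Download and decode the status document from the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_status(status):
    """List the entities that are busy and those that reported errors."""
    entities = status.get("entities", {})
    return {
        "busy": sorted(
            name for name, entity in entities.items()
            if entity.get("state") == "Busy"
        ),
        "with_errors": sorted(
            name for name, entity in entities.items()
            if int(entity.get("errorsCount", 0)) > 0
        ),
    }
```

`summarize_status` works on any already-decoded status document, so it can be exercised without a running instance.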

Metrics logged as console output

The metrics above are also written to the console log output (and to the default log file). To find them, look for INFO - Metrics log entries. After Gluesync has bootstrapped, these entries are no longer prettified, to save space in the logs, and take the following format:

INFO - Metrics: {"version":"SQL to NoSQL – version 1.5.10-beta4","source":"Oracle","target":"Aerospike","startDateTime":"2023-10-06T05:26:13.914+0000","processedEvents":0,"errorsCount":0,"entities":{"MYSCHEMA.OPEN_SEA":{"processedEvents":0,"errorsCount":0,"bufferPressure":"0%","state":"Busy"}}}
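
Because each entry is a single line of JSON after a fixed marker, it can be pulled out of a log stream with a few lines of code. The sketch below assumes the "INFO - <kind>: " marker shown above and handles single-line entries only; the prettified entry printed at bootstrap spans several lines and is not covered. The function name is illustrative.

```python
import json


def parse_log_entry(line, kind="Metrics"):
    """Return the decoded JSON payload of an "INFO - <kind>: {...}" line.

    Returns None when the line does not contain the marker. Any text
    before the marker (for example a timestamp) is ignored.
    """
    marker = "INFO - {}: ".format(kind)
    _, found, payload = line.partition(marker)
    if not found:
        return None
    return json.loads(payload)
```

Passing a different `kind` lets the same helper read other entries that follow the same "INFO - <kind>: " pattern.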

Understanding the buffer pressure indicator

Buffer pressure is an indicator, expressed as a percentage, of the current load on the replication buffer for each entity being replicated. A higher value indicates that you might be experiencing slow writes against your target database, resulting in higher replication times and latencies. If this occurs often, we suggest increasing the resources of your target database (faster disks, …).

Conversely, if you are experiencing latencies while replicating but the bufferPressure indicator stays low (between 0% and 5%), your source database may be idle, or the connection between the two may be slow or suffering from latency.
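
The reasoning above can be written down as a small decision helper. This is a sketch under stated assumptions: the 5% low-pressure bound comes from the text, while the 80% threshold for "high" pressure is an arbitrary illustrative value, not a Gluesync recommendation, and the function names are hypothetical.

```python
def parse_pressure(value):
    """Convert a bufferPressure string such as "12%" to a float."""
    return float(str(value).strip().rstrip("%"))


def diagnose(buffer_pressure, latency_observed,
             high_threshold=80.0, low_threshold=5.0):
    """Map a bufferPressure reading to the likely bottleneck.

    high_threshold is an assumed cut-off for "high" pressure;
    low_threshold mirrors the 0% to 5% range described in the text.
    """
    pressure = parse_pressure(buffer_pressure)
    if pressure >= high_threshold:
        return "slow target writes"
    if latency_observed and pressure <= low_threshold:
        return "idle source or slow connection"
    return "no obvious bottleneck"
```

A single reading is only a hint; the text above suggests acting when high pressure occurs often, so a real check would look at readings over time.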

Stats

This set of metrics is exposed as JSON at the /stats REST endpoint. Stats are printed in a prettified format when the instance bootstraps, and look like this:

Stats: {
  "os": {
    "family": "Debian GNU/Linux",
    "manufacturer": "GNU/Linux",
    "version": "11",
    "codeName": "bullseye",
    "buildNumber": "5.14.0-284.11.1.el9_2.x86_64",
    "systemUptime": 156277,
    "systemBootTime": 1696413696
  },
  "cpu": {
    "cpuVendor": "GenuineIntel",
    "cpuName": "Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz",
    "cpuFamily": "6",
    "cpuModel": "58",
    "physicalProcessorCount": 8,
    "logicalProcessorCount": 8,
    "processorCpuLoad": [
      0.64,
      0.43434343434343436,
      0.21782178217821782,
      0.12,
      0.4,
      0.19,
      0.04040404040404041,
      0.28431372549019607
    ],
    "systemCpuLoad": 0.6934673366834171,
    "interrupts": 221742579,
    "systemCpuTicks": [
      6622420,
      16530,
      1897680,
      1236170170,
      40330,
      687320,
      620270,
      0
    ]
  },
  "ram": {
    "total": 15712,
    "available": 11475,
    "pageSize": 0
  },
  "networkIfs": [
    {
      "name": "eth0",
      "iPv4address": [
        "172.17.0.2"
      ],
      "speedInMegabytes": 10000,
      "megabytesReceived": 10,
      "megabytesSent": 0
    }
  ]
}

As the example output shows, Gluesync provides detailed monitoring of the environment in which it is running, along with details of resource usage such as RAM, CPU, and network bandwidth.
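
A few derived figures are often easier to alert on than the raw document. The sketch below assumes only the field names shown in the example output and makes no assumption about the unit of the RAM values, since it only uses their ratio. The function name and the keys of the returned dictionary are illustrative.

```python
def summarize_stats(stats):
    """Derive headline resource figures from a decoded stats document."""
    ram = stats["ram"]
    core_loads = stats["cpu"]["processorCpuLoad"]
    return {
        # Fraction of RAM in use; independent of the unit of the raw values.
        "ram_used_fraction": 1 - ram["available"] / ram["total"],
        # Average and peak of the per-core load values.
        "mean_core_load": sum(core_loads) / len(core_loads),
        "busiest_core_load": max(core_loads),
    }
```

The helper expects a non-zero RAM total and at least one per-core load value, as in the example output.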

Stats logged as console output

The stats above are also written to the console log output (and to the default log file). To find them, look for INFO - Stats log entries. After Gluesync has bootstrapped, these entries are no longer prettified, to save space in the logs, and take the following format:

INFO - Stats: {"os":{"family":"Debian GNU/Linux","manufacturer":"GNU/Linux","version":"11","codeName":"bullseye","buildNumber":"5.14.0-284.11.1.el9_2.x86_64","systemUptime":156284,"systemBootTime":1696413696},"cpu":{"cpuVendor":"GenuineIntel","cpuName":"Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz","cpuFamily":"6","cpuModel":"58","physicalProcessorCount":8,"logicalProcessorCount":8,"processorCpuLoad":[0.4318181818181818,0.5205479452054794,0.23,0.3894736842105263,0.3564356435643564,0.32558139534883723,0.5138888888888888,0.40217391304347827],"systemCpuLoad":0.2975420439844761,"interrupts":221960632,"systemCpuTicks":[6647080,16530,1901780,1236194510,40330,687730,622400,0]},"ram":{"total":15712,"available":8963,"pageSize":0},"networkIfs":[{"name":"eth0","iPv4address":["172.17.0.2"],"speedInMegabytes":10000,"megabytesReceived":39,"megabytesSent":44}]}