
Installation steps

Gluesync NoSQL to SQL for HBase

Prerequisites

To run Gluesync against your HBase instance you will need:

  • valid user credentials with read and write permissions on the source database

Basic configuration example


This module is customized through a configuration file in JSON format. The file name must be passed as a parameter when launching the app, using the -f or --file option. The file should combine the common configuration (see Installation steps) with the source- and destination-specific configuration, such as the following for HBase:

Source database in HBase

Unlike other sources, HBase doesn’t require you to specify the source database.

In your config file, simply leave the sourceName field empty:

{
  ...
  "sourceName": "",
  ...
}

Source entities in HBase

Source entities in HBase are worth mentioning because they differ slightly from the other supported datastores: data in HBase is organized in columns grouped into families, and when queried each column is represented as the family name and the column name joined by a user-defined separator.

The example below shows how a customers table with a column family c is represented when the separator is :.

{
  ...
  "sourceEntities": {
      "customers": {
        "mapping": {
          "c:name": "name",
          "c:surname": "surname",
          "c:address": "address",
          "c:gender": "gender",
          "c:phone": "phone",
          "c:email": "email"
        }
      },
    ...
  }
  ...
}

Above we used the mapping feature available in Gluesync to tell the engine that, when it reads the column name from the family c (c:name) of the customers table, it has to map that field to the key name when building the JSON output.
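The effect of the mapping can be pictured with a short Python sketch. This is only an illustration and not Gluesync's implementation: apply_mapping is a hypothetical helper, the sample row values are invented, and dropping unmapped columns is an assumption made for the example.

```python
import json

# Hypothetical helper (not part of Gluesync): renames "family:column" keys
# to the JSON keys given in a sourceEntities mapping.
# Assumption: columns without a mapping entry are dropped.
def apply_mapping(row, mapping):
    return {mapping[col]: value for col, value in row.items() if col in mapping}

mapping = {
    "c:name": "name",
    "c:surname": "surname",
    "c:email": "email",
}

# Invented sample row, keyed the way the customers table is represented
# with family "c" and separator ":".
row = {"c:name": "Ada", "c:surname": "Lovelace", "c:email": "ada@example.com"}

document = apply_mapping(row, mapping)
print(json.dumps(document))
```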

Data type inference

Apache HBase stores data in binary format, which means the engine can only guess the precise data type and its length (especially for dates, floating point numbers, …). To improve performance and avoid unwanted behaviour while converting from binary to the destination data type, we require the user to fill in this part of the configuration, which tells the engine how to handle the incoming data for each field.
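To see why the type is a guess, the following Python sketch reads the same bytes in two ways. It is an illustration only and says nothing about how Gluesync decodes values; it assumes the writer stored numbers in big-endian order, as the HBase Bytes utility does, and the sample values are invented.

```python
import struct

# Assumption: numeric values were written big-endian (as HBase's Bytes
# utility does); other writers may use a different encoding.

# Eight bytes as a writer might store the DOUBLE 19.99.
raw8 = struct.pack(">d", 19.99)

as_double = struct.unpack(">d", raw8)[0]  # bytes read as DOUBLE
as_long = struct.unpack(">q", raw8)[0]    # the same bytes read as LONG

# Four bytes that are equally valid as the STRING "2024" or as an INT.
raw4 = "2024".encode("utf-8")

as_string = raw4.decode("utf-8")          # bytes read as STRING
as_int = struct.unpack(">i", raw4)[0]     # the same bytes read as INT

print(as_double, as_long)
print(as_string, as_int)
```

Both readings of each value are valid, so nothing in the bytes themselves tells the engine which one was intended.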

The data type configuration should look like the following example:

{
  ...
  "hbase": {
    "columnInfoSeparator": ":",
    "maxRecoveryRetry": 3,
    "entitiesDataType": {
      "customers": {
        "c:name": "STRING",
        "c:surname": "STRING",
        "c:address": "STRING",
        "c:gender": "STRING",
        "c:phone": "STRING",
        "c:email": "STRING"
      },
      ...
    }
  },
  ...
}

HBase specific configurations are listed under the hbase property:

  • columnInfoSeparator (optional): defaults to :. It’s the user-defined separator used between the family and the column name;

  • maxRecoveryRetry (optional): defaults to 3. It’s the maximum number of retries Gluesync will attempt before closing the connection with the ZooKeeper services on the HBase side;

  • entitiesDataType: the object (map) containing each of the tables listed under the sourceEntities field;

    • customers: in this example it’s our table name;

    • c:name: in this example it’s our family-separator-column name;

    • STRING: in this example it’s the corresponding data type for our field.

Supported data types:

  • STRING
  • INT
  • FLOAT
  • DOUBLE
  • LONG
  • BOOLEAN
  • LOCAL_DATE
  • LOCAL_DATE_TIME
  • LOCAL_TIME
  • OFFSET_DATE_TIME
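Before launching the app, a config file can be sanity-checked against the data types listed above. The Python sketch below is one way to do that and is not a Gluesync tool: check_hbase_types is a hypothetical helper, the c:birthdate column is invented for the example, and the rule that every mapped column should have a declared type is an assumption.

```python
# Data types taken from the list of supported types in this page.
SUPPORTED_TYPES = {
    "STRING", "INT", "FLOAT", "DOUBLE", "LONG", "BOOLEAN",
    "LOCAL_DATE", "LOCAL_DATE_TIME", "LOCAL_TIME", "OFFSET_DATE_TIME",
}

# Hypothetical helper (not part of Gluesync).
def check_hbase_types(config):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    declared = config.get("hbase", {}).get("entitiesDataType", {})
    # Assumption: every column mapped in sourceEntities needs a declared type.
    for table, entity in config.get("sourceEntities", {}).items():
        types = declared.get(table)
        if types is None:
            problems.append(f"{table}: missing from entitiesDataType")
            continue
        for column in entity.get("mapping", {}):
            if column not in types:
                problems.append(f"{table}.{column}: no data type declared")
    # Every declared type must be one of the supported names.
    for table, types in declared.items():
        for column, data_type in types.items():
            if data_type not in SUPPORTED_TYPES:
                problems.append(f"{table}.{column}: unsupported type {data_type}")
    return problems

# Invented config fragment: "DATE" is not in the supported list.
config = {
    "sourceEntities": {
        "customers": {"mapping": {"c:name": "name", "c:birthdate": "birthdate"}}
    },
    "hbase": {
        "entitiesDataType": {
            "customers": {"c:name": "STRING", "c:birthdate": "DATE"}
        }
    },
}

problems = check_hbase_types(config)
print(problems)
```

Changing DATE to LOCAL_DATE in the fragment makes the check return an empty list.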

Got Kerberos or other authentication services not covered here?

We’re working on improving the documentation, and all the possible combinations of authentication providers commonly used with HBase will be available soon.