Installation steps
Basic configuration example
This module can be customized through a configuration file in JSON format.
Pass the file name as a parameter when launching the app, using the -f or --file option.
The file should combine the common configuration (see Installation steps) with the source- and destination-specific configuration, such as the following for HBase:
Source database in HBase
Unlike other sources, HBase doesn’t require you to specify a source database.
In your config file, simply leave the sourceName field empty:
{
...
"sourceName": "",
...
}
Source entities in HBase
Source entities in HBase deserve a mention because they differ slightly from those of the other supported datastores. Data in HBase is organized in columns grouped by families, and when queried each column is represented as the family name and the column name joined by a user-defined separator.
The example below shows how a customers table is represented when it has a family c and the separator is the : character.
{
...
"sourceEntities": {
"customers": {
"mapping": {
"c:name": "name",
"c:surname": "surname",
"c:address": "address",
"c:gender": "gender",
"c:phone": "phone",
"c:email": "email"
}
},
...
}
...
}
Above we used the mapping feature available in Gluesync to tell the engine that, when it reads the column c:name (family c, column name) from the customers table, it must write that field under the key name when building the JSON output.
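The renaming described above can be pictured with a short Python sketch. It is only an illustration of the idea, not Gluesync's actual implementation: the `apply_mapping` helper, the sample row, and the choice to leave unmapped columns untouched are all assumptions made for this example.

```python
# Illustrative only: mimics the renaming described above, not Gluesync's real code.
mapping = {
    "c:name": "name",
    "c:surname": "surname",
    "c:address": "address",
    "c:gender": "gender",
    "c:phone": "phone",
    "c:email": "email",
}


def apply_mapping(row, mapping):
    """Rename family-qualified column keys to their mapped output keys.

    Columns without a mapping entry keep their original key
    (an assumption made for this sketch).
    """
    return {mapping.get(column, column): value for column, value in row.items()}


# Hypothetical row as it might look when read from the customers table.
row = {"c:name": "Ada", "c:surname": "Lovelace", "c:email": "ada@example.com"}
document = apply_mapping(row, mapping)
print(document)
```

Each key on the left of the mapping is the family, the separator, and the column name; each value on the right is the key used in the output document.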
Data type inference
Apache HBase stores data in binary format, so the engine can only guess the precise data type and length of a value (especially for dates, floating-point numbers, …). To improve performance and avoid unwanted behaviour when converting from binary to the destination data type, the user must declare in the configuration how the engine should handle the incoming data for each field.
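To see why raw bytes alone are ambiguous, consider the following Python sketch. It is independent of Gluesync and only uses the standard `struct` module; the big-endian byte order is an assumption chosen for the illustration. The same eight bytes are a valid 64-bit integer, a valid floating-point number, and a valid 8-character string.

```python
import struct

# Eight raw bytes, as a binary store might return them for a single cell.
# Here they are produced by packing 2**62 as a big-endian signed 64-bit integer.
raw = struct.pack(">q", 2**62)

# The very same bytes, interpreted three different ways.
as_long = struct.unpack(">q", raw)[0]    # 64-bit integer interpretation
as_double = struct.unpack(">d", raw)[0]  # IEEE 754 double interpretation
as_text = raw.decode("latin-1")          # 8-character string interpretation

print(as_long)
print(as_double)
print(repr(as_text))
```

Nothing in the bytes themselves says which reading is the intended one, which is why the type has to be declared per field.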
The data type configuration should look like the following example:
{
...
"hbase": {
"columnInfoSeparator": ":",
"maxRecoveryRetry": 3,
"entitiesDataType": {
"customers": {
"c:name": "STRING",
"c:surname": "STRING",
"c:address": "STRING",
"c:gender": "STRING",
"c:phone": "STRING",
"c:email": "STRING"
},
...
}
},
...
}
HBase-specific configurations are listed under the hbase property:
- columnInfoSeparator (optional): defaults to the : character. It’s the user-defined separator used between the family and column definitions;
- maxRecoveryRetry (optional): defaults to 3. It’s the maximum number of retries Gluesync will attempt before closing the connection with the ZooKeeper services on the HBase side;
- entitiesDataType: the object (map) containing each of the tables listed under the sourceEntities field;
  - customers: in this example, it’s our table name;
  - c:name: in this example, it’s our family-separator-column name;
  - STRING: in this example, it’s the corresponding data type for our field.
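Before launching the app it can be handy to check that the two parts of the file agree with each other. The sketch below is a hypothetical pre-flight check written for this page, not a Gluesync feature: `missing_types` and `split_column` are names invented here, and the rule that every mapped column should have a declared type is an assumption.

```python
# A trimmed-down config using only the fields described on this page.
config = {
    "sourceName": "",
    "sourceEntities": {
        "customers": {"mapping": {"c:name": "name", "c:email": "email"}}
    },
    "hbase": {
        "columnInfoSeparator": ":",
        "maxRecoveryRetry": 3,
        "entitiesDataType": {"customers": {"c:name": "STRING"}},
    },
}


def missing_types(config):
    """Return (table, column) pairs that are mapped but have no declared type."""
    declared = config.get("hbase", {}).get("entitiesDataType", {})
    missing = []
    for table, entity in config.get("sourceEntities", {}).items():
        table_types = declared.get(table, {})
        for column in entity.get("mapping", {}):
            if column not in table_types:
                missing.append((table, column))
    return missing


def split_column(key, separator=":"):
    """Split 'family<separator>column' into its family and column parts."""
    family, _, column = key.partition(separator)
    return family, column


print(missing_types(config))
print(split_column("c:name", config["hbase"]["columnInfoSeparator"]))
```

In this sample, c:email is mapped but has no entry under entitiesDataType, so the check reports it.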
Available supported data types:
Data type |
---|
|
|
|
|
|
|
|
|
|
|