Azure Data Lake Storage Gen2 agent
Target Azure Data Lake Storage Gen2
Before configuring Azure Data Lake Storage Gen2 as a target, make sure you have completed the setup described in the Azure Data Lake Storage Gen2 introduction.
Prerequisites
To use GlueSync with Azure Data Lake Storage Gen2, you need:
- A properly configured Azure Data Lake Storage Gen2 account
- A service principal (registered application) with appropriate permissions on the storage account
- The storage account name and the container name
Setup via Web UI
- Host: The storage account's DFS endpoint, <storage-account-name>.dfs.core.windows.net
- Client ID: The Application (client) ID of your registered application
- Client Secret: The client secret for your registered application
- Database Name: The name of your container (e.g., "datalake")
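The Host value follows a fixed pattern, so it can be derived from the storage account name. The sketch below is illustrative only and is not part of GlueSync: the helper name `adls_gen2_host` is hypothetical, and the check reflects Azure's naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only).

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def adls_gen2_host(storage_account_name: str) -> str:
    """Return the DFS endpoint host for a storage account (hypothetical helper)."""
    if not _ACCOUNT_NAME.fullmatch(storage_account_name):
        raise ValueError(f"invalid storage account name: {storage_account_name!r}")
    return f"{storage_account_name}.dfs.core.windows.net"
```

For example, `adls_gen2_host("mystorageaccount")` returns `mystorageaccount.dfs.core.windows.net`, which is the value to enter in the Host field.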
Setup via REST APIs
Here’s an example of calling the Core Hub’s REST API with curl to set up the connection for this agent:
curl -X POST 'http://<gluesync-core-address>/api/v1/connections' \
  -H 'Content-Type: application/json' \
  -d '{
    "hostCredentials": {
      "connectionName": "myAgentNickName",
      "host": "https://<storage-account-name>.dfs.core.windows.net",
      "port": 443,
      "databaseName": "<your-container-name>",
      "username": "<your-client-id>",
      "password": "<your-client-secret>",
      "customHostCredentials": {
        "tenantId": "<your-tenant-id>"
      }
    }
  }'
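The same request can be scripted. The following is a minimal Python sketch of the curl call above, using only the standard library. The function names `build_connection_payload` and `create_connection` are hypothetical, and the endpoint path and field names are copied from the curl example rather than from a full API reference, so treat it as a starting point.

```python
import json
import urllib.request


def build_connection_payload(connection_name, storage_account, container,
                             client_id, client_secret, tenant_id):
    """Build the JSON body shown in the curl example (hypothetical helper)."""
    return {
        "hostCredentials": {
            "connectionName": connection_name,
            "host": f"https://{storage_account}.dfs.core.windows.net",
            "port": 443,
            "databaseName": container,      # the container name
            "username": client_id,          # Application (client) ID
            "password": client_secret,      # client secret
            "customHostCredentials": {"tenantId": tenant_id},
        }
    }


def create_connection(core_address, payload, timeout=30):
    """POST the payload to the Core Hub and return (status, response body).

    urllib raises urllib.error.HTTPError for non-2xx responses.
    """
    request = urllib.request.Request(
        f"http://{core_address}/api/v1/connections",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

To use it, build the payload with your own values and pass it to `create_connection` together with the Core Hub address. Keeping payload construction separate from the network call makes it possible to inspect or log the body (with the secret redacted) before sending it.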
Specific configuration
This agent has the following specific configuration properties:
- URL: The Azure Data Lake Storage Gen2 endpoint (e.g., <storage-account-name>.dfs.core.windows.net)
- Database: The name of the target container in your storage account
- Username: The Application (client) ID of your registered Azure AD application
- Password: The client secret for your registered application
- Custom Host Credentials:
  - clientId: The Application (client) ID
  - tenantId: The Directory (tenant) ID
  - clientSecret: The client secret
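Before entering these values, it can help to confirm that the client ID, client secret, and tenant ID belong together. The sketch below assumes the standard Microsoft identity platform client credentials flow and the Azure Storage scope `https://storage.azure.com/.default`. It does not describe how the agent authenticates internally, and the function names `build_token_request` and `fetch_token` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Prepare an OAuth 2.0 client credentials token request (not sent yet)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://storage.azure.com/.default",
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(request, timeout=30):
    """Send the request and return the access token string.

    urllib raises urllib.error.HTTPError if the credentials are rejected.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]
```

If `fetch_token(build_token_request(...))` returns a token, the three values are consistent. It does not prove that the service principal has the required permissions on the storage account.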