Azure Data Lake Storage Gen2 agent
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage. It combines the power of a high-performance file system with massive scale and economy to help you speed your transition to the cloud.
Gluesync can write data from supported data sources to Azure Data Lake Storage containers in Parquet file format using the native SDK.
Files written to the destination container follow best practices, including keyspace support: documents are organized in a folder path structure based on the transaction type (snapshot or changes), table name, year, month, and timestamp.
JSON remains available as an optional output format, so users can choose the format they prefer. In this case, documents are grouped by source schema and table name, and each file is named after its primary key.
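As an illustration of the two layouts described above, the following Python sketch builds example object paths. The exact separators, ordering, timestamp format, and file extensions are assumptions made for this example; the helper names `parquet_path` and `json_path` are hypothetical and are not part of Gluesync.

```python
from datetime import datetime, timezone


def parquet_path(kind: str, table: str, ts: datetime) -> str:
    """Build an illustrative Parquet path: kind/table/year/month/timestamp.

    The layout is an assumption based on the description above,
    not the confirmed Gluesync format.
    """
    if kind not in ("snapshot", "changes"):
        raise ValueError("kind must be 'snapshot' or 'changes'")
    stamp = ts.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    return f"{kind}/{table}/{ts:%Y}/{ts:%m}/{stamp}.parquet"


def json_path(schema: str, table: str, primary_key: str) -> str:
    """Build an illustrative JSON path: schema/table/<primary key>.json."""
    return f"{schema}/{table}/{primary_key}.json"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(parquet_path("changes", "orders", now))
    print(json_path("sales", "orders", "1001"))
```

Check the files Gluesync actually produces in your container before relying on any particular path pattern in downstream jobs.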
Setup Instructions
1. Create an Application (Service Principal) and Secret
- Go to Microsoft Entra ID (formerly Azure AD) → App registrations → New registration
- Enter a name for your application
- Select "Accounts in this organizational directory only" (for single-tenant)
- Click "Register"
- In the app’s overview page, copy the following:
  - Application (client) ID
  - Directory (tenant) ID
- Navigate to "Certificates & secrets" → "New client secret"
- Add a description and select an expiration period
- Click "Add" and immediately copy the client secret value (it will be hidden afterward)
You’ll need these values later:
- Tenant ID
- Client ID
- Client Secret
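To check that the three values work outside of Gluesync, one option is a small Python sketch like the one below. It assumes the values are exported as `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` (the variable names used by the `azure-identity` package) and that `azure-identity` is installed. The `ServicePrincipal` class and both helper functions are hypothetical names introduced for this example; how Gluesync itself consumes these values is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServicePrincipal:
    tenant_id: str
    client_id: str
    client_secret: str


def load_service_principal(env: Mapping[str, str] = os.environ) -> ServicePrincipal:
    """Read the three values from a mapping (by default, the process environment)."""
    names = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return ServicePrincipal(*(env[name] for name in names))


def to_credential(sp: ServicePrincipal):
    """Create an azure-identity credential (requires `pip install azure-identity`)."""
    # Imported lazily so the rest of this sketch runs without the package.
    from azure.identity import ClientSecretCredential

    return ClientSecretCredential(
        tenant_id=sp.tenant_id,
        client_id=sp.client_id,
        client_secret=sp.client_secret,
    )
```

Keeping the secret in an environment variable or a secret store, rather than in a file under version control, avoids leaking it along with the rest of the configuration.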
2. Create a Storage Account (ADLS Gen2)
- In the Azure Portal, go to "Storage accounts" → "+ Create"
- Select your subscription and resource group
- Enter a globally unique name (3 to 24 characters, lowercase letters and numbers only)
- Select a region
- Performance: Standard (typical)
- Redundancy: Choose between LRS or ZRS based on your needs
- In the "Advanced" tab, enable "Hierarchical namespace" (required for ADLS Gen2)
- (Optional) In the "Networking" tab, configure network settings (public endpoint is sufficient for initial setup)
- Review and create the storage account
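The storage account name determines the endpoint that clients connect to. The sketch below validates a candidate name against the Azure naming rules (3 to 24 characters, lowercase letters and digits) and derives the Data Lake endpoint for the public Azure cloud. The function name `dfs_endpoint` is hypothetical, and sovereign clouds use different domain suffixes that this example does not cover.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def dfs_endpoint(account_name: str) -> str:
    """Return the ADLS Gen2 (dfs) endpoint URL for a public-cloud storage account."""
    if not _ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(
            "storage account names must be 3-24 lowercase letters or digits"
        )
    return f"https://{account_name}.dfs.core.windows.net"


if __name__ == "__main__":
    print(dfs_endpoint("mydatalake01"))
```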
3. Assign RBAC Permissions to the Application
- Open your newly created storage account
- Go to "Access Control (IAM)" → "Add" → "Add role assignment"
- Select the appropriate role:
  - For initial setup: "Storage Blob Data Owner" (temporary, for ACL management)
  - For production: "Storage Blob Data Contributor" (read/write access)
- Under "Members", select "User, group, or service principal"
- Click "Select members" and search for your application
- Click "Select" and then "Review + assign"
Note: RBAC alone is not sufficient for ADLS Gen2 - POSIX ACLs must also be configured on the filesystem.
4. Create a Container and Configure ACLs
- In your storage account, go to "Storage browser" → "Containers" → "+ Container"
- Enter a name (e.g., "datalake") and click "Create"
- Open the container and click "Manage ACL"
- Add your application as a principal with the following permissions:
  - On the container root (/):
    - r-x (Read + Execute) for listing paths
    - w (Write) if creating/writing is needed
- Check "Propagate to child items"
- (Recommended) Set the same permissions as default ACLs for new files and folders
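The ACL steps above can also be scripted. The following Python sketch builds the ACL string for a service principal and applies it from the container root using the `azure-storage-file-datalake` package, which is assumed to be installed. ACL entries identify the service principal by its object ID (shown under Enterprise applications), not by the application (client) ID. The helper names `acl_entries` and `apply_acl` are hypothetical. How default entries are treated on existing files during a recursive update should be confirmed against the SDK documentation before running this on real data.

```python
def acl_entries(object_id: str, write: bool = False, include_default: bool = True) -> str:
    """Build an ACL string such as 'user:<id>:r-x,default:user:<id>:r-x'."""
    perms = "rwx" if write else "r-x"  # r-x lists paths; w is added for writes
    entries = [f"user:{object_id}:{perms}"]
    if include_default:
        # Default entries are inherited by newly created child items.
        entries.append(f"default:user:{object_id}:{perms}")
    return ",".join(entries)


def apply_acl(account_url: str, container: str, acl: str, credential):
    """Apply `acl` from the container root downwards (requires the Azure SDK)."""
    # Imported lazily so acl_entries() can be used without the package.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    root = service.get_file_system_client(container).get_directory_client("/")
    # Merges the given entries into existing ACLs on the root and its children.
    return root.update_access_control_recursive(acl=acl)


if __name__ == "__main__":
    # Placeholder object ID for illustration only.
    print(acl_entries("00000000-0000-0000-0000-000000000000", write=True))
```

Changing ACLs requires sufficient rights on the account, which is why the temporary "Storage Blob Data Owner" role is suggested for initial setup.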