Azure Data Lake Storage Gen2 agent

Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage. It combines the power of a high-performance file system with massive scale and economy to help you speed your transition to the cloud.

Prerequisites

To use Gluesync with Azure Data Lake Storage Gen2, you need:

  • An active Microsoft Azure subscription

  • Sufficient permissions to create and manage Azure resources

  • Azure CLI or Azure Portal access

Setup Instructions

1. Create an Application (Service Principal) and Secret

  1. Go to Microsoft Entra ID (formerly Azure AD) → App registrations → New registration

  2. Enter a name for your application

  3. Select "Accounts in this organizational directory only" (for single-tenant)

  4. Click "Register"

  5. In the app’s overview page, copy the following:

    • Application (client) ID

    • Directory (tenant) ID

  6. Navigate to "Certificates & secrets" → "New client secret"

  7. Add a description and select an expiration period

  8. Click "Add" and immediately copy the client secret value (it will be hidden afterward)

You’ll need these values later:

  • Tenant ID

  • Client ID

  • Client Secret
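A quick format check can catch copy-paste mistakes before you store these values. The sketch below assumes that the tenant ID and client ID are GUIDs in the hyphenated form shown in the portal; the helper name `looks_like_guid` is hypothetical and is not part of Gluesync or any Azure SDK.

```python
import uuid


def looks_like_guid(value: str) -> bool:
    """Return True if value is a GUID in canonical hyphenated form."""
    try:
        candidate = value.strip().lower()
        # uuid.UUID also accepts unhyphenated or braced input, so compare
        # against the canonical rendering to require the portal's format.
        return str(uuid.UUID(candidate)) == candidate
    except (ValueError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    # Placeholder value for illustration only, not a real tenant ID.
    print(looks_like_guid("00000000-0000-0000-0000-000000000000"))
```

The client secret is not a GUID, so this check applies only to the two IDs. A common mistake is pasting the secret ID in place of the secret value.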

2. Create a Storage Account (ADLS Gen2)

  1. In the Azure Portal, go to "Storage accounts" → "+ Create"

  2. Select your subscription and resource group

  3. Enter a globally unique name (3 to 24 characters, lowercase letters and numbers only)

  4. Select a region

  5. Performance: Standard (typical)

  6. Redundancy: Choose LRS or ZRS based on your needs

  7. In the "Advanced" tab, enable "Hierarchical namespace" (required for ADLS Gen2)

  8. (Optional) In the "Networking" tab, configure network settings (public endpoint is sufficient for initial setup)

  9. Review and create the storage account
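The naming rule from step 3 and the resulting endpoint can be checked locally before you open the portal. This is a minimal sketch: the function names are hypothetical, and it cannot verify that the name is still available, which only Azure can confirm.

```python
import re

# Storage account names: 3 to 24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Check the name against the storage account naming rule."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


def dfs_endpoint(name: str) -> str:
    """Build the ADLS Gen2 (DFS) endpoint host for a storage account."""
    if not is_valid_account_name(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return f"{name}.dfs.core.windows.net"


if __name__ == "__main__":
    # "mydatalake01" is a placeholder account name.
    print(dfs_endpoint("mydatalake01"))
```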

3. Assign RBAC Permissions to the Application

  1. Open your newly created storage account

  2. Go to "Access Control (IAM)" → "Add" → "Add role assignment"

  3. Select the appropriate role:

    • For initial setup: "Storage Blob Data Owner" (temporary, for ACL management)

    • For production: "Storage Blob Data Contributor" (read/write access)

  4. Under "Members", select "User, group, or service principal"

  5. Click "Select members" and search for your application

  6. Click "Select" and then "Review + assign"

Note: RBAC roles apply to the whole storage account or container. For directory- and file-level access control in ADLS Gen2, also configure POSIX ACLs on the file system, as described in the next step.
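If you prefer the Azure CLI to the portal, the same role assignment can be expressed with `az role assignment create`. The sketch below only assembles and prints the command; it does not run it. The subscription ID, resource group, account name, and client ID are placeholders, and the helper names are hypothetical.

```python
import shlex


def storage_account_scope(subscription_id: str, resource_group: str,
                          account_name: str) -> str:
    """Build the Azure resource ID used as the role assignment scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(client_id: str, role: str, scope: str) -> list:
    """Return the az CLI arguments that assign a role to the application."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", client_id,
        "--role", role,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    scope = storage_account_scope(
        "00000000-0000-0000-0000-000000000000", "my-resource-group", "mydatalake01"
    )
    cmd = role_assignment_command(
        "11111111-1111-1111-1111-111111111111",
        "Storage Blob Data Contributor",
        scope,
    )
    print(shlex.join(cmd))  # Review the command, then run it in a shell.
```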

4. Create a Container and Configure ACLs

  1. In your storage account, go to "Storage browser" → "Containers" → "+ Container"

  2. Enter a name (e.g., "datalake") and click "Create"

  3. Open the container and click "Manage ACL"

  4. Add your application as a principal with the following permissions:

    • On the container root (/):

      • r-x (Read + Execute) for listing paths

      • w (Write) if creating/writing is needed

  5. Check "Propagate to child items"

  6. (Recommended) Set the same permissions as default ACLs for new files and folders
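The permissions above correspond to ACL entries in the short text form that ADLS Gen2 uses, for example `user:<object-id>:r-x`, with a `default:` prefix for entries inherited by new files and folders. The sketch below builds such a string. The function name is hypothetical, and it assumes that you pass the object ID of the service principal (enterprise application), not the application (client) ID.

```python
def acl_entries(object_id: str, write: bool = False,
                include_default: bool = True) -> str:
    """Build a comma-separated ACL string for one service principal.

    r-x allows listing and traversing paths; rwx adds create/write.
    """
    perms = "rwx" if write else "r-x"
    entries = [f"user:{object_id}:{perms}"]
    if include_default:
        # Default entries are copied to newly created child items.
        entries.append(f"default:user:{object_id}:{perms}")
    return ",".join(entries)


if __name__ == "__main__":
    # Placeholder object ID for illustration only.
    print(acl_entries("22222222-2222-2222-2222-222222222222", write=True))
```

A string in this form can be supplied to tooling that accepts ACL specifications, such as the recursive ACL update methods in the `azure-storage-file-datalake` package. Treat that pairing as an assumption to verify against the SDK version you use.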

Credentials to save

  • URL: <storage-account-name>.dfs.core.windows.net

  • Client ID: The Application (client) ID created in the setup

  • Tenant ID: The Directory (tenant) ID copied from the app’s overview page

  • Client Secret: The secret value created for your application

  • Container: The name of your container (e.g., "datalake")
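Before entering these values in Gluesync, you may want to confirm that they work together. The following is a minimal sketch of a smoke test, assuming the `azure-identity` and `azure-storage-file-datalake` packages are installed. The `AdlsSettings` class and `list_top_level_paths` function are hypothetical names and do not reflect how Gluesync itself connects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdlsSettings:
    url: str            # <storage-account-name>.dfs.core.windows.net
    client_id: str
    tenant_id: str
    client_secret: str
    container: str

    @property
    def account_url(self) -> str:
        """Return the endpoint with an https:// scheme, as the SDK expects."""
        host = self.url.strip().rstrip("/")
        return host if host.startswith("https://") else f"https://{host}"


def list_top_level_paths(settings: AdlsSettings, limit: int = 10) -> list:
    """List up to `limit` paths at the container root to verify access."""
    # Imported lazily so the settings class is usable without the Azure SDK.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(
        tenant_id=settings.tenant_id,
        client_id=settings.client_id,
        client_secret=settings.client_secret,
    )
    service = DataLakeServiceClient(
        account_url=settings.account_url, credential=credential
    )
    file_system = service.get_file_system_client(file_system=settings.container)

    names = []
    for path in file_system.get_paths(recursive=False):
        names.append(path.name)
        if len(names) >= limit:
            break
    return names
```

An authorization error from `list_top_level_paths` usually points back to the role assignment or the ACLs on the container root, while an authentication error points to the tenant ID, client ID, or client secret.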

Next Steps

Now that you’ve set up your Azure Data Lake Storage Gen2 account, you can proceed to configure it as a target in Gluesync: