Static Amazon AWS S3 agent

Target AWS S3 & S3-like storage

Prerequisites

To run Gluesync against your AWS S3 or S3-like instance you will need:

  • An S3 bucket;

  • valid user credentials to read/write to the given bucket.

When specifically targeting AWS S3 public cloud:

  • The user should be granted permissions such as arn:aws:iam::aws:policy/AmazonS3FullAccess;

  • IAM access key obtained through AWS IAM console;

  • IAM secret key obtained through AWS IAM console;

  • Name of the region the bucket belongs to.
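
Before configuring the agent, it can help to confirm that the access key, secret key and region from the list above actually reach the bucket. The sketch below is not part of Gluesync. It assumes the AWS CLI is installed and that the same IAM key pair is exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The function name check_bucket_access is hypothetical.

```shell
# Hypothetical helper (not a Gluesync command).
# Assumes the AWS CLI is installed and picks up the IAM key pair from
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in the environment.
# $1 = bucket name, $2 = region name.
# Exits with status 0 when the bucket exists and the credentials may access it.
check_bucket_access() {
  aws s3api head-bucket --bucket "$1" --region "$2"
}
```

For example, check_bucket_access my-bucket eu-west-1 returns a non-zero status if the bucket is missing or access is denied. Note that head-bucket only confirms the bucket is reachable. It does not prove that write permission is in place.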

Setup via Web UI

  • Hostname / IP Address: Either the DNS record of your AWS S3 bucket resource or the address (hostname / IP address) of the S3-like cluster. If you’re connecting to an S3-like cluster, prefix the hostname with the proper protocol, such as https:// or http://;

  • Port: Required, defaults to 0. For AWS S3 use 443;

  • Bucket name: Name of your target bucket;

  • Username: IAM access key ID created in your AWS IAM console, or the username for S3-like servers;

  • Password: IAM secret access key created in your AWS IAM console, or the password for S3-like servers;

  • TLS certificates: (optional) File browser to let you upload your certificates;

  • Certificates password: (required if authentication is in place, defaults to NULL) The password protecting your certificate;

  • Disable auth: (optional, defaults to false) Disable authentication against the target storage.
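
As a rough illustration of the Hostname / IP Address field, the sketch below builds the two kinds of value described above. The helper names s3_host and with_protocol are hypothetical. The AWS form follows the bucket-and-region endpoint pattern, and falling back to https:// when no protocol is given is an assumption of this sketch, not documented Gluesync behavior.

```shell
# Hypothetical helper: bucket-specific AWS S3 endpoint for a bucket and region.
# $1 = bucket name, $2 = region name.
s3_host() {
  printf 'https://%s.s3.%s.amazonaws.com\n' "$1" "$2"
}

# Hypothetical helper: make sure an S3-like host carries an explicit protocol.
# ASSUMPTION: https:// is used when the caller supplied none.
with_protocol() {
  case $1 in
    http://*|https://*) printf '%s\n' "$1" ;;
    *) printf 'https://%s\n' "$1" ;;
  esac
}
```

For example, s3_host my-bucket eu-west-1 prints https://my-bucket.s3.eu-west-1.amazonaws.com, while with_protocol http://minio.local leaves its argument unchanged.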

Custom properties

  • awsRegion: (optional, defaults to NULL) The name of the region your bucket belongs to. Leave empty for S3-like servers;

  • timeoutSeconds: (optional, defaults to 60) Timeout, in seconds, for operations that communicate with the destination;

  • parquetFileSizeThreshold: (defaults to 250) Specifies the maximum size (in megabytes) of the cached data before it is uploaded in Parquet file format.

Entity custom properties

  • useJsonFile: (defaults to false) Specifies the output file format for stored data. When set to true, data is written in JSON format. When set to false, data is written in Parquet format.

Setup via REST API

Here’s an example of calling the Core Hub’s REST API via curl to set up the connection for this Agent:

curl -X POST 'http://<gluesync-core-address>/api/v1/connections' \
  -H 'Content-Type: application/json' \
  -d '{
        "hostCredentials": {
            "connectionName": "myAgentNickName",
            "host": "https://<bucket-name>.s3.<region>.amazonaws.com",
            "port": 443,
            "databaseName": "<your-bucket-name>",
            "username": "<your-access-key-id>",
            "password": "<your-secret-access-key>",
            "customHostCredentials": {}
        }
  }'
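
The request above leaves customHostCredentials empty. The sketch below shows one way the custom properties listed earlier might be supplied in the same call. It rests on an unverified assumption: that custom properties are passed as keys of customHostCredentials, with numeric values sent as JSON numbers. The function build_payload and the environment variables GLUESYNC_CORE_ADDRESS, BUCKET, REGION, ACCESS_KEY_ID and SECRET_ACCESS_KEY are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: build the connection payload with custom properties filled in.
# ASSUMPTION: custom properties go inside "customHostCredentials" and
# numeric values are sent as JSON numbers. Check this against your Core Hub.
# $1 = bucket name, $2 = region, $3 = access key ID, $4 = secret access key.
# Values are inserted verbatim, so they must not contain double quotes.
build_payload() {
  cat <<EOF
{
  "hostCredentials": {
    "connectionName": "myAgentNickName",
    "host": "https://$1.s3.$2.amazonaws.com",
    "port": 443,
    "databaseName": "$1",
    "username": "$3",
    "password": "$4",
    "customHostCredentials": {
      "awsRegion": "$2",
      "timeoutSeconds": 60,
      "parquetFileSizeThreshold": 250
    }
  }
}
EOF
}

# Send the request only when a Core Hub address is provided.
# Without GLUESYNC_CORE_ADDRESS nothing is sent, so the function can be
# called on its own to inspect the payload first.
if [ -n "${GLUESYNC_CORE_ADDRESS:-}" ]; then
  build_payload "$BUCKET" "$REGION" "$ACCESS_KEY_ID" "$SECRET_ACCESS_KEY" |
    curl -X POST "http://$GLUESYNC_CORE_ADDRESS/api/v1/connections" \
      -H 'Content-Type: application/json' \
      -d @-
fi
```

Calling build_payload my-bucket eu-west-1 '<your-access-key-id>' '<your-secret-access-key>' prints the JSON without contacting the Core Hub, which makes it easy to review the body before posting it.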