User Defined Functions (UDFs)

Overview

User Defined Functions (UDFs) let you write custom code scripts that Core Hub runs during data replication. The platform checks and compiles these scripts, so you can apply custom business logic to incoming data as it is replicated.

With UDFs, you can hook into the three main events that occur during data replication:

  • Insert operations

  • Update operations

  • Delete operations

This gives you fine-grained control over how data is transformed and processed before it reaches the target.

Figure: Core Hub’s UDFs

How to create a UDF

To create a UDF, add or edit an entity, then click the "UDF" (User Defined Functions) button on the Fields editor tab:

Figure: Fields editor in the Object’s browser

Supported languages

Core Hub’s UDF feature supports, or will soon support, custom scripts in the following programming languages:

  • Java

  • Kotlin (available soon)

  • JavaScript (available soon)

  • Python (available soon)

Select the language that best fits your team’s expertise and use case requirements.

How UDFs work

Event-based execution model

UDFs follow an event-based model: a single entry-point function is triggered for every replicated operation, receives the operation type, and holds your custom logic:

Event type    Description

Insert        Triggered when new records are created in the source
Update        Triggered when existing records are modified in the source
Delete        Triggered when records are removed from the source

Whatever the event type, that one function receives the relevant record data, processes it according to your business requirements, and returns the (possibly modified) values to propagate to the target.
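The single entry point can be sketched in Java as follows: one method receives the record values and the operation type, then branches on the operation. This is a minimal, self-contained illustration rather than Core Hub's API: Operation and Change are hypothetical stand-ins for MappingFunctionOperation and the returned Pair, and the SOURCE_EVENT column is invented. The sketch also assumes that a delete may carry only the old image, so it falls back to oldValues when newValues is null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Core Hub's MappingFunctionOperation.
enum Operation { INSERT, UPDATE, DELETE }

// Hypothetical stand-in for the (operation, values) pair a UDF returns.
record Change(Operation operation, Map<String, Object> values) {}

class DispatchingUdf {
    // Single entry point: called for every event, branches on the operation type.
    static Change onChange(Map<String, Object> newValues,
                           Map<String, Object> oldValues,
                           Operation operation) {
        // Assumption: use newValues, or oldValues when a delete carries only the old image.
        Map<String, Object> source = (newValues != null) ? newValues : oldValues;
        // Work on a copy so the incoming map is never mutated.
        Map<String, Object> result = new HashMap<>();
        if (source != null) {
            result.putAll(source);
        }

        switch (operation) {
            case INSERT -> result.put("SOURCE_EVENT", "created");
            case UPDATE -> result.put("SOURCE_EVENT", "modified");
            case DELETE -> result.put("SOURCE_EVENT", "removed");
        }
        return new Change(operation, result);
    }
}
```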

Data access

When an event occurs, your UDF receives:

  • For insert operations: the values for the new record

  • For update operations: both the before and after values, allowing comparison (before values are available only on supported databases)

  • For delete operations: the values of the record being deleted

With the record's values, plus its previous state for updates where the database provides it, a UDF can transform, validate, or filter data as it is replicated.
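For updates, the before and after values can be compared column by column. The helper below is a hypothetical sketch, not part of Core Hub's API: it returns the names of the columns whose values differ, and returns an empty Optional when the before values are missing, as happens on databases that do not provide them.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

class ChangeDetector {
    // Returns the columns whose value differs between the before and after images.
    // Optional.empty() means the source supplied no before values, so the
    // change set cannot be determined.
    static Optional<Set<String>> changedColumns(Map<String, Object> newValues,
                                                Map<String, Object> oldValues) {
        if (oldValues == null) {
            return Optional.empty();
        }
        // Consider every column present in either image.
        Set<String> columns = new HashSet<>(newValues.keySet());
        columns.addAll(oldValues.keySet());

        Set<String> changed = new HashSet<>();
        for (String column : columns) {
            // Objects.equals treats two nulls as equal and tolerates a null on one side.
            if (!Objects.equals(newValues.get(column), oldValues.get(column))) {
                changed.add(column);
            }
        }
        return Optional.of(changed);
    }
}
```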

Logger usage

Each UDF receives a logger that writes messages to Core Hub’s log file, which is useful for debugging and monitoring your UDFs’ behavior.

Implementation examples

Below are simplified examples of how to implement UDFs for each language:

Java example

import java.util.HashMap;
import java.util.Map;
import kotlin.Pair;
import com.molo17.gluesync.commons.model.api.MappingFunctionOperation;

public class UDF_xyz {
    // Entry point called by Core Hub for every insert, update and delete event
    public Pair<MappingFunctionOperation, Map<String, Object>> onChange(Map<String, Object> newValues, Map<String, Object> oldValues, MappingFunctionOperation operation, Logger logger) {
        logger.info("Processing record: " + newValues);
        // Work on a copy in case the incoming map is read-only
        Map<String, Object> result = new HashMap<>(newValues);
        // Example transformation: upper-case the CITY column, skipping null values
        Object city = result.get("CITY");
        if (city != null) {
            result.put("CITY", city.toString().toUpperCase());
        }
        return new Pair<>(operation, result);
    }
}

Kotlin example

import com.molo17.gluesync.commons.model.api.MappingFunctionOperation

class UDF_xyz {
    // Entry point called by Core Hub for every insert, update and delete event
    fun onChange(newValues: Map<String, Any?>,
                 oldValues: Map<String, Any?>,
                 operation: MappingFunctionOperation,
                 logger: Logger): Pair<MappingFunctionOperation, Map<String, Any?>?> {
        logger.info("Processing record: $newValues")
        // Pass the operation and values through unchanged
        return operation to newValues
    }
}

Python example

from datetime import datetime, timezone


def current_timestamp():
    """Return the current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


def onChange(new_values, old_values, operation):
    """
    Handle all operations (insert, update, delete) in a single function

    Args:
        new_values: The new values (or None for delete)
        old_values: The old values (or None for insert, or when the source
            database does not provide before values)
        operation: The type of operation ("insert", "update", "delete")

    Returns:
        A tuple containing the operation and the modified values
    """
    # Work on a copy so the incoming dictionary is left untouched
    modified_values = dict(new_values) if new_values else None

    if operation == "insert":
        # Clamp negative prices to zero and stamp the record
        if "price" in modified_values and modified_values["price"] < 0:
            modified_values["price"] = 0
        modified_values["processed_timestamp"] = current_timestamp()

    elif operation == "update":
        # Audit the pending -> completed transition (requires before values)
        if (old_values
                and old_values.get("status") == "pending"
                and modified_values.get("status") == "completed"):
            modified_values["completion_audit"] = f"Status changed at {current_timestamp()}"

    elif operation == "delete":
        # Convert the delete into a soft delete: keep the last known values,
        # flag them as deleted and emit an update instead of a delete
        modified_values = dict(old_values) if old_values else {}
        modified_values["is_deleted"] = True
        modified_values["deletion_timestamp"] = current_timestamp()
        operation = "update"

    return operation, modified_values

JavaScript example

function onChange(newValues, oldValues, operation) {
    // Handle all operations in a single function; work on a shallow copy
    let modifiedValues = { ...newValues };

    switch (operation) {
        case 'insert':
            // Apply a tax rate to electronics
            if (newValues.category === 'electronics') {
                modifiedValues.tax_rate = 0.07;
            }
            break;

        case 'update':
            // Record price changes (requires before values from the source)
            if (oldValues && oldValues.price !== newValues.price) {
                modifiedValues.price_changed = true;
                modifiedValues.previous_price = oldValues.price;
            }
            break;

        case 'delete':
            // Convert to a soft delete: keep the last known values, flag them,
            // and emit an update instead of a delete
            modifiedValues = { ...oldValues };
            modifiedValues.is_deleted = true;
            modifiedValues.deletion_timestamp = new Date().toISOString();
            operation = 'update';
            break;
    }

    // Return the (possibly changed) operation together with the values
    return [operation, modifiedValues];
}

Return values

The value a UDF returns determines what happens to the record in the target:

  • Returning the (modified) map of values: the data is propagated to the target

  • Returning null instead of the map of values: the operation is skipped entirely (useful for filtering out unwanted data)

  • Throwing an exception: the operation is marked as failed and handled according to your error handling configuration
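The three outcomes can be illustrated with a small Java sketch. It assumes, based on the nullable map in the Kotlin signature, that a null map in the returned pair is what skips a record; Operation, Change, FilteringUdf and the STATUS, EMAIL and ID columns are hypothetical stand-ins, and the method is written for insert and update events, where newValues is populated.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for MappingFunctionOperation and the returned pair.
enum Operation { INSERT, UPDATE, DELETE }

record Change(Operation operation, Map<String, Object> values) {}

class FilteringUdf {
    // Written for insert and update events, where newValues is populated.
    static Change onChange(Map<String, Object> newValues,
                           Map<String, Object> oldValues,
                           Operation operation) {
        // Skip: a null map (assumed skip signal) filters out test records.
        if ("TEST".equals(newValues.get("STATUS"))) {
            return new Change(operation, null);
        }

        // Fail: throwing marks the operation as failed.
        Object email = newValues.get("EMAIL");
        if (email == null || !email.toString().contains("@")) {
            throw new IllegalArgumentException("Invalid EMAIL for record ID " + newValues.get("ID"));
        }

        // Propagate: return a modified copy of the values.
        Map<String, Object> result = new HashMap<>(newValues);
        result.put("EMAIL", email.toString().trim().toLowerCase(Locale.ROOT));
        return new Change(operation, result);
    }
}
```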

Security and validation

All UDF code is:

  1. Checked for security vulnerabilities

  2. Compiled before execution

  3. Run in a sandboxed environment to prevent system access

  4. Subject to execution timeouts to prevent performance issues

Best practices

When implementing UDFs, follow these best practices:

  • Keep your code efficient and focused on the specific transformation needed

  • Handle errors explicitly: guard against null or unexpected values, and throw an exception only when a record should be marked as failed

  • Test your UDFs thoroughly with representative data before deploying to production

  • Document each UDF’s purpose and behavior with comments in the code so your team can understand and maintain it

  • Avoid overly complex logic that could impact replication performance

  • Use version control for your UDF scripts to track changes over time (e.g., Git)
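Two of these practices, error handling and testing with representative data, can be combined in a small local harness. This sketch assumes the UDF logic can be called as a plain static method outside Core Hub; Operation, Change, UpperCaseCityUdf and UdfSmokeTest are hypothetical names, and the transformation mirrors the CITY upper-casing from the Java example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins so the UDF logic can run outside Core Hub.
enum Operation { INSERT, UPDATE, DELETE }

record Change(Operation operation, Map<String, Object> values) {}

class UpperCaseCityUdf {
    // Upper-cases CITY and tolerates null values instead of throwing.
    static Change onChange(Map<String, Object> newValues,
                           Map<String, Object> oldValues,
                           Operation operation) {
        Map<String, Object> result = new HashMap<>(newValues);
        Object city = result.get("CITY");
        if (city != null) {
            result.put("CITY", city.toString().toUpperCase(Locale.ROOT));
        }
        return new Change(operation, result);
    }
}

class UdfSmokeTest {
    // Runs the UDF over sample records and returns a list of failure messages.
    static List<String> run() {
        List<String> failures = new ArrayList<>();

        // Typical record: CITY should come back upper-cased.
        Map<String, Object> regular = new HashMap<>();
        regular.put("ID", 1);
        regular.put("CITY", "Milano");
        Object city = UpperCaseCityUdf.onChange(regular, null, Operation.INSERT).values().get("CITY");
        if (!"MILANO".equals(city)) {
            failures.add("expected MILANO, got " + city);
        }

        // Edge case: a record with a null CITY must not throw.
        Map<String, Object> missingCity = new HashMap<>();
        missingCity.put("ID", 2);
        missingCity.put("CITY", null);
        try {
            UpperCaseCityUdf.onChange(missingCity, null, Operation.UPDATE);
        } catch (RuntimeException e) {
            failures.add("null CITY raised " + e);
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> failures = run();
        System.out.println(failures.isEmpty() ? "All checks passed" : failures);
    }
}
```

Running UdfSmokeTest.main prints "All checks passed" when both checks succeed; add records that reflect your own edge cases (empty strings, unexpected types, missing columns) before deploying.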

Limitations and considerations

  • UDF execution adds some processing overhead to replication

  • Complex logic may impact throughput performance

  • UDFs have access only to the current record data, not the entire dataset

  • Memory and execution time limits apply to prevent resource abuse
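One way to keep this overhead down is to do expensive setup once instead of on every event. In this hypothetical sketch (the PhoneNormalizerUdf class and PHONE column are illustrative), a regular expression is compiled once into a static field, assuming Core Hub reuses the compiled UDF class across events; calling String.replaceAll inside the per-record path would instead recompile the pattern for every record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class PhoneNormalizerUdf {
    // Compiled once when the class is loaded, then reused for every record.
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9+]");

    // Keeps only digits and '+' in the hypothetical PHONE column.
    static Map<String, Object> normalize(Map<String, Object> newValues) {
        Map<String, Object> result = new HashMap<>(newValues);
        Object phone = result.get("PHONE");
        if (phone != null) {
            result.put("PHONE", NON_DIGITS.matcher(phone.toString()).replaceAll(""));
        }
        return result;
    }
}
```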