User Defined Functions (UDFs)

Overview

User Defined Functions (UDFs) let you write custom scripts that run during data replication in the Core Hub. The platform checks and compiles these scripts, so you can apply custom business logic to incoming data as it is replicated.

With UDFs, you can hook into the three main events that occur during data replication:

  • Insert operations

  • Update operations

  • Delete operations

This gives you fine-grained control over how data is transformed and processed before it reaches the target.

Core Hub’s UDFs

Supported languages

Core Hub’s UDF feature supports writing custom scripts in multiple programming languages:

  • Java

  • Kotlin

  • JavaScript (available soon)

  • Python (available soon)

Select the language that best fits your team’s expertise and use case requirements.

How UDFs work

Event-based execution model

UDFs operate on an event-based model, with separate functions triggered for each operation type:

Event type    Description
Insert        Triggered when new records are created in the source
Update        Triggered when existing records are modified in the source
Delete        Triggered when records are removed from the source

For each event type, you can implement a dedicated function that receives the relevant record data, processes it according to your business requirements, and returns the (possibly modified) values to be propagated to the target.
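The one-handler-per-event model can be pictured with a small local dispatcher. This is only a sketch for reasoning about handlers or trying them out locally: the `HANDLERS` table, the `dispatch` function, and the event dictionary layout (`type`, `before`, `after`) are hypothetical and do not represent Core Hub's internal API. The handler shapes follow the Python example in this document.

```python
def on_insert(new_values):
    # Tag the record so the result shows which handler ran.
    return {**new_values, "event": "insert"}


def on_update(old_values, new_values):
    return {**new_values, "event": "update"}


def on_delete(values_to_delete):
    return {**values_to_delete, "event": "delete"}


# Hypothetical routing table: one dedicated handler per event type.
HANDLERS = {
    "insert": lambda e: on_insert(e["after"]),
    "update": lambda e: on_update(e.get("before"), e["after"]),
    "delete": lambda e: on_delete(e["before"]),
}


def dispatch(event):
    """Route a change event (assumed layout: type/before/after) to its handler."""
    return HANDLERS[event["type"]](event)


result = dispatch({"type": "insert", "after": {"id": 1}})
```

Each event type reaches exactly one function, so the logic for inserts, updates, and deletes can evolve independently.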

Data access

When an event occurs, your UDF receives:

  • For insert operations: the values for the new record

  • For update operations: both the before and after values, allowing comparison (before values are available only on supported databases)

  • For delete operations: the values of the record being deleted

Access to this record-level context lets you implement transformation, validation, and other business logic.
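Because before values are available only on supported databases, an update handler may want to tolerate their absence. The sketch below assumes that missing before values arrive as `None` or an empty mapping; that representation is an assumption, not documented behavior. The field names `price` and `price_changed` are illustrative.

```python
def on_update(old_values, new_values):
    # Assumption: before values may be None or empty on unsupported databases.
    before = old_values or {}
    result = dict(new_values)  # copy so the input mapping is not mutated

    if "price" in before and "price" in result:
        # Only compare when both sides are known.
        result["price_changed"] = before["price"] != result["price"]

    return result


with_before = on_update({"price": 10}, {"id": 1, "price": 12})
without_before = on_update(None, {"id": 1, "price": 12})
```

When no before values are supplied, the handler passes the new values through unchanged instead of failing on the comparison.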

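Implementation examples

Below are simplified examples of how to implement UDFs for each operation type. The Java and Kotlin templates show the handler signatures for the currently supported languages; the Python and JavaScript examples illustrate the languages listed above as available soon.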

Java example

import java.util.Map;

// Template stub: each handler returns null, which skips the operation (see "Return values").
public class UserDefinedFunctionTemplate {

    public Map<String, Object> onInsert(Map<String, Object> newValues, Map<String, Object> oldValues) {
        return null;
    }

    public Map<String, Object> onUpdate(Map<String, Object> newValues, Map<String, Object> oldValues) {
        return null;
    }

    public Map<String, Object> onDelete(Map<String, Object> newValues, Map<String, Object> oldValues) {
        return null;
    }
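}

Kotlin example

// Template stub: each handler returns null, which skips the operation (see "Return values").
class UserDefinedFunctionTemplate {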

    fun onInsert(newValues: Map<String, Any>, oldValues: Map<String, Any>): Map<String, Any>? {
        return null
    }

    fun onUpdate(newValues: Map<String, Any>, oldValues: Map<String, Any>): Map<String, Any>? {
        return null
    }

    fun onDelete(newValues: Map<String, Any>, oldValues: Map<String, Any>): Map<String, Any>? {
        return null
    }
}

Python example

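from datetime import datetime, timezone

# Helper used by the handlers below, defined here so the example is self-contained
def current_timestamp():
    return datetime.now(timezone.utc).isoformat()

# Insert operation handler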
def on_insert(new_values):
    # Apply custom logic to the new values
    if "price" in new_values and new_values["price"] < 0:
        new_values["price"] = 0

    # You can add new fields or modify existing ones
    new_values["processed_timestamp"] = current_timestamp()

    # Return the modified values to be inserted in the target
    return new_values

# Update operation handler
def on_update(old_values, new_values):
    # Compare before and after states
    if "status" in old_values and "status" in new_values:
        if old_values["status"] == "pending" and new_values["status"] == "completed":
            # Add audit information
            new_values["completion_audit"] = f"Status changed at {current_timestamp()}"

    # Return the modified values to be updated in the target
    return new_values

# Delete operation handler
def on_delete(values_to_delete):
    # You can implement soft delete logic or other custom processing
    # Return None to prevent the delete from propagating to the target
    # Or return a modified record for special handling

    # Example: Convert delete to an update with "deleted" flag
    values_to_delete["is_deleted"] = True
    values_to_delete["deletion_timestamp"] = current_timestamp()

    # The return value determines what happens in the target
    return values_to_delete

JavaScript example

// Insert operation handler
function onInsert(newValues) {
    // Apply custom business logic
    if (newValues.category === 'electronics') {
        newValues.tax_rate = 0.07;
    }

    // Return the modified values for target insertion
    return newValues;
}

// Update operation handler
function onUpdate(oldValues, newValues) {
    // Compare before and after states
    if (oldValues.price !== newValues.price) {
        newValues.price_changed = true;
        newValues.previous_price = oldValues.price;
    }

    // Return the modified values for target update
    return newValues;
}

// Delete operation handler
function onDelete(valuesToDelete) {
    // Custom logic for delete operations
    // Return modified values to control what happens in the target

    // Example: Log deletion attempt but prevent actual deletion
    console.log(`Attempted to delete record with ID ${valuesToDelete.id}`);

    // Return null to skip the operation, so the delete is not propagated to the target
    return null;
}

Return values

The return value from each UDF determines what happens to the data in the target:

  • Returning a modified map of values: The modified data will be propagated to the target

  • Returning null: The operation will be skipped entirely (useful for filtering out unwanted data)

  • Throwing an exception: The operation will be marked as failed and handled according to your error handling configuration
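The three outcomes can be exercised locally with a small wrapper. `run_udf` and its status strings are hypothetical helpers for illustration, not part of the platform; they only mirror the rules listed above. The `is_test` and `source` fields are likewise made up for the example.

```python
def on_insert(new_values):
    if new_values.get("is_test"):
        return None  # filter the record out: the operation is skipped
    if "id" not in new_values:
        raise ValueError("record has no id")  # the operation is marked as failed
    return {**new_values, "source": "crm"}  # the modified map is propagated


def run_udf(handler, *args):
    """Classify a handler call as propagated, skipped, or failed."""
    try:
        result = handler(*args)
    except Exception as exc:
        return ("failed", str(exc))
    if result is None:
        return ("skipped", None)
    return ("propagated", result)


propagated = run_udf(on_insert, {"id": 1})
skipped = run_udf(on_insert, {"id": 2, "is_test": True})
failed = run_udf(on_insert, {"name": "no id"})
```

A wrapper like this makes it easy to confirm, before deployment, which records a handler would pass on, filter out, or reject.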

Security and validation

All UDF code is:

  1. Checked for security vulnerabilities

  2. Compiled before execution

  3. Run in a sandboxed environment to prevent system access

  4. Subject to execution timeouts to prevent performance issues

Best practices

When implementing UDFs, follow these best practices:

  • Keep your code efficient and focused on the specific transformation needed

  • Implement proper error handling within your functions

  • Test your UDFs thoroughly with representative data before deploying to production

  • Document your UDFs' purpose and behavior for team knowledge sharing (place comments in the code)

  • Avoid overly complex logic that could impact replication performance

  • Use version control (e.g., Git) for your UDF scripts to track changes over time
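As one way to apply the error-handling and testing advice, a handler can be written as a plain function and checked against representative records before deployment. The field names and the choice to skip malformed records are illustrative assumptions; whether to skip or raise depends on your error handling configuration.

```python
def on_insert(new_values):
    """Normalize the price field; skip records whose price is not numeric."""
    result = dict(new_values)  # work on a copy of the incoming values
    try:
        price = float(result.get("price", 0))
    except (TypeError, ValueError):
        return None  # skip the record rather than fail the operation
    result["price"] = max(price, 0.0)  # clamp negative prices to zero
    return result


def test_on_insert():
    # Representative records: numeric string, negative value, malformed value.
    assert on_insert({"id": 1, "price": "12.5"}) == {"id": 1, "price": 12.5}
    assert on_insert({"id": 2, "price": -3}) == {"id": 2, "price": 0.0}
    assert on_insert({"id": 3, "price": "n/a"}) is None


test_on_insert()
```

Keeping the handler free of external dependencies lets the same checks run in any local test runner.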

Limitations and considerations

  • UDF execution adds some processing overhead to replication

  • Complex logic may reduce replication throughput

  • UDFs have access only to the current record data, not the entire dataset

  • Memory and execution time limits apply to prevent resource abuse