Database Replicators
Sync external database tables into your Daasity warehouse with configurable, single-table replicators.
Supported source databases: Azure, BigQuery, SQL Server, MongoDB, MySQL, NetSuite, PostgreSQL, Redshift, Salesforce Service Cloud (via SOQL), and Snowflake (used mostly for non-matching regions). SSH connections are supported for external databases that require it. Allowlisting of Daasity servers is also supported.
Supported destination warehouses: Snowflake, Redshift, and BigQuery.
Each replicator syncs data from a single source table. Aliases are not supported at the source level. Certain special data types and features — including BigQuery Star Tables and Arrays — are also unsupported. Joins are partially supported; see Important Notes on SQL Query Statement for details.
Syncs run at configurable intervals using Daasity custom workflows. Note that depending on data volume, very high-frequency intervals (e.g., hourly syncs of tens of millions of rows) may not be attainable.
Supported Sources and Destinations
Sources:
Azure
BigQuery
SQL Server
MongoDB
MySQL
NetSuite
PostgreSQL
Redshift
Salesforce Service Cloud (SOQL)
Snowflake

Destinations:
Snowflake
Redshift
BigQuery
Setting Up a Database Replicator
Database replication in Daasity is organized into two entities:
A Data Source — the parent entity that stores your database credentials.
One or more Database Integrations — child entities that store the metadata required to synchronize individual tables.
Creating a Database Integration
Setting up a table sync is a multi-step process. Each step collects metadata needed for the replication.
Step 1: Name Your Integration
Choose a Name for the integration. This is a display label used to identify the integration among others (e.g., PRODUCTION DB - ORDERS). It has no validation against the actual data, so be thoughtful with naming.

Step 2: Select Your Source
Configure the source of the data to be replicated:
Data Source — If your account has multiple data sources, select the one that connects to the database containing your target table.
Source Schema and Source Table — Enter the schema and table name from the source database. Ensure there are no typos and that the credentials on the data source have access to both. The system will validate existence when you click Create.

Step 3: Configure Your Destination
Define where the replicated data will land:
Destination Schema — Will be created if it doesn't exist. Must be a valid schema name (cannot start with a number, no special characters except underscores).
Destination Table — Must be a valid table name; it will be created during the first sync. The process will fail if a table with that name already exists in the destination schema, so choose a name that is not in use.

Step 4: Choose a Sync Action
Select how data should be loaded into the destination table:
Full Data Replacement (Truncsert) — Deletes all existing records in the destination before loading. Use when the source table is small, you don't need to retain historical data, or you cannot define a primary key.

Update/Insert (Upsert) — Keeps existing records, loads new records, and updates records that already exist (matched by the sync key). Use when you want incremental loads that capture updates to previously loaded records.

Append Only — Keeps existing records and loads only new records. Use when you want incremental loads but don't need to capture updates to previously loaded records.

Do Not Remove Duplicates — Used for rollup tables where repeating entries need their own rows. This will create duplicate rows when applicable. NetSuite System Notes are a common example of a rollup table synced with this option.
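Conceptually, these actions map to familiar SQL load patterns. The sketch below illustrates Upsert semantics as a warehouse MERGE keyed on the sync key; the table and column names are illustrative, and Daasity generates its own load statements internally:

```sql
-- Illustrative sketch only; Daasity generates the actual load statements.
-- Upsert: match on the __sync_key column Daasity maintains.
MERGE INTO analytics.orders AS dest
USING staging.orders_batch AS src
  ON dest.__sync_key = src.__sync_key
WHEN MATCHED THEN
  UPDATE SET status = src.status, updated_at = src.updated_at
WHEN NOT MATCHED THEN
  INSERT (__sync_key, order_id, status, updated_at)
  VALUES (src.__sync_key, src.order_id, src.status, src.updated_at);

-- Truncsert: TRUNCATE the destination table, then INSERT the full extract.
-- Append Only: INSERT new rows without matching or updating existing ones.
```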
Step 5: SQL Query Statement (Optional)
The SQL Query Statement field allows additional flexibility in how source data is filtered. Enter a valid SQL statement whose result set contains only columns present in the selected source table.
Important Notes on SQL Query Statement
Aliases are not permitted. The requested columns are validated against the source table's column set, and aliases will result in an error. If aliases are needed, create a view at the source and sync the view instead.
Joins are partially supported. You can use joins or subqueries to filter your result set, as long as no columns from the joined table are included in the SELECT. Selecting columns not present in the source table will result in an error.
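For example, a join may be used purely as a filter, provided every column in the SELECT list belongs to the source table (the table and column names below are illustrative):

```sql
-- Allowed: the join only filters rows; all selected columns come from orders.
SELECT o.order_id, o.customer_id, o.total, o.updated_at
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE c.region = 'US';

-- Not allowed: c.region is not a column of the source table (orders),
-- so including it in the SELECT list will fail validation.
```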
Use {{start_date}} and {{end_date}} for date-scoped extractions. This is strongly encouraged and required to load history on Update/Insert integrations. Daasity injects date values into the query at runtime based on the workflow type.
Add the variables to your WHERE clause with a date or datetime field:
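For example, assuming the source table has an updated_at datetime column (the table and column names are illustrative; quoting requirements for the injected values may vary by source database):

```sql
SELECT order_id, customer_id, total, updated_at
FROM orders
WHERE updated_at >= {{start_date}}
  AND updated_at < {{end_date}}
```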
Lookback window behavior:
Daily or Custom Workflows — The lookback window runs from the current time back to 12:00 AM of the previous day. If running a custom workflow on shorter intervals (e.g., hourly), note that each execution still looks back to 12 AM of the previous day, which may affect performance.
History Workflows — Multiple queries are executed, one per day in the selected range, with {{start_date}} and {{end_date}} populated for each day. Concurrency is limited by the system to avoid exhausting available connections.
Failure to include {{start_date}} and {{end_date}} on Update/Insert integrations will result in a complete re-sync of the table on every execution. This is a common issue that leads to very long run times.
If no date field is available, Full Replacement will yield slightly better performance. Alternatively, see the Advanced Options section for the Max ID and Max Date approaches.
Understanding Your Integration
Field Mappings
Every integration has a set of field mappings that describe which columns are extracted from the source and how they are transformed and mapped to the destination table. This structure enables a clear, consistent representation of source-to-destination logic and serves as the foundation for creating warehouse schemas automatically.
The system captures field mapping information automatically from the source metadata, including inferring sync key columns based on the primary key.
If the format of columns or the sync key definition needs to be changed, please contact the Daasity support team.

Sync Key
If your source table has a unique ID column, the system will select that column as the sync key. If no single unique column exists, the support team can create a compound key that combines values from multiple columns.
Daasity builds the sync key by concatenating the selected column values and applying an MD5 hash. The result is stored in the __sync_key column of each row. You can verify sync key values by running:
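For example, assuming a compound key over two hypothetical columns (the exact concatenation and casting Daasity applies may differ, and MD5 syntax varies by warehouse), you can inspect the stored hashes and check the candidate key for uniqueness:

```sql
-- Recompute the hash and compare it to the stored __sync_key.
SELECT __sync_key,
       MD5(order_id || '|' || line_number) AS recomputed_key
FROM analytics.order_lines
LIMIT 10;

-- Uniqueness check on the candidate key columns in the source:
-- any rows returned mean the combination is not unique.
SELECT order_id, line_number, COUNT(*) AS dup_count
FROM source_schema.order_lines
GROUP BY order_id, line_number
HAVING COUNT(*) > 1;
```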
Verify that your sync key is truly unique before requesting a compound key.
Workflow Assignment
Once a new integration is created, it is added to the Daily Workflow by default. You can change this by editing the workflow resources or using the integration management interface.


Editing a Replicator
Once created, a database replicator can be partially edited. These constraints reduce the complexity of troubleshooting integrations that have been modified multiple times.
Can be changed:
Integration name
Sync action (Full Data Replacement, Upsert, Append Only)
SQL Query Statement — validated against the original source table structure. If the source table has been altered to add new columns not reflected in the current field mappings, adding those columns to the query will fail validation. Creating a new integration is recommended in this scenario.
Cannot be changed:
Source schema and table
Destination schema and table
"Do Not Remove Duplicates" setting
Two additional options become available in the edit form — see Advanced Options below.
Advanced Options
The options below fulfill very specific requirements and can usually be ignored. Most tables work well as standard Update/Insert or Full Replacement integrations. These options are most useful for very large tables with no date fields, where Full Replacement takes too long.
Selected Destination Table ID Field
This option avoids a full re-sync on tables where no date field is available but an auto-incrementing ID column exists.
On each extraction, the system queries the destination table for the highest ID synced so far, then issues a query to the source database passing this value into the {{max_id}} variable.
To use this feature, select an integer field in the destination table, then update your SQL query to use {{max_id}}:
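For example, assuming an auto-incrementing id column is selected as the Destination Table ID Field (the table and column names are illustrative):

```sql
SELECT id, customer_id, total
FROM orders
WHERE id > {{max_id}}
```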
Selected Destination Table Date/DateTime Field
This option enables partial re-syncs where filtering is driven by the maximum value of a date field in the destination table, rather than the workflow's lookback window.
On each extraction, the system queries the destination table for the highest date value synced so far, then issues a query to the source database passing this value into the {{max_date}} variable.
To use this feature, select a date field in the destination table, then update your SQL query to use {{max_date}}:
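For example, assuming updated_at is the selected Date/DateTime field (the table and column names are illustrative):

```sql
SELECT order_id, status, updated_at
FROM orders
WHERE updated_at > {{max_date}}
```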
Example scenario: Your daily workflow uses a 24-hour lookback window. If your query uses {{start_date}} and {{end_date}}, but the source system populates updated_at values that are always 7 days behind, the workflow won't pick up those rows. Using {{max_date}} instead ensures those lagging records are captured.
Troubleshooting and Recommendations
Extraction duration is tied to source table size and query complexity. Optimize your queries wherever possible. Some databases (e.g., NetSuite) struggle with date manipulation in queries — test your queries in DataGrip or a similar tool to gauge extraction performance.
Avoid unscoped Truncsert or Upsert operations on large tables. Running these without {{start_date}} and {{end_date}} causes a full table re-sync on every workflow execution. If your table has millions of rows, this will take a significant amount of time. If no date field is available, consider using the Max ID approach instead.

"connection reset by peer" error — Ensure the timeout settings on the source database are large enough to keep a connection open for the duration of the extraction.

"killed" error — Contact the Daasity support team.

BigQuery Star Tables are not directly supported. Create a view to translate them into plain tables that the extractor can work with.

JSON fields are not supported. A common workaround in some databases is to use TO_JSON(field_name) AS field_name (since field_name is already in the mappings). Note that this is not officially supported and results are not guaranteed. These fields will be stored as strings in the destination and will require parsing during transformation.

Aliases can be used for limited transformations as long as the alias name matches an existing column in the destination table. Note this will cause the destination data to differ from the source. For example, SELECT MD5(event_id) AS event_id will work if event_id exists in the destination table and the data types are compatible.

Fewer rows in destination than source? Check the sync key definition — duplicate sync keys will cause rows to be overwritten.