# Database Replicators

**Supported source databases:** Azure, BigQuery, SQL Server, MongoDB, MySQL, NetSuite, PostgreSQL, Redshift, Salesforce Service Cloud (via SOQL), and Snowflake (used mostly for non-matching regions). SSH connections are supported for external databases that require it. Allowlisting of Daasity servers is also supported.

**Supported destination warehouses:** Snowflake, Redshift, and BigQuery.

Each table replicator syncs data from a single source table. Aliases are not supported at the source level. Certain special data types and features — including BigQuery Star Tables and Arrays — are also unsupported. Joins are partially supported; see Important Notes on SQL Query Statement below.

Syncs run at configurable intervals using Daasity custom workflows. Note that depending on data volume, very high-frequency intervals (e.g., hourly syncs of tens of millions of rows) may not be attainable.

***

### Supported Sources and Destinations

| Source Databases                | Destination Warehouses |
| ------------------------------- | ---------------------- |
| Azure                           | Snowflake              |
| BigQuery                        | Redshift               |
| SQL Server                      | BigQuery               |
| MongoDB                         |                        |
| MySQL                           |                        |
| NetSuite                        |                        |
| PostgreSQL                      |                        |
| Redshift                        |                        |
| Salesforce Service Cloud (SOQL) |                        |
| Snowflake                       |                        |

***

### How Table Replicators Are Organized

Database replication in Daasity is organized around two entities:

* **A Data Source** — the parent entity that stores your database credentials and connection details (host, port, username, password, etc.). One Data Source represents one database connection.
* **Table Replicators** — child entities linked to a Data Source, each responsible for replicating one table from the source database into your warehouse.

**Finding your table replicators:** On the Integrations page, scroll past the API-based integrations to find the **Data Sources** section at the bottom. Each Data Source card shows how many table replicators are attached to it (e.g., "22 table replicators"). Click a Data Source card to open its detail page, where all linked table replicators are shown in a searchable, sortable table.

{% hint style="warning" %}
Database table replicators no longer appear in the "Activated Integrations" list alongside API-based integrations — they live exclusively in the Data Sources section.
{% endhint %}

***

### Creating a Table Replicator

You can create a new table replicator in two ways:

* **From the Integrations page:** Click **New Integration** in the top right → select a database type (e.g., NetSuite Database, PostgreSQL Database) → fill out the creation form.
* **From a Data Source detail page:** Click **Create Table Replicator** in the Table Replicators card. The Data Source field will be pre-filled for you.

{% hint style="info" %}
Setting up a table replicator is a single-form process. Fill in each section, then click **Create**. Once created, most configuration fields are locked — see the Editing section below.
{% endhint %}

**Step 1: Name Your Table Replicator**

Choose a descriptive name (e.g., `PRODUCTION DB - ORDERS`). This is a display label only — it has no validation against actual data, so be thoughtful with naming.

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2FmvgBzsqeM8cyQxUIfEtM%2Fimage.png?alt=media&#x26;token=5099a7cc-a117-48e8-934f-8bf1c771c7f2" alt=""><figcaption></figcaption></figure>

**Step 2: Select Your Source**

* **Data Source** — Select the Data Source that connects to the database containing your target table. If creating from a Data Source detail page, this will already be filled in.
* **Source Schema** — The schema in the source database where your target table lives. Defaults to `public`.
* **Source Tablename** — The exact table name in the source database. The system will validate existence when you click Create.

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2FBkeS3oQ0zHxl71p3KepX%2Fimage.png?alt=media&#x26;token=2bb24e35-e743-4879-89f2-aa1bac36918e" alt=""><figcaption></figcaption></figure>

**Step 3: Configure Your Destination**

* **Destination Schema** — The schema in your warehouse where replicated data will land. Will be created if it doesn't exist. Must be a valid schema name (no special characters except underscores; cannot start with a number).
* **Destination Tablename** — The warehouse table where data will be written. Will be created if it doesn't exist. The process will fail if a table with that name already exists.

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2F72kzhMztI7P9QsNYDYuQ%2Fimage.png?alt=media&#x26;token=08f46814-a119-427d-92c7-1180290fa365" alt=""><figcaption></figcaption></figure>

**Step 4: Choose a Sync Action**

Select how data is loaded into the destination table:

* **Full Data Replacement (Truncsert)** — Deletes all existing records before loading. Use when the source table is small, you don't need historical data, or you can't define a primary key.
* **Update/Insert (Upsert)** — Keeps existing records, inserts new ones, and updates records matched by the sync key. Use for incremental loads that capture updates to previously loaded records.
* **Append Only** — Keeps existing records and loads only new ones. Use for incremental loads when you don't need to capture updates.
* **Do Not Remove Duplicates** — Keeps every extracted row, including duplicates. Use for rollup tables where repeating entries need their own rows; NetSuite System Notes are a common example.

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2FE8bRTVyLU4UeNSZrNPel%2Fimage.png?alt=media&#x26;token=2799e269-1d67-451a-a72f-8e2e37499c52" alt=""><figcaption></figcaption></figure>
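For intuition, the Upsert action behaves roughly like a SQL `MERGE` in the warehouse. The sketch below is illustrative only — the table names and the `order_id` sync key are hypothetical, and Daasity's actual load process is internal:

```sql
-- Illustrative sketch of Upsert semantics; not Daasity's actual implementation.
-- Assumes a hypothetical sync key of order_id.
MERGE INTO analytics.orders AS dest
USING staged_rows AS src
    ON dest.order_id = src.order_id
WHEN MATCHED THEN
    UPDATE SET dest.order_total = src.order_total,
               dest.updated_at  = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (order_id, order_total, updated_at)
    VALUES (src.order_id, src.order_total, src.updated_at);
```

In these terms, Append Only corresponds to skipping the `WHEN MATCHED` branch, and Truncsert corresponds to truncating the table first and inserting every extracted row.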

**Step 5: SQL Query Statement (Optional)**

The SQL Query Statement field filters how source data is extracted. Enter a valid SQL statement whose result set contains only columns present in the selected source table.

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2Fp3bhI0Oip2LDaYoWGcBA%2Fimage.png?alt=media&#x26;token=04763181-9d24-4c16-b6d4-160f782feb38" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
**Important:** The SQL you write at creation time cannot be changed after the table replicator is created. Write and validate your query carefully before clicking Create — the Test Query feature (see below) can help you confirm it works before committing.
{% endhint %}

{% hint style="danger" %}
**Important Notes on SQL Query Statement:** *Aliases are not permitted.* Every column in the result set is validated against the source table's column set, so an alias that introduces a new column name will result in an error. If aliases are needed, create a view at the source and sync the view instead.
{% endhint %}

```sql
-- Will NOT work:
SELECT order_name AS ordern FROM orders
SELECT (order_name || order_id) AS compound_name FROM orders
SELECT (start_date + INTERVAL '3 hours') AS gmt_time FROM orders
```

{% hint style="info" icon="memo-circle-info" %}
*Joins are partially supported.* You can use joins or subqueries to filter your result set, as long as no columns from the joined table appear in the SELECT.
{% endhint %}

```sql
-- Will work (only source table columns in SELECT):
SELECT t.order_id FROM transactions t
JOIN transaction_types tt ON t.type_id = tt.id
WHERE tt.type_name = 'refund'

-- Will NOT work (includes column from joined table):
SELECT t.order_id, tt.type_name FROM transactions t
JOIN transaction_types tt ON t.type_id = tt.id
WHERE tt.type_name = 'refund'
```
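Subqueries follow the same rule: they may filter rows, but only source-table columns may appear in the SELECT. A sketch using the same hypothetical tables:

```sql
-- Will work: the subquery only filters; SELECT contains only source table columns
SELECT order_id FROM transactions
WHERE type_id IN (SELECT id FROM transaction_types WHERE type_name = 'refund')
```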

{% hint style="success" %}
*Use `{{start_date}}` and `{{end_date}}` for date-scoped extractions.* This is strongly encouraged, and it is required for loading history on Update/Insert replicators. Daasity injects the date values at runtime based on the workflow type.
{% endhint %}

```sql
SELECT order_id FROM orders
WHERE updated_at BETWEEN {{start_date}} AND {{end_date}}
```

Lookback window behavior:

* **Daily or Custom Workflows** — The lookback window runs from the current time back to 12:00 AM of the previous day.
* **History Workflows** — Multiple queries are executed, one per day in the selected range. Concurrency is limited to avoid exhausting available connections.

Failure to include `{{start_date}}` and `{{end_date}}` on Update/Insert replicators will result in a complete re-sync on every execution — a common cause of very long run times.
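As a concrete contrast (using a hypothetical `orders` table), the scoped query below extracts only rows changed within the lookback window, while the unscoped one re-extracts the entire table on every run:

```sql
-- Scoped: extracts only rows updated within the workflow's lookback window
SELECT order_id, order_total, updated_at FROM orders
WHERE updated_at BETWEEN {{start_date}} AND {{end_date}}

-- Unscoped: forces a full re-sync on every execution
SELECT order_id, order_total, updated_at FROM orders
```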

***

### **Test Query**

Before running a full sync, you can validate your SQL configuration using the Test Query feature. This is available on all database table replicator detail pages, **except MongoDB and Salesforce Service Cloud**.

To run a test query:

1. Open the table replicator detail page (click the replicator name from the Data Source detail page).
2. Scroll past the SQL Query section to find the **Test Query** card.
3. Click **Run Test Query**.
4. A dialog appears asking for any parameters used in the query (start date, end date, max ID, max date). Fill these in as needed.
5. Submit. The system connects to the source database and executes the query, showing progress ("Connecting to database…", "Executing query…").
6. Results display: row count, execution time, and column names.
7. Optionally click **Download Results (CSV)** to retrieve a sample of up to 5 rows.

{% hint style="info" icon="lightbulb" %}
Use this to catch wrong table names, missing columns, connection problems, or SQL errors before committing to a full data load.
{% endhint %}

***

### **Editing a Table Replicator**

After a table replicator is created, only the following fields can be changed:

* **Name** — The display label for the replicator.
* **Sync Action** — Full Data Replacement, Update/Insert, or Append Only.

Everything else is locked at creation:

* SQL Query Statement
* Source schema and table
* Destination schema and table
* "Do Not Remove Duplicates" setting

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2FygWVlBBiBnjciRv50TFa%2Fimage.png?alt=media&#x26;token=6d9c771f-9a6c-4e3c-9fbd-31350c401c40" alt=""><figcaption></figcaption></figure>

{% hint style="info" icon="lightbulb" %}
If you need to change the SQL, source table, or destination table, create a new table replicator. This design prevents accidental changes that could break active data pipelines.
{% endhint %}

***

### **Deleting a Table Replicator**

To delete a table replicator, open its detail page, click the **▾** dropdown arrow next to the Edit button, and select **Delete**.

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2FiuITMsRKHlwB96VLkmgZ%2Fimage.png?alt=media&#x26;token=8d2e1796-e473-4e3f-adcb-602792185036" alt=""><figcaption></figcaption></figure>

A confirmation dialog appears with three options for what happens to the destination table in your warehouse:

| Option                  | What happens                                                                                                                                                                                            |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Keep data** (default) | The table replicator is removed. The destination table in the warehouse is left untouched — data remains accessible.                                                                                    |
| **Drop table**          | The table replicator is removed AND the destination table is permanently deleted from the warehouse.                                                                                                    |
| **Rename table**        | The table replicator is removed and the destination table is renamed with a `_deleted_` suffix and timestamp (e.g., `orders_deleted_20260325120000`). Data is preserved but clearly marked as inactive. |

<figure><img src="https://410725593-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FamTMWiPne1v1V3L7mbuj%2Fuploads%2FpPhqkw23pCs41YWnt0PM%2Fimage.png?alt=media&#x26;token=efb03883-abba-40c5-86af-4f4cdefb21d3" alt=""><figcaption></figcaption></figure>

{% hint style="success" %}
To confirm deletion, type the exact destination table name into the text field in the dialog. The Delete button remains disabled until the name matches exactly.
{% endhint %}

***

### **Troubleshooting and Recommendations**

* **Extraction duration is tied to source table size and query complexity.** Optimize your queries wherever possible. Some databases (e.g., NetSuite) struggle with date manipulation — test your queries in DataGrip or a similar tool to gauge performance before deploying.
* **Avoid unscoped Truncsert or Upsert operations on large tables.** Without `{{start_date}}` and `{{end_date}}`, a full table re-sync runs on every workflow execution. If no date field is available, consider the Max ID approach (contact Daasity support to configure).
* **`connection reset by peer` error** — Ensure the timeout settings on the source database are large enough to keep a connection open for the full duration of the extraction.
* **`killed` error** — Contact the Daasity support team.
* **BigQuery Star Tables are not directly supported.** Create a view to translate them into plain tables that the extractor can work with.
* **JSON fields are not supported.** A common workaround is `TO_JSON(field_name) AS field_name` (since `field_name` already exists in the mappings), but this is not officially supported and results are not guaranteed. These fields are stored as strings in the destination and require parsing during transformation.
* **Aliases can be used for limited transformations** as long as the alias name matches an existing column in the destination table. For example, `SELECT MD5(event_id) AS event_id` will work if `event_id` exists in the destination and data types are compatible.
* **Fewer rows in destination than source?** Check the sync key definition — duplicate sync keys cause rows to be overwritten.
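Combining the alias and JSON workarounds described above, a hedged sketch (the `events` table and its columns are hypothetical, and the aliased names must already exist in the destination table):

```sql
-- Sketch of the alias and TO_JSON workarounds; not officially supported,
-- and results are not guaranteed.
SELECT
    MD5(event_id) AS event_id,   -- transformation aliased back to an existing column
    TO_JSON(payload) AS payload  -- JSON field cast to a string representation
FROM events
```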
