Data File (CSV, XLS)

You can connect Data Files to Daasity by following the steps on this page.

Step 1: Create a Data Source

Supported Data Sources

  • Amazon S3 Buckets

  • SFTP Servers (NOT FTP)

  • Email Webhook

  • Daasity Storage

To Set Up a Data Source, Follow These Steps

Step 2: Set Up a Data File

Daasity supports the following file types:

  • CSV

  • XLS

Click 'New Integration' in the top-right corner of your Integration page and choose your file type.

A Data File is the definition of the data within the file, as well as where the data in the file will be loaded.

A Data File is logically broken up into three parts (presented as 1 in the UI):

  • Source Information

  • Destination Information

  • File Information

Source Information

The "Source Information" contains the details of the 3rd party service that the Data File will connect to in order to access and extract data from files.

The "Source Information" consists of the following:

  • Data File Source - the data source "storage space"

  • Data File Path/Name - a file / folder path to your specific files

  • Filename Date Pattern - an optional file pattern to help load data incrementally by day

Data File Source

Here you select the Data Source, either newly created or previously created, that will be used with the Data File you're setting up.

This Data Source holds the details on how to connect to the 3rd party source so that we can extract the CSV/Excel files.

Data File Path/Name

The "Data File Path/Name" is the path where the file(s) are located on the Data Source.

Examples:

  • If the CSV/Excel files we want to process are located on S3 in the rocket-main/path/to folder, the path that we'd enter would be: path/to. We omit the bucket-name (rocket-main in this case) from the path.

  • If the CSV/Excel files we want to process are located in the path/to folder, the path that we'd enter would be: path/to.

  • If the CSV/Excel files we want to process are located in the "root" folder, then we do not specify a path, but should specify the file extension: *.csv
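The path rules above can be illustrated with a short Python sketch. The helper name `data_file_path` is hypothetical and not part of Daasity. It assumes the first segment of an S3 location is the bucket name, and it only shows which part of the location would be entered in the field.

```python
def data_file_path(s3_location: str) -> str:
    """Return the value to enter as Data File Path/Name for an S3 location.

    Hypothetical helper for illustration only: the first segment is treated
    as the bucket name and omitted from the result.
    """
    _bucket, _, path = s3_location.strip("/").partition("/")
    # Files in the root folder: no path, only a file extension pattern
    # (assumed to be *.csv in this sketch).
    return path if path else "*.csv"


print(data_file_path("rocket-main/path/to"))  # path/to
print(data_file_path("rocket-main"))          # *.csv
```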

Filename Date Pattern

The Filename Date Pattern allows you to select the date pattern for the date that is added to the CSV filename (by you or your system). Daasity uses this to parse the date from the filename.
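As a rough illustration of what parsing a date from a filename involves, the Python sketch below converts a pattern such as YYYYDDMM into a `strptime` format and applies it to the digits found in the filename. The token mapping, the function name `filename_date`, and the restriction to undelimited 8-digit dates are all assumptions made for this example; Daasity's actual parser may work differently.

```python
import re
from datetime import date, datetime

# Assumed mapping from pattern tokens to Python strptime codes.
TOKEN_TO_STRPTIME = {"YYYY": "%Y", "MM": "%m", "DD": "%d"}


def filename_date(filename: str, pattern: str) -> date:
    """Extract the date embedded in `filename` according to `pattern`."""
    fmt = pattern
    for token, code in TOKEN_TO_STRPTIME.items():
        fmt = fmt.replace(token, code)
    # This sketch only handles undelimited 8-digit dates.
    match = re.search(r"\d{8}", filename)
    if match is None:
        raise ValueError(f"no date found in {filename!r}")
    return datetime.strptime(match.group(), fmt).date()


print(filename_date("inventory20213101.csv", "YYYYDDMM"))  # 2021-01-31
```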

Step 3: Build Field Mappings

Field mappings can be built automatically.

Building them automatically means that the app opens a CSV file and fills out all the columns. However, the system will NOT automatically detect the column data types.

It is up to you to choose the data type.

To build Field Mappings from a test file on your computer, choose the "Upload from your computer" button. (NOT Recommended)

To build Field Mappings (and prove that your data source and path information are working properly), choose the Load from Data Source button. (Recommended)

The Load from Data Source option becomes available once a data source has been chosen and a file path has been added.

Lastly, there is a Header Row(s) setting. Some CSV/Excel files have headers on multiple rows, and setting this to an integer value (defaults to 1) tells the parser 1) where to find all the headers and 2) where to start parsing the actual data within the file.
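To make the effect of the Header Row(s) value concrete, here is a minimal Python sketch that splits CSV text into header rows and data rows. The function name `split_headers` and the sample data are made up for illustration, and how Daasity combines multiple header rows into column names is not shown here.

```python
import csv
import io


def split_headers(text: str, header_rows: int = 1):
    """Return (header_lines, data_rows) for CSV text with N header rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[:header_rows], rows[header_rows:]


sample = "Report,Totals\nsku,quantity\nA-1,3\nB-2,5\n"
headers, data = split_headers(sample, header_rows=2)
print(headers)  # [['Report', 'Totals'], ['sku', 'quantity']]
print(data)     # [['A-1', '3'], ['B-2', '5']]
```

With the default value of 1, the second line of this sample would be treated as data rather than as a header.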

Step 4: Set Up Destination Information

The Destination Information (also referred to as Data Mapping) is the set of settings for where the data is to be loaded, and how it should be loaded.

The Destination Information is broken up into 4 fields:

  • Schema Name

  • Table Name

  • Data Load Action

  • Rollup Table

The Schema Name - The schema inside the warehouse that data will be loaded into.

The Table Name - The name of the table in the destination warehouse.

Data Load Action - Allows you to specify how you want the data loaded:

  • Update/Insert: existing records are updated and new records are inserted.

  • Full Table Replacement: the table is first truncated and then reloaded.

Rollup Table - Select this option if you do not want to remove duplicate records. Use it when there is no sync key and you only want to insert into the warehouse.
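The difference between the load behaviors described above can be sketched in Python using in-memory structures in place of warehouse tables. This is a conceptual illustration only: the function names are hypothetical, and the `id` field stands in for whatever key identifies a record.

```python
def upsert(table: dict, rows: list, key: str) -> dict:
    """Update/Insert: rows with a known key overwrite, new keys are added."""
    merged = dict(table)
    for row in rows:
        merged[row[key]] = row
    return merged


def full_replace(rows: list, key: str) -> dict:
    """Full table replacement: existing rows are discarded, then reloaded."""
    return {row[key]: row for row in rows}


def rollup_insert(table: list, rows: list) -> list:
    """Rollup table: every row is appended, so duplicates are kept."""
    return table + rows


existing = {1: {"id": 1, "qty": 2}, 3: {"id": 3, "qty": 9}}
incoming = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]

print(sorted(upsert(existing, incoming, "id")))  # [1, 2, 3]
print(sorted(full_replace(incoming, "id")))      # [1, 2]
print(len(rollup_insert(incoming, incoming)))    # 4
```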

Field Mappings

Field Mappings - Maps the File Information data from the CSV/Excel file to the columns in the warehouse where it will be loaded.

The Source Field - Field name within the CSV/Excel file.

The Destination Field - Name of the column in the warehouse where the data will be saved.

The Format - Data type for the column in the warehouse where the data will be saved. (Daasity uses this to translate the data to the correct format.)

The Used in Sync Key - Checkbox to select the field(s) that will be used in the creation of the sync_key.

The Date Format (when a date/timestamp format is selected) - The expected format that the field is in, when read from the CSV/Excel file. This format is used when transforming the field to ensure the date/timestamp is parsed correctly.

Timestamps and dates can appear in many different formats, which makes it harder to ensure that they are parsed correctly unless the expected format is stated explicitly.
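The short Python example below shows why the expected format matters: the same string yields two different dates depending on the format used to read it. The format codes are Python's `strptime` codes, used here only for illustration; they are not necessarily the options offered in the Daasity UI.

```python
from datetime import datetime

raw = "03/04/2021"

# The same text read with two different expected formats.
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()

print(as_month_first)  # 2021-03-04
print(as_day_first)    # 2021-04-03
```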

When Files are Processed / Loaded

Email Webhooks

The email webhooks are used when files need to be emailed to us. These files are processed immediately when we receive them.

S3 and SFTP

Files on S3 or SFTP are processed when a workflow that includes the integration runs. By default, the integration is added to the Standard Daily workflow.

Daasity Storage

Files on Daasity Storage (Daasity's SFTP server) are processed when the daily workflow is run for the account.

File Processing for both CSV & Excel

Processing Steps

These are the steps that file processing goes through for a CSV or Excel file:

  • If the file is an Excel file, it is read into memory, then converted to the CSV format

  • CSV data is then parsed

  • Then the data is formatted

  • The filename is inspected for a date, using the date pattern selected in the Filename Date Pattern when the integration was created, and that date is added to the data as __filename_date

  • The data is uploaded to S3 using our JSON Builder

  • Then it is queued to load
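The middle of the processing steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name `process_csv` is hypothetical, the formatting step is skipped, and the output is a list of JSON strings rather than an actual S3 upload. The JSON Builder and the load queue are Daasity internals that are not reproduced here.

```python
import csv
import io
import json
from datetime import date


def process_csv(text: str, file_date: date) -> list:
    """Parse CSV text and return one JSON string per data row."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row in reader:
        # Attach the date taken from the filename to every record.
        row["__filename_date"] = file_date.isoformat()
        records.append(json.dumps(row))
    return records


for line in process_csv("sku,quantity\nA-1,3\n", date(2021, 1, 31)):
    print(line)
# {"sku": "A-1", "quantity": "3", "__filename_date": "2021-01-31"}
```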

Incremental Processing/Loading

When the Data File Path/Name is set to look at a directory of 1 or more files

When the Daily Report kicks off and the Data File goes to S3/SFTP to get the files to process, it looks for files that are dated within the last 2 days.

The processor uses either the date found in the filename or the date on which the file was added to the service to determine whether the file falls within this 2-day window.
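A minimal Python sketch of such a window check is shown below. The function name `in_window`, the choice to prefer the filename date over the added date when both exist, and the inclusive boundaries are all assumptions for this example; the source does not specify them.

```python
from datetime import date, timedelta
from typing import Optional


def in_window(today: date, filename_date: Optional[date],
              added_date: date, days: int = 2) -> bool:
    """True if the file's effective date falls within the last `days` days."""
    # Assumption: the filename date wins when it is available.
    effective = filename_date if filename_date is not None else added_date
    return today - timedelta(days=days) <= effective <= today


today = date(2021, 2, 3)
print(in_window(today, date(2021, 2, 2), date(2021, 2, 2)))  # True
print(in_window(today, None, date(2021, 1, 20)))             # False
```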

There is currently NO log of successfully processed files to determine which files should be loaded next. That feature will come later.

When the Data File Path/Name is set to look at a specific file

This feature is currently NOT Supported.

All Data File Paths/Names must contain a date within the filename, and the integration must have a "Filename Date Pattern" that matches the pattern used in the filename.

For example, you may not set the integration path/name to look for "inventory.csv". Instead, it must be something similar to "inventory20213101.csv", with the date pattern then selected as "YYYYDDMM".
