🔌Data Connectors
Overview of Data Connectors
Data connectors link your external data sources to PigeonsAI, feeding live data into model training so that the model stays up to date with all interactions.
Supported Data Connectors
PostgreSQL
MySQL
MongoDB
Direct CSV file upload
Snowflake (coming soon)
Databricks (coming soon)
AWS Redshift (coming soon)
S3
These connectors provide secure and efficient connections to different types of data sources, each with its own configuration settings.
Creating a Data Connector
To create a data connector, you need to provide details specific to the type of database you are connecting to. This typically includes the host address, database name, username, password, and potentially additional parameters such as port numbers or database-specific URIs.
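As a minimal sketch of the details listed above, the following Python snippet gathers typical PostgreSQL settings and assembles them into a standard postgresql:// connection URI. The class name PostgresConnectorConfig, its fields, and the sample values are hypothetical and used only for illustration; this is not the PigeonsAI SDK, whose actual call may look different.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class PostgresConnectorConfig:
    """Hypothetical container for the details a PostgreSQL connector needs."""
    host: str
    database: str
    username: str
    password: str
    port: int = 5432  # PostgreSQL's default port

    def to_uri(self) -> str:
        # Percent-encode credentials so characters like '@' or '/'
        # do not break the URI structure.
        user = quote(self.username, safe="")
        pwd = quote(self.password, safe="")
        return f"postgresql://{user}:{pwd}@{self.host}:{self.port}/{self.database}"


# Placeholder values, not real credentials.
config = PostgresConnectorConfig(
    host="db.example.com",
    database="shop",
    username="analyst",
    password="p@ss/word",
)
print(config.to_uri())
# postgresql://analyst:p%40ss%2Fword@db.example.com:5432/shop
```

Keeping the settings in one object makes it easy to validate them before a connection is attempted, and the percent-encoding step avoids a common failure when passwords contain reserved characters.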
Here is an example of creating a PostgreSQL connection:
Example output:
Creating Training Datasets
Once a data connector is established, its output URI can be used to create training datasets. Training datasets are configurations that specify exactly which data from your connected sources should be used to train your models. This includes defining which tables and columns to use and how they map to the dataset schema expected by your model.
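To make the idea of a table and column mapping concrete, here is a minimal sketch in Python. It assumes a source table named orders with columns customer_id and sku, and maps them to the schema names user_id and product_id. The TrainingDatasetSpec class, the table, and the source column names are illustrative assumptions rather than PigeonsAI's actual dataset format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrainingDatasetSpec:
    """Hypothetical description of which data feeds a training dataset."""
    table: str
    # Maps source column name -> column name expected by the model's schema.
    column_map: Dict[str, str] = field(default_factory=dict)

    def select_statement(self) -> str:
        """Build a SELECT that reads only the mapped columns and renames them."""
        if not self.column_map:
            raise ValueError("column_map must contain at least one column")
        columns = ", ".join(
            f'"{src}" AS "{dst}"' for src, dst in self.column_map.items()
        )
        return f'SELECT {columns} FROM "{self.table}"'


spec = TrainingDatasetSpec(
    table="orders",
    column_map={"customer_id": "user_id", "sku": "product_id"},
)
print(spec.select_statement())
# SELECT "customer_id" AS "user_id", "sku" AS "product_id" FROM "orders"
```

The generated query shows the effect of the mapping: only the selected columns are read, and each one is renamed to the name the model's schema expects.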
Here is an example of creating a training dataset using a PostgreSQL connection:
Example output:
The training set URI returned by the code above is then used to train a model.
Important: Certain models require specific columns in the dataset to function correctly. For example, the VAE Recommender requires user_id and product_id, so these columns must be mapped when the training set is created.
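A check along these lines can catch a missing required column before a training set is created. The sketch below is plain Python; the REQUIRED_COLUMNS table and the missing_columns helper are hypothetical, and only the VAE Recommender requirement (user_id and product_id) comes from the text above.

```python
# Required schema columns per model. Only the VAE Recommender entry is taken
# from the documentation; the dictionary key is an illustrative label.
REQUIRED_COLUMNS = {
    "vae_recommender": {"user_id", "product_id"},
}


def missing_columns(model: str, column_map: dict) -> list:
    """Return required schema columns that the mapping does not provide.

    column_map maps source column names to dataset schema column names.
    """
    required = REQUIRED_COLUMNS.get(model, set())
    mapped = set(column_map.values())
    return sorted(required - mapped)


# Example: product_id has not been mapped yet.
mapping = {"customer_id": "user_id"}
print(missing_columns("vae_recommender", mapping))  # ['product_id']
```

An empty result means every required column is covered by the mapping.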
Security
We prioritize the security and privacy of your data and models, safeguarding them with industry-leading practices and stringent compliance standards. Security documentation with a comprehensive overview of our measures is available upon request.
Your data is encrypted at rest and in transit, with single-tenant isolation.
No personally identifiable information is needed for model training, and data is wiped after every training run.