🔌Data Connectors

Overview of Data Connectors

Data connectors link external databases to PigeonsAI, integrating live data into model training so that models stay up to date with the latest interactions.

Supported Data Connectors

  • PostgreSQL

  • MySQL

  • MongoDB

  • Direct CSV file upload

  • Snowflake (coming soon)

  • Databricks (coming soon)

  • AWS Redshift (coming soon)

  • S3

These connectors provide secure and efficient connections to different types of databases, each with its own configuration settings.
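Each connector type expects a different set of configuration fields. As a rough sketch, you could validate a configuration before submitting it (the field names and the validate_connector_config helper below are illustrative assumptions, not part of the PigeonsAI client):

```python
# Illustrative required fields per connector type, based on typical database
# clients; this is an assumption, not an exhaustive PigeonsAI reference.
REQUIRED_FIELDS = {
    'postgresql': {'host', 'port', 'database', 'username', 'password'},
    'mysql':      {'host', 'port', 'database', 'username', 'password'},
    'mongodb':    {'uri'},  # MongoDB connections are typically URI-based
    's3':         {'bucket', 'access_key_id', 'secret_access_key'},
}

def validate_connector_config(connector_type: str, config: dict) -> list:
    """Return a sorted list of fields still missing for the given type."""
    required = REQUIRED_FIELDS.get(connector_type.lower())
    if required is None:
        raise ValueError(f"Unsupported connector type: {connector_type}")
    return sorted(required - config.keys())

missing = validate_connector_config(
    'postgresql', {'host': 'db.example.com', 'username': 'admin'}
)
print(missing)  # → ['database', 'password', 'port']
```

Checking for missing fields client-side avoids a round trip to the API that would fail with an incomplete configuration.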

Creating a Data Connector

To create a data connector, you need to provide details specific to the type of database you are connecting to. This typically includes the host address, database name, username, password, and potentially additional parameters such as port numbers or database-specific URIs.

Here is an example of creating a PostgreSQL connection (the argument names below are illustrative; consult the client reference for the exact signature):

res = client.data_connector.create_connector(
    connector_type='postgresql',
    host='your-database-host',
    port='5432',
    database_name='your-database',
    username='your-username',
    password='your-password'
)

Example output:

 Connector creation successful: 201 Created
 Data connector URI: uri:data-connector:biraj_pigeonsai.com:caa6f2cb-efef-43ea-93bf-2a58af5a3ae3

Creating Training Datasets

Once a data connector is established, its output URI can be used to create training datasets. Training datasets are configurations that specify exactly which data from your connected sources should be used for training your models, including which tables and columns to use and how they map to the dataset schema expected by your model.

Here is an example of creating a training dataset using a PostgreSQL connection (the column-mapping argument name is illustrative; consult the client reference for the exact signature):

res = client.data_connector.create_train_set(
    data_connection_uri='uri:data-connector:your-connector-uri',  # output URI from create_connector
    column_mapping={
        'user_id': 'my_user_id',       # your table's user ID column
        'product_id': 'my_product_id'  # your table's product ID column
    }
)

Example output:

 Train set creation successful: 201 Created
 Train set URI: uri:train-dataset:biraj_pigeonsai.com:ddbedb58-2a59-4272-975d-6685b233869a

The train set URI returned by the code above is then used to train a model.

Important: Certain models require specific columns in the dataset to function correctly. For example, the VAE Recommender requires user_id and product_id, so these columns must be mapped when the train set is created.
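The required-column rule above can be checked before submitting a train set. A minimal sketch, assuming a check_column_mapping helper of your own (the REQUIRED_COLUMNS table reflects only the VAE Recommender requirement stated in the docs; the model-type key is an assumption):

```python
# Required columns per model type; only the VAE Recommender entry comes from
# the documentation above, and the key name is an illustrative assumption.
REQUIRED_COLUMNS = {
    'vae-recommender': {'user_id', 'product_id'},
}

def check_column_mapping(model_type: str, column_mapping: dict) -> None:
    """Raise ValueError if the mapping lacks columns the model requires."""
    missing = REQUIRED_COLUMNS.get(model_type, set()) - column_mapping.keys()
    if missing:
        raise ValueError(
            f"{model_type} requires mapped columns: {sorted(missing)}"
        )

check_column_mapping('vae-recommender', {
    'user_id': 'my_user_id',
    'product_id': 'my_product_id',
})  # passes silently when all required columns are mapped
```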


Data Security

We prioritize the security and privacy of your data and models, safeguarding your assets with industry-leading practices and stringent compliance standards. Security documentation providing a comprehensive overview of our measures is available upon request.

  • Encryption safeguards your data at rest and in transit, with single-tenant isolation.

  • No personally identifiable information is needed for model training, and data is wiped after every training run.
