Creating Training Datasets

Once your connection is established, you can create a training dataset by specifying the connection, the source table, and how the table's columns map to the fields used for training. The following example creates a training dataset named 'demo-amazon-beauty-data' from a connection:

res = client.data_connector.create_train_set(
    type='connection',
    train_set_name='demo-amazon-beauty-data',
    # URI of the data connection created earlier
    data_connection_uri='uri:data-connector:biraj_pigeonsai.com:ef57acca-5a2d-4855-a1a5-7dd3f46a02b6',
    # Table in the connected database to pull from
    table_name='amazon_beauty_data_full',
    # Keys are the training fields; values are the column names in the table
    columns_map={
        'user_id': 'UserId',
        'product_id': 'ProductId',
        'rating': 'Rating',
        'timestamp': 'Timestamp',
        'text_cols': ['ProductType']
    }
)

Example Output

 Train set creation successful: 201 Created
 Train set URI: uri:train-dataset:biraj_pigeonsai.com:ddbedb58-2a59-4272-975d-6685b233869a

The train set URI shown in the output above is what you will use to train a model.
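Because later steps depend on this URI, it can help to check its shape before passing it on. The following is a minimal sketch, assuming the format seen in the example output on this page (`uri:train-dataset:<account>:<uuid>`). The helper name `parse_train_set_uri` is hypothetical and not part of the PigeonsAI SDK, and the pattern is inferred from the examples rather than from a documented specification.

```python
import re
import uuid

# Pattern inferred from the example URIs on this page (an assumption,
# not a documented PigeonsAI specification).
_TRAIN_SET_URI = re.compile(
    r"^uri:train-dataset:(?P<owner>[^:]+):(?P<id>[0-9a-fA-F-]{36})$"
)


def parse_train_set_uri(uri: str) -> dict:
    """Split a train set URI into its owner and dataset ID.

    Hypothetical helper: raises ValueError if the URI does not match
    the shape shown in the example output.
    """
    match = _TRAIN_SET_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"Not a train set URI: {uri!r}")
    # uuid.UUID raises ValueError if the ID is not a well-formed UUID.
    dataset_id = str(uuid.UUID(match.group("id")))
    return {"owner": match.group("owner"), "id": dataset_id}
```

A check like this catches a common mix-up early, such as passing a data connection URI where a train set URI is expected.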

Repulling Training Datasets

If the data in the connected database has changed, you can repull the training dataset so that it reflects the latest data. This ensures that your models are trained on the most current data available.

Here’s how to repull a training dataset:

res = client.data_connector.revision_train_set_with_connector(
    train_set_uri='uri:train-dataset:biraj_pigeonsai.com:b1ccc16d-9b3d-4972-a4f6-4f422326110f',
)
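If you repull from several places in your code, a thin wrapper can keep the call in one spot. This is only a sketch: `repull_train_set` is a hypothetical name, the prefix check assumes the URI shape seen in the examples on this page, and the return value of `revision_train_set_with_connector` is not described here, so it is passed through unchanged.

```python
def repull_train_set(client, train_set_uri: str):
    """Repull a training dataset through an existing PigeonsAI client.

    Hypothetical convenience wrapper around the call shown above.
    """
    # Assumption: train set URIs start with this prefix, as in the examples.
    if not train_set_uri.startswith("uri:train-dataset:"):
        raise ValueError(f"Expected a train set URI, got: {train_set_uri!r}")
    # The SDK's return value is passed through as-is.
    return client.data_connector.revision_train_set_with_connector(
        train_set_uri=train_set_uri,
    )
```

With a client already set up as in the earlier steps, the call would look like `res = repull_train_set(client, 'uri:train-dataset:...')` with your own train set URI.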

Joining Multiple Tables to Create a Train Set

Coming soon
