Creating Training Datasets


Once your connection is established, you can create a training dataset by specifying the connection, the source table, and how its columns map to the fields the train set expects. The following example creates a train set named 'demo-amazon-beauty-data' from an existing connection:

res = client.data_connector.create_train_set(
    type='connection',
    train_set_name='demo-amazon-beauty-data',
    data_connection_uri='uri:data-connector:biraj_pigeonsai.com:ef57acca-5a2d-4855-a1a5-7dd3f46a02b6',
    table_name='amazon_beauty_data_full',
    columns_map={
        'user_id': 'UserId',
        'product_id': 'ProductId',
        'rating': 'Rating',
        'timestamp': 'Timestamp',
        'text_cols': ['ProductType']
    }
)

Example Output

 Train set creation successful: 201 Created
 Train set URI: uri:train-dataset:biraj_pigeonsai.com:f28e919b-0c95-476f-8600-1557ce1cdc5f
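
The URIs shown in this section appear to follow the pattern uri:<kind>:<owner>:<id>. If you store or log these values, it can help to split them into parts and check the kind before passing them on. The following is a minimal sketch under that assumption; the pattern is inferred from the examples on this page, and ResourceUri and parse_resource_uri are hypothetical helper names, not part of the PigeonsAI client.

```python
from typing import NamedTuple


class ResourceUri(NamedTuple):
    kind: str         # e.g. "train-dataset" or "data-connector"
    owner: str        # account segment, e.g. "biraj_pigeonsai.com"
    resource_id: str  # trailing identifier


def parse_resource_uri(uri: str) -> ResourceUri:
    """Split a 'uri:<kind>:<owner>:<id>' string into its parts.

    The four-part layout is an assumption based on the example URIs,
    not a documented format.
    """
    parts = uri.split(":", 3)
    if len(parts) != 4 or parts[0] != "uri" or not all(parts[1:]):
        raise ValueError(f"unexpected URI format: {uri!r}")
    return ResourceUri(kind=parts[1], owner=parts[2], resource_id=parts[3])


parsed = parse_resource_uri(
    "uri:train-dataset:biraj_pigeonsai.com:f28e919b-0c95-476f-8600-1557ce1cdc5f"
)
print(parsed.kind, parsed.resource_id)
```

Checking parsed.kind is a cheap way to catch a data connection URI passed where a train set URI is expected.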

Repulling Training Datasets

To keep your training datasets up to date with the latest data from your MongoDB database, PigeonsAI lets you repull a dataset through its connection. Repulling refreshes the train set so that models trained on it reflect current data rather than a stale snapshot.

Here’s how you can repull a training dataset for MongoDB:

res = client.data_connector.revision_train_set_with_connector(
    train_set_uri='uri:train-dataset:biraj_pigeonsai.com:unique_identifier',
)

The training set URI output by the call above is the value you use to train a model.
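
If you repull on a schedule, a thin wrapper can reject an obviously wrong URI before the request is made. The sketch below is illustrative only: repull_train_set is a hypothetical helper, the "uri:train-dataset:" prefix check is inferred from the examples on this page, and the stand-in client exists only so the snippet runs without credentials. The only PigeonsAI call it uses is revision_train_set_with_connector, as shown above.

```python
from types import SimpleNamespace


def repull_train_set(client, train_set_uri: str):
    """Repull a train set after checking that the URI names a train dataset."""
    # Assumption: train set URIs start with this prefix, as in the examples.
    if not train_set_uri.startswith("uri:train-dataset:"):
        raise ValueError(f"not a train set URI: {train_set_uri!r}")
    return client.data_connector.revision_train_set_with_connector(
        train_set_uri=train_set_uri,
    )


# Stand-in client for a dry run; replace it with your real, configured client.
calls = []
fake_client = SimpleNamespace(
    data_connector=SimpleNamespace(
        revision_train_set_with_connector=lambda train_set_uri: calls.append(train_set_uri)
    )
)

repull_train_set(
    fake_client,
    "uri:train-dataset:biraj_pigeonsai.com:unique_identifier",
)
```

Passing the client in as an argument keeps the helper easy to test and leaves authentication to your existing setup code.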

Support for MongoDB aggregations:

Coming Soon
