The Skyrun SDK streamlines configuring and launching model training jobs for sequence-to-sequence, context-aware transformer recommenders once your data has been prepared. By providing a model name and a data source identifier, users can start a training session quickly. The SDK accommodates users at every level of expertise, offering a straightforward setup for newcomers and detailed customization options for veterans who want to fine-tune model performance.

Model Architecture Selection

To select your model's architecture, use the recommender.transformer.train method. This method is designed for training sequence-to-sequence, context-aware transformer models, which are highly effective in recommendation systems where understanding the sequence of user interactions is crucial for making accurate predictions.

Configuring Training Parameters

Configuring your model with the Skyrun SDK involves specifying a range of parameters that influence the training process. Essential parameters include the model name and data source identifier, but users can also adjust advanced settings to enhance model performance.

Example of Configuring Training Parameters

res = client.recommender.transformer.train(
    custom_model_name='your-model-name',     # Required
    data_source_pri='your-data-source-id',   # Required
    sequence_len=10,               # Optional
    train_num_negatives=5,         # Optional
    valid_num_negatives=50,        # Optional
    random_cut_prob=0.1,           # Optional
    replace_user_prob=0.01,        # Optional
    replace_item_prob=0.01,        # Optional
    hidden_dim=128,                # Optional
    temporal_dim=64,               # Optional
    num_proxy_item=10,             # Optional
    num_known_item=5000,           # Optional
    num_layers=2,                  # Optional
    num_heads=8,                   # Optional
    dropout_prob=0.1,              # Optional
    temperature=0.5,               # Optional
    epoch=5,                       # Optional
    every=1,                       # Optional
    patience=2,                    # Optional
    batch_size=128,                # Optional
    optimizer_algorithm='adam',    # Optional
    learning_rate=0.001,           # Optional
    beta1=0.9,                     # Optional
    beta2=0.999,                   # Optional
    weight_decay=0.01,             # Optional
    amsgrad=False                  # Optional
)

Parameters Description

custom_model_name (Required)

  • A unique identifier for your custom model. This name is used within your project or system for referencing and managing the model.

data_source_pri (Required)

  • The identifier for the primary data source, corresponding to the dataset or data stream used for training.


sequence_len (Optional)

  • The length of the input sequence for the model. Determines how many previous interactions are considered when making recommendations.

train_num_negatives (Optional)

  • The number of negative samples used during training. Helps the model learn what not to recommend.

valid_num_negatives (Optional)

  • The number of negative samples used during validation.

random_cut_prob (Optional)

  • The probability of randomly cutting the sequence during training, used for data augmentation.

replace_user_prob (Optional)

  • The probability of replacing the user in a sequence with a random user, used for data augmentation.

replace_item_prob (Optional)

  • The probability of replacing an item in a sequence with a random item, used for data augmentation.

hidden_dim (Optional)

  • The dimensionality of the hidden state in the transformer model.

temporal_dim (Optional)

  • The dimensionality of the temporal embedding, used if the model incorporates time information in the sequence.

num_proxy_item (Optional)

  • The number of proxy items in the model. Proxy items represent groups of items semantically.

num_known_item (Optional)

  • The number of known items in the model. These are items with explicit representations.

num_layers (Optional)

  • The number of layers in the transformer model.

num_heads (Optional)

  • The number of attention heads in the transformer model. Multiple heads allow the model to focus on different parts of the sequence.

dropout_prob (Optional)

  • The dropout probability. A regularization technique to prevent overfitting.

temperature (Optional)

  • The temperature parameter for the softmax function. Affects the sharpness of the output probability distribution.

epoch (Optional)

  • The number of training epochs. One epoch is one complete pass through the entire training dataset.

every (Optional)

  • How frequently (in epochs) the model is evaluated during training.

patience (Optional)

  • The number of epochs to wait for improvement before stopping training early.

batch_size (Optional)

  • The number of sequences processed at once during training.

optimizer_algorithm (Optional)

  • The optimization algorithm used for training, e.g. 'adam' or 'sgd'.

learning_rate (Optional)

  • The learning rate for the optimizer. Determines how much the model weights change in response to the estimated error at each update.

beta1, beta2 (Optional)

  • The exponential decay rates for the first and second moment estimates in the Adam optimizer.

weight_decay (Optional)

  • The weight decay (L2 penalty) applied by the optimizer. A regularization technique to prevent overfitting.

amsgrad (Optional)

  • Whether to use the AMSGrad variant of the Adam optimizer.
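To make the roles of learning_rate, beta1, beta2, weight_decay, and amsgrad concrete, here is a minimal sketch of a single Adam update step for one scalar parameter. This is a conceptual illustration only, not the SDK's internal implementation; the function name and the eps constant are assumptions of the sketch.

```python
import math

def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, weight_decay=0.01,
              eps=1e-8, amsgrad=False, v_max=0.0):
    """One illustrative Adam update for a scalar parameter."""
    grad = grad + weight_decay * param      # L2 penalty (weight_decay)
    m = beta1 * m + (1 - beta1) * grad      # first-moment estimate (beta1)
    v = beta2 * v + (1 - beta2) * grad**2   # second-moment estimate (beta2)
    m_hat = m / (1 - beta1**t)              # bias correction for step t
    v_hat = v / (1 - beta2**t)
    if amsgrad:
        v_max = max(v_max, v_hat)           # AMSGrad keeps the running max
        v_hat = v_max
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v, v_max
```

Larger beta1/beta2 values smooth the gradient estimates over more steps, while weight_decay continuously shrinks the weights toward zero.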

Expected Output

{'data': {'message': 'Training started. Model endpoint will be accessible upon completion. Visit PigeonsAI web app for status updates.'}}

This message confirms that model training has begun; the model endpoint becomes accessible once training completes. For updates on training progress, check the PigeonsAI web app.
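Because the call returns a plain dictionary, the confirmation message can be read out programmatically. A minimal sketch, using the response shape shown above (the defensive .get lookups are a suggestion, not an SDK requirement):

```python
# Example response, copied from the Expected Output above.
res = {'data': {'message': 'Training started. Model endpoint will be '
                           'accessible upon completion. Visit PigeonsAI '
                           'web app for status updates.'}}

# Defensive lookup in case either key is absent.
message = res.get('data', {}).get('message', '')
print(message)
```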

Reach out to us if you have any questions.

Founders LinkedIn: