machine learning
slug: tutorial-understanding-trainmodel-parameters

trainModel Parameters

This tutorial gives a clear understanding of each trainModel parameter, how it impacts the model, and when to use it.
The trainModel step allows you to train machine learning models in your workflow. The step accepts several parameters to customize model training, control data handling, and optimize performance. This tutorial explains each parameter and why it is useful.
dataset
    Dataset to train on.

target
    Column to predict.
    Example: "spend" or "churn"

features
    Columns used as predictors.
    Example: ["age", "income", "previous_purchases"]

modelType
    Description: Type of model to train. Common options include:
    "linear" – Linear regression
    "logistic" – Logistic regression
    "knn" – k-nearest neighbors

params
    Model-specific training settings.
    Example: {"learning_rate": 0.01, "max_iterations": 1000}

categorical
    'auto' | Array of column names
    'auto' detects categorical columns automatically.

missing_values
    'error' | 'ignore' | 'impute'
    Use Case: Prevent model training errors due to missing data.
    'error' → Throws an error if any missing values exist
    'ignore' → Skips rows with missing values
    'impute' → Fills missing values using a strategy (mean, median, mode)

normalize
    Scales numeric features.
    Example: true

test_size
    Fraction of the data held out for testing.
    0.2 reserves 20% for testing.

random_state
    Seed that makes results reproducible.

k
    Number of neighbors for the "knn" model type.

distance_metric
    Distance metric for the "knn" model type ('euclidean', 'manhattan', etc.)
    Example: 'euclidean'

steps:
  - trainModel:
      dataset: customerData
      target: spend
      features: [age, income]
      modelType: linear
      params:
        learning_rate: 0.01
        max_iterations: 1000
      categorical: auto
      missing_values: impute
      normalize: true
      test_size: 0.2
      random_state: 42
      output: spendModel
Explanation:
dataset: customerData containing historical spend
target: "spend" column to predict
features: "age" and "income" as predictors
modelType: Linear regression
params: Learning rate and max iterations for gradient descent
categorical: Auto-detect categorical columns
missing_values: Impute missing values
normalize: Scale numeric features
test_size: Reserve 20% for testing
random_state: Ensure reproducibility
output: Store trained model as spendModel

Ensure features exist in both training and prediction datasets.
Tune params to improve model performance.
Use test_size and random_state to evaluate model reliability.
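
To make the data-handling parameters more concrete, here is a minimal Python sketch of what missing_values, test_size, and random_state plausibly do. It is not how trainModel is implemented. The function names handle_missing and split_rows are hypothetical, None is assumed to mark a missing value, 'impute' is shown with the mean strategy only, and the split is assumed to be a seeded shuffle followed by a hold-out.

```python
import random
import statistics


def handle_missing(rows, features, mode="impute"):
    """Apply a missing_values-style policy to a list of dict rows.

    Assumption: None marks a missing value. The 'impute' branch fills with
    the column mean, one of the strategies the tutorial lists.
    """
    def has_missing(row):
        return any(row[f] is None for f in features)

    if mode == "error":
        if any(has_missing(r) for r in rows):
            raise ValueError("missing values found")
        return [dict(r) for r in rows]
    if mode == "ignore":
        # Drop every row that has at least one missing feature value.
        return [dict(r) for r in rows if not has_missing(r)]
    if mode == "impute":
        # Column means computed from the values that are present.
        means = {
            f: statistics.mean(r[f] for r in rows if r[f] is not None)
            for f in features
        }
        return [
            {**r, **{f: means[f] for f in features if r[f] is None}}
            for r in rows
        ]
    raise ValueError(f"unknown mode: {mode!r}")


def split_rows(rows, test_size=0.2, random_state=None):
    """Shuffle with a seeded RNG, then hold out a test_size fraction."""
    shuffled = list(rows)
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up rows shaped like the customerData example.
rows = [
    {"age": 30, "income": 50000, "spend": 200},
    {"age": None, "income": 62000, "spend": 310},
    {"age": 50, "income": 80000, "spend": 420},
    {"age": 40, "income": 71000, "spend": 150},
    {"age": 20, "income": 38000, "spend": 90},
]

clean = handle_missing(rows, ["age", "income"], mode="impute")
train, test = split_rows(clean, test_size=0.2, random_state=42)
print(len(train), len(test))  # 4 1
```

Calling split_rows twice with the same random_state returns the same train and test rows, which is what makes an evaluation repeatable. Changing the seed changes which rows are held out, so comparing scores across a few seeds gives a rough sense of how stable the model is.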