machine learning
slug: step-trainmodel

Trains a machine learning model on a dataset, using the specified feature columns and target column.
This step lets workflows include predictive modeling, classification, or regression directly within the pipeline. The trained model is returned in extras for downstream use such as scoring, evaluation, or visualization.
The step receives data, pdv, and extras, and reads its settings from $params. It uses the MachineLearning class to train the model:
- target: The column to predict
- features: List of columns used as predictors
- modelType: Type of model (e.g., logistic, linear, knn)
- params: Optional model hyperparameters (learning rate, regularization, etc.)

The trained model is stored in extras['ml'] for downstream scoring or analysis.

Parameters:

- target (string): Column name of the outcome variable to predict.
- features (array): List of predictor columns.
- modelType (string): Type of ML model to train (e.g., logistic, linear, knn).
- params (array): Additional hyperparameters for the model (learning rate, regularization, etc.).
- categorical (string or array): Columns to treat as categorical; default 'auto'.
- missing_values (string): How to handle missing values; default 'error'.
- normalize (boolean): Whether to normalize numeric features; default true.
- test_size (float): Fraction of data reserved for testing; default null.
- random_state (int): Random seed for reproducibility; default 42.
- k (int): Number of neighbors for k-NN models; default 5.
- distance_metric (string): Metric for distance-based models; default 'euclidean'.

Input requirements:

- The dataset (data) must be an array of associative arrays (rows).
- The columns named in features and target must exist and contain valid numeric or categorical values appropriate for the model type.

Extras:

- ml: Contains the trained model object returned by the MachineLearning class.

Returns:

| Key | Description |
|---|---|
| data | Original dataset array |
| pdv | Metadata about dataset columns |
| extras | Contains ml with the trained model |
| outputType | "array"; indicates structured array output |

Example:

steps:
  - loadInline:
      data:
        - {age: 22, income: 38000, spend: 800, outcome: 1}
        - {age: 25, income: 45000, spend: 1200, outcome: 0}
        - {age: 29, income: 56000, spend: 1800, outcome: 1}
      output: trainingData
  - trainModel:
      dataset: trainingData
      target: outcome
      features: [age, income, spend]
      modelType: logistic
      params:
        learning_rate: 0.01
      output: trainedModel
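
Before a trainModel step like this one can train anything, the documented defaults have to be applied and the input requirements checked. The Python sketch below shows one way that could look. It is an illustration only and is not the MachineLearning class. The names DEFAULTS, resolve_options, and validate_input are hypothetical, and two behaviors are assumptions: all options are treated as one flat mapping, and an empty dataset is rejected.

```python
# Documented defaults for the optional settings of the trainModel step.
DEFAULTS = {
    "categorical": "auto",
    "missing_values": "error",
    "normalize": True,
    "test_size": None,
    "random_state": 42,
    "k": 5,
    "distance_metric": "euclidean",
}


def resolve_options(params=None):
    """Overlay user-supplied options on the documented defaults (hypothetical helper)."""
    return {**DEFAULTS, **(params or {})}


def validate_input(data, features, target):
    """Raise ValueError if the dataset does not meet the documented requirements."""
    # Assumption: an empty dataset is treated as an error.
    if not isinstance(data, list) or not data:
        raise ValueError("data must be a non-empty list of rows")
    for index, row in enumerate(data):
        if not isinstance(row, dict):
            raise ValueError(f"row {index} is not a mapping of column names to values")
        missing = [column for column in [*features, target] if column not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")


# The inline rows and settings from the example workflow.
rows = [
    {"age": 22, "income": 38000, "spend": 800, "outcome": 1},
    {"age": 25, "income": 45000, "spend": 1200, "outcome": 0},
    {"age": 29, "income": 56000, "spend": 1800, "outcome": 1},
]
options = resolve_options({"learning_rate": 0.01})
validate_input(rows, ["age", "income", "spend"], "outcome")
print(options)
```

Keeping the defaults in a single mapping makes it easy to see which values a workflow author has overridden and which still carry their documented defaults.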

Example output:

{
  "data": [
    {"age": 22, "income": 38000, "spend": 800, "outcome": 1},
    {"age": 25, "income": 45000, "spend": 1200, "outcome": 0},
    {"age": 29, "income": 56000, "spend": 1800, "outcome": 1}
  ],
  "pdv": {},
  "extras": {
    "ml": {
      "modelType": "logistic",
      "coefficients": {"age": 0.12, "income": 0.0003, "spend": 0.01},
      "intercept": -1.23,
      "training_metrics": {"accuracy": 0.67, "loss": 0.52}
    }
  },
  "outputType": "array"
}
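
To illustrate the downstream scoring use, the Python sketch below applies a logistic model shaped like the ml object above to a single row. It relies on several assumptions: the coefficients apply to raw (unscaled) feature values, the prediction is the standard logistic function of the linear score, and 0.5 is the classification threshold. None of these is confirmed by this page, and predict_probability is a hypothetical helper, not part of the MachineLearning class.

```python
import math


def predict_probability(model, row):
    """Return the logistic function of intercept + sum(coefficient * feature value)."""
    z = model["intercept"] + sum(
        coefficient * row[name] for name, coefficient in model["coefficients"].items()
    )
    # Numerically stable logistic function: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Model fields copied from the example output.
ml = {
    "modelType": "logistic",
    "coefficients": {"age": 0.12, "income": 0.0003, "spend": 0.01},
    "intercept": -1.23,
}

row = {"age": 22, "income": 38000, "spend": 800}
probability = predict_probability(ml, row)
label = 1 if probability >= 0.5 else 0  # assumed threshold
print(f"probability={probability:.4f}, label={label}")
```

If the real model is trained with normalize set to true, the same scaling would presumably need to be applied to each row before scoring.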