DAZL Documentation | Data Analytics A-to-Z Processing Language

useModel

machine learning

slug: step-usemodel

Purpose

Applies a previously trained machine learning model to a dataset to generate predictions. This step allows workflows to score new records, append predicted values as a new column, and integrate ML outputs directly into the data pipeline.

When to Use

Apply trained models to unseen or new datasets
Generate predictions for downstream analysis, reporting, or visualization
Integrate ML outputs into feature engineering or business logic pipelines
Generate predictions on test or validation datasets so model performance can be evaluated

How It Works

Extracts input components from the pipeline: data, pdv, and extras.
Merges default configuration parameters with those provided in $params.
Retrieves the trained model from the workflow interpreter using the provided model reference.
Uses the MachineLearning class to generate predictions for each row in the dataset.
Appends the predictions as a new column in the dataset. The column name can be customized via the output_column parameter (default: prediction).
Returns the dataset with predictions added, preserving original metadata and extras.
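The six steps above can be sketched in Python as follows. This is a minimal illustration, not the DAZL implementation. The function name `use_model`, the idea of a "model" as a plain callable that maps one row to one prediction (standing in for the MachineLearning class), and the layout of `extras['ml']` as a dictionary keyed by model reference are all assumptions made for the sketch.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Assumption: a trained model is modeled as a callable mapping one row to a prediction.
Model = Callable[[Row], Any]

# Only the default relevant to this sketch; see the Parameters section for the full list.
DEFAULTS: Dict[str, Any] = {"output_column": "prediction"}


def use_model(pipeline: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of the useModel step (not the actual DAZL code)."""
    # 1. Extract the input components from the pipeline.
    data: List[Row] = pipeline["data"]
    pdv = pipeline.get("pdv", {})
    extras = pipeline.get("extras", {})

    # 2. Merge defaults with user-supplied parameters; user values win.
    config = {**DEFAULTS, **params}

    # 3. Retrieve the trained model by reference (assumed to live in extras["ml"]).
    models: Dict[str, Model] = extras.get("ml", {})
    if config["model"] not in models:
        raise KeyError(f"model {config['model']!r} not found in extras['ml']")
    model = models[config["model"]]

    # 4-5. Predict each row and append the prediction under the output column.
    column = config["output_column"]
    scored = [{**row, column: model(row)} for row in data]

    # 6. Return the scored data; pdv and extras pass through unchanged.
    return {"data": scored, "pdv": pdv, "extras": extras, "outputType": "array"}
```

The sketch builds new row dictionaries rather than writing into the input rows, so the caller's original dataset is left untouched. The page does not say whether DAZL copies or mutates rows.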

Parameters

Required

model (string) — Reference to the trained model within the workflow interpreter.
_interpreter — Internal workflow object used to fetch the trained model from the pipeline.

Optional

output_column (string) — Name of the column to store predictions. Default: "prediction"
categorical (string or array) — Columns treated as categorical; default 'auto'
missing_values (string) — How to handle missing values; default 'error'
normalize (boolean) — Whether numeric features are normalized; default true
test_size (float) — Reserved for compatibility; default null
random_state (int) — Random seed for reproducibility; default 42
k (int) — Number of neighbors for k-NN models; default 5
distance_metric (string) — Metric for distance-based models; default 'euclidean'
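The default merge described in How It Works can be pictured as a shallow dictionary merge, where any key the caller supplies replaces the listed default. The Python sketch below assumes "later keys win" semantics, similar to PHP's `array_merge` on string keys. The names `USE_MODEL_DEFAULTS` and `resolve_params` are hypothetical. `_interpreter` is left out of the required check because it is an internal object supplied by the workflow rather than by the user.

```python
from typing import Any, Dict

# Defaults as listed in the Parameters section (null -> None, true -> True).
USE_MODEL_DEFAULTS: Dict[str, Any] = {
    "output_column": "prediction",
    "categorical": "auto",
    "missing_values": "error",
    "normalize": True,
    "test_size": None,
    "random_state": 42,
    "k": 5,
    "distance_metric": "euclidean",
}

# User-facing required parameters (_interpreter is injected by the workflow).
REQUIRED = ("model",)


def resolve_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: return the effective configuration for one useModel call."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    # Shallow merge: caller-supplied keys override the defaults.
    return {**USE_MODEL_DEFAULTS, **params}
```

For example, `resolve_params({"model": "trainedModel", "k": 7})` keeps every default except `k`.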

Input Requirements

Dataset (data) must be an array of associative arrays (rows).
Features expected by the trained model must exist in the dataset.
The model parameter must reference a previously trained model stored in extras['ml'].
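The three requirements above can be checked before scoring. The Python sketch below is illustrative only. It assumes the caller already knows which feature names the model expects, because this page does not document how DAZL exposes a model's feature list. It treats a list of dictionaries as Python's counterpart of an array of associative arrays. `check_inputs` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List


def check_inputs(data: Any, features: Iterable[str], extras: Dict[str, Any], model_ref: str) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty means OK)."""
    problems: List[str] = []

    # Requirement 1: rows must be a list of dictionaries.
    if not isinstance(data, list) or not all(isinstance(row, dict) for row in data):
        problems.append("data must be a list of row dictionaries")
        return problems

    # Requirement 2: every feature the model expects must be present in every row.
    needed = list(features)
    for i, row in enumerate(data):
        absent = [f for f in needed if f not in row]
        if absent:
            problems.append(f"row {i} is missing feature(s): {', '.join(absent)}")

    # Requirement 3: the model reference must resolve to an entry in extras['ml'].
    if model_ref not in extras.get("ml", {}):
        problems.append(f"model {model_ref!r} not found in extras['ml']")

    return problems
```

Collecting every problem, instead of stopping at the first one, reports all missing features in a single pass.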

Output

Data

Returns the dataset with predictions appended as a new column.

PDV

Passed through unchanged from input.

Extras

Passed through unchanged from input.
Does not modify the original trained model.

Output Structure

Key	Description
`data`	Original dataset with a new column containing predictions
`pdv`	Metadata about dataset columns
`extras`	Pipeline metadata and diagnostics
`outputType`	`"array"` — Indicates structured array output
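A downstream step or a test can confirm that a result matches the structure in the table above. The Python sketch below models the step's result as a dictionary with those four keys. It checks that `outputType` is `"array"` and that every row carries the prediction column. `check_output` is a hypothetical helper, and the dictionary representation is an assumption about how the result would look outside DAZL.

```python
from typing import Any, Dict


def check_output(result: Dict[str, Any], output_column: str = "prediction") -> None:
    """Hypothetical check: raise ValueError if `result` deviates from the documented structure."""
    expected_keys = {"data", "pdv", "extras", "outputType"}
    missing = expected_keys - set(result)
    if missing:
        raise ValueError(f"missing key(s): {sorted(missing)}")

    if result["outputType"] != "array":
        raise ValueError("outputType should be 'array'")

    # Every row should now carry the prediction column.
    for i, row in enumerate(result["data"]):
        if output_column not in row:
            raise ValueError(f"row {i} has no {output_column!r} column")
```

Pass the same `output_column` that was given to useModel. Otherwise the check looks for the default `prediction` column.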

Example Usage

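steps:
  - loadInline:
      data:
        - {age: 22, income: 38000, spend: 800}
        - {age: 25, income: 45000, spend: 1200}
        - {age: 29, income: 56000, spend: 1800}
      output: scoringData

  # Assumes a model named trainedModel was trained earlier in the workflow
  # and stored in extras['ml'] (the training steps are not shown here).
  - useModel:
      dataset: scoringData
      model: trainedModel              # reference to the previously trained model
      output_column: predictedOutcome  # name of the new prediction column
      _interpreter: __interpreter__    # internal workflow interpreter object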

Example Output

[
  {"age":22,"income":38000,"spend":800,"predictedOutcome":1},
  {"age":25,"income":45000,"spend":1200,"predictedOutcome":0},
  {"age":29,"income":56000,"spend":1800,"predictedOutcome":1}
]
