DAZL Documentation | Data Analytics A-to-Z Processing Language


useModel

Category: machine learning
Slug: step-usemodel

Purpose

Applies a previously trained machine learning model to a dataset to generate predictions. This step allows workflows to score new records, append predicted values as a new column, and integrate ML outputs directly into the data pipeline.

When to Use

  • Apply trained models to unseen or new datasets
  • Generate predictions for downstream analysis, reporting, or visualization
  • Integrate ML outputs into feature engineering or business logic pipelines
  • Evaluate model performance against test or validation datasets

How It Works

  1. Extracts input components from the pipeline: data, pdv, and extras.
  2. Merges default configuration parameters with those provided in $params.
  3. Retrieves the trained model from the workflow interpreter using the provided model reference.
  4. Uses the MachineLearning class to generate predictions for each row in the dataset.
  5. Appends the predictions as a new column in the dataset. The column name can be customized via the output_column parameter (default: prediction).
  6. Returns the dataset with predictions added, preserving original metadata and extras.
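The scoring flow above can be sketched in Python. This is an illustrative stand-in, not the actual DAZL implementation: the `apply_model` helper, the `predict()` method, and the `ThresholdModel` class are assumptions made for demonstration only.

```python
def apply_model(data, model, output_column="prediction"):
    """Score each row with the model and append the result under output_column."""
    scored = []
    for row in data:
        new_row = dict(row)                       # copy, so the input row is untouched
        new_row[output_column] = model.predict(row)
        scored.append(new_row)
    return scored

# Stand-in for a "trained model": predicts 1 when spend crosses a threshold.
class ThresholdModel:
    def predict(self, row):
        return 1 if row["spend"] >= 1500 else 0

rows = [
    {"age": 22, "income": 38000, "spend": 800},
    {"age": 29, "income": 56000, "spend": 1800},
]
result = apply_model(rows, ThresholdModel(), output_column="predictedOutcome")
```

Note that the original rows are copied before the prediction column is added, mirroring how the step preserves the input dataset's structure and metadata.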

Parameters

Required

  • model (string) — Reference to the trained model within the workflow interpreter.
  • _interpreter — Internal workflow object used to fetch the trained model from the pipeline.

Optional

  • output_column (string) — Name of the column to store predictions. Default: "prediction"
  • categorical (string or array) — Columns treated as categorical. Default: "auto"
  • missing_values (string) — How to handle missing values. Default: "error"
  • normalize (boolean) — Whether numeric features are normalized. Default: true
  • test_size (float) — Reserved for compatibility. Default: null
  • random_state (int) — Random seed for reproducibility. Default: 42
  • k (int) — Number of neighbors for k-NN models. Default: 5
  • distance_metric (string) — Metric for distance-based models. Default: "euclidean"
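For illustration, a step definition that sets several of the optional parameters might look like the following. The column names and parameter values here are placeholders, not recommendations:

```yaml
- useModel:
    model: trainedModel
    output_column: churnScore
    categorical: [region, plan]
    missing_values: error
    normalize: true
    k: 5
    distance_metric: euclidean
    _interpreter: __interpreter__
```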

Input Requirements

  • Dataset (data) must be an array of associative arrays (rows).
  • Features expected by the trained model must exist in the dataset.
  • The model parameter must reference a previously trained model stored in extras['ml'].

Output

Data

  • Returns the dataset with predictions appended as a new column.

PDV

  • Passed through unchanged from input.

Extras

  • Passed through unchanged from input.
  • Does not modify the original trained model.

Output Structure

Key         Description
----------  ---------------------------------------------------------
data        Original dataset with a new column containing predictions
pdv         Metadata about dataset columns
extras      Pipeline metadata and diagnostics
outputType  "array" — indicates structured array output
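Put together, the returned structure might look like the sketch below. The pdv and extras payloads are abbreviated placeholders; their exact contents depend on the pipeline and are not specified here:

```json
{
  "data": [
    {"age": 22, "income": 38000, "spend": 800, "prediction": 1}
  ],
  "pdv": "…column metadata, passed through unchanged…",
  "extras": "…pipeline metadata, including the trained model under extras['ml']…",
  "outputType": "array"
}
```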

Example Usage

steps:
  - loadInline:
      data:
        - {age: 22, income: 38000, spend: 800}
        - {age: 25, income: 45000, spend: 1200}
        - {age: 29, income: 56000, spend: 1800}
      output: scoringData

  - useModel:
      dataset: scoringData
      model: trainedModel
      output_column: predictedOutcome
      _interpreter: __interpreter__

Example Output

[
  {"age":22,"income":38000,"spend":800,"predictedOutcome":1},
  {"age":25,"income":45000,"spend":1200,"predictedOutcome":0},
  {"age":29,"income":56000,"spend":1800,"predictedOutcome":1}
]

Related Documentation

  • trainModel-step – Train a model and store it in extras['ml']
  • calculate-step – Generate or transform features before scoring
  • filter-step – Subset data before applying a model
  • chart-step – Visualize predicted results