DAZL Documentation | Data Analytics A-to-Z Processing Language


classify

business analytics

Purpose

Assigns categorical labels to dataset rows based on user-defined conditional rules. Useful for segmenting customers, scoring leads, or creating flags based on business logic.

When to Use

  • Segment customers based on RFM (Recency, Frequency, Monetary) or other scoring systems
  • Flag or categorize data based on complex business rules
  • Simplify downstream analysis by creating a categorical field from numeric or textual data

How It Works

  1. Receives a dataset (data) and applies a series of conditional rules to each row.
  2. Each rule contains a when condition and a then value:

    • The first rule whose condition evaluates to true is applied.
    • If no rule matches and an else clause is defined, that value is used.
    • If no rule matches and no else clause is defined, the output value is set to null.
  3. The evaluated value is assigned to a new or existing column (outputColumn).

Important: Conditions are evaluated sequentially, and the first match wins.
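
The evaluation order described above can be sketched in Python. This is a minimal illustration, not DAZL's implementation: conditions are modeled as plain Python callables rather than DSL expression strings, and the names classify_row and classify are hypothetical.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
# A rule pairs a predicate (standing in for a `when` expression) with its `then` label.
Rule = tuple[Callable[[Row], bool], str]


def classify_row(row: Row, rules: list[Rule], default: Optional[str] = None) -> Optional[str]:
    """Return the label of the first rule whose predicate is true for this row."""
    for when, then in rules:
        if when(row):
            return then  # first match wins; later rules are not evaluated
    return default  # the `else` value, or None (null) when no else is defined


def classify(data: list[Row], output_column: str, rules: list[Rule],
             default: Optional[str] = None) -> list[Row]:
    """Return copies of the rows with the label written to output_column."""
    return [{**row, output_column: classify_row(row, rules, default)} for row in data]


# Hypothetical rules and rows, for illustration only.
rules = [
    (lambda r: r["amount"] >= 1000, "large"),
    (lambda r: r["amount"] >= 100, "medium"),
]
rows = [{"amount": 1500}, {"amount": 250}, {"amount": 20}]

print(classify(rows, "size", rules))                   # no else: the last row gets None
print(classify(rows, "size", rules, default="small"))  # else supplies the fallback
```

The first row satisfies both conditions but receives only the label of the earlier rule, which is the behavior the ordering note refers to.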

Parameters

Required

  • outputColumn (string) – Name of the column to store the classification result.
  • rules (array) – Ordered list of rules with the following structure:

    - when: "condition_expression"
      then: "label_value"
    - else: "default_label"

    • when expressions are evaluated per row.
    • else is optional and provides a default classification.

Input Requirements

  • Any dataset containing the fields referenced in the when conditions
  • Conditions can combine comparison operators (==, >=, <=, etc.) with logical operators (AND, OR)
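
One way to check the first requirement before running a step is to scan the when expressions for field names and compare them with the dataset's columns. The sketch below is a rough heuristic in Python, not part of DAZL. It assumes that field names are plain identifiers, that AND and OR are the only keywords, and that string literals are quoted. The helper names referenced_fields and missing_fields are hypothetical.

```python
import re

# Assumption: these are the only reserved words in a condition expression.
KEYWORDS = {"AND", "OR"}


def referenced_fields(expression: str) -> set[str]:
    """Return identifier-like tokens in an expression, excluding keywords."""
    # Drop quoted string literals so their contents are not mistaken for field names.
    without_strings = re.sub(r"'[^']*'|\"[^\"]*\"", " ", expression)
    tokens = re.findall(r"\b[A-Za-z_]\w*\b", without_strings)
    return {token for token in tokens if token not in KEYWORDS}


def missing_fields(when_expressions: list[str], columns: list[str]) -> set[str]:
    """Return fields used by any expression that are absent from the columns."""
    needed: set[str] = set()
    for expression in when_expressions:
        needed |= referenced_fields(expression)
    return needed - set(columns)


conditions = [
    "rScore == 5 AND fScore == 5 AND mScore == 5",
    "rScore <= 2 AND fScore <= 2",
]
print(missing_fields(conditions, ["rScore", "fScore"]))  # mScore is reported as missing
```

If the DSL supports further keywords or functions, they would need to be added to KEYWORDS for the check to stay accurate.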

Output

  • data: Original dataset with an additional column (outputColumn) containing the classification
  • pdv: Updated PDV metadata (adds outputColumn if new)
  • extras: Passed through unchanged
  • outputType: 'work'

Example Usage

steps:
  - classify:
      source: rfm_scores
      outputColumn: segment
      rules:
        - when: "rScore == 5 AND fScore == 5 AND mScore == 5"
          then: "Champions"
        - when: "rScore >= 4 AND fScore >= 4 AND mScore >= 4"
          then: "Loyal Customers"
        - when: "rScore <= 2 AND fScore <= 2"
          then: "At Risk"
        - else: "Other"

Explanation:

  • Rows where all three RFM scores equal 5 are labeled “Champions.”
  • Remaining rows where all three scores are 4 or higher are labeled “Loyal Customers.”
  • Remaining rows where rScore and fScore are both 2 or lower are labeled “At Risk,” regardless of mScore.
  • All other rows default to “Other.”
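
To make the outcome concrete, the four example rules can be mirrored as an ordered chain of checks in Python. This is only an illustration of the rule logic, not how DAZL evaluates it. The sample rows and the customer field are hypothetical.

```python
def segment(row: dict) -> str:
    """Mirror the four example rules in order; the first true condition decides."""
    r, f, m = row["rScore"], row["fScore"], row["mScore"]
    if r == 5 and f == 5 and m == 5:
        return "Champions"
    if r >= 4 and f >= 4 and m >= 4:
        return "Loyal Customers"
    if r <= 2 and f <= 2:
        return "At Risk"
    return "Other"  # corresponds to the else clause


# Hypothetical sample rows, one per expected segment.
sample = [
    {"customer": "A", "rScore": 5, "fScore": 5, "mScore": 5},
    {"customer": "B", "rScore": 5, "fScore": 4, "mScore": 4},
    {"customer": "C", "rScore": 1, "fScore": 2, "mScore": 5},
    {"customer": "D", "rScore": 3, "fScore": 5, "mScore": 2},
]

for row in sample:
    print(row["customer"], segment(row))
# A Champions
# B Loyal Customers
# C At Risk
# D Other
```

Customer C shows that the “At Risk” rule ignores mScore: a high monetary score does not prevent the label.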

Notes & Best Practices

  • Ensure all fields referenced in when conditions exist in the dataset.
  • Write operators in the form the DSL’s expression evaluator accepts, such as AND, OR, ==, and >= as used in the example above.
  • Order matters: the first matching rule is applied; place more specific rules first.
  • Use earlier calculation or transformation steps to produce derived fields (such as RFM scores), then reference those fields in the when conditions.
  • Consider providing an else clause to avoid null classifications.
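
The ordering and else recommendations can be demonstrated with a small Python model of first-match evaluation. It is a sketch under the assumption that conditions behave like ordinary boolean predicates. The function first_match and the rule lists are hypothetical and are not DAZL APIs.

```python
def first_match(row, rules, default=None):
    """Return the label of the first rule whose predicate holds, else the default."""
    for when, then in rules:
        if when(row):
            return then
    return default


def is_champion(r):
    return r["rScore"] == 5 and r["fScore"] == 5 and r["mScore"] == 5


def is_loyal(r):
    return r["rScore"] >= 4 and r["fScore"] >= 4 and r["mScore"] >= 4


specific_first = [(is_champion, "Champions"), (is_loyal, "Loyal Customers")]
broad_first = [(is_loyal, "Loyal Customers"), (is_champion, "Champions")]

perfect = {"rScore": 5, "fScore": 5, "mScore": 5}
low = {"rScore": 1, "fScore": 1, "mScore": 1}

print(first_match(perfect, specific_first))        # Champions
print(first_match(perfect, broad_first))           # Loyal Customers: the broad rule shadows the specific one
print(first_match(low, specific_first))            # None: no rule matched and no else was given
print(first_match(low, specific_first, "Other"))   # Other
```

With the broad rule listed first, the “Champions” rule can never fire, because every row it would match already satisfies the earlier condition.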