DAZL Documentation | Data Analytics A-to-Z Processing Language

classify

business analytics

slug: step-classify

Purpose

Assigns categorical labels to dataset rows based on user-defined conditional rules. Useful for segmenting customers, scoring leads, or creating flags based on business logic.

When to Use

Segment customers based on RFM (Recency, Frequency, Monetary) or other scoring systems
Flag or categorize data based on complex business rules
Simplify downstream analysis by creating a categorical field from numeric or textual data

How It Works

Receives a dataset (data) and applies a series of conditional rules to each row.
Each rule contains a when condition and a then value:
- The first rule whose condition evaluates to true is applied.
- If no rule matches and an else clause is defined, that value is used.
- If no rule matches and no else clause is defined, the output value is set to null.
The evaluated value is assigned to a new or existing column (outputColumn).

Important: Conditions are evaluated sequentially, and the first match wins.
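
A minimal illustration of the first-match behavior, using only the rule syntax shown on this page. The column names score and tier and the labels are hypothetical, chosen for this sketch:

```yaml
# Hypothetical fragment: "score" and "tier" are illustrative names, not DAZL built-ins.
outputColumn: tier
rules:
  - when: "score >= 90"
    then: "Gold"       # evaluation stops here for any row with score >= 90
  - when: "score >= 50"
    then: "Silver"     # only reached by rows with score below 90
  - else: "Bronze"     # rows that match neither condition
```

With these rules, a row whose score is 95 is labeled “Gold” even though it also satisfies the second condition, a row with score 60 is labeled “Silver”, and a row with score 10 falls through to “Bronze”.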

Parameters

Required

outputColumn (string) – Name of the column to store the classification result.
rules (array) – Ordered list of rules with the following structure:
```
- when: "condition_expression"
  then: "label_value"
- else: "default_label"
```
- when expressions are evaluated per row.
- else is optional and provides a default classification.

Input Requirements

Any dataset containing the fields referenced in the when conditions
Conditions can combine comparison operators (==, >=, <=, etc.) with logical operators (AND, OR)

Output

data: Original dataset with an additional column (outputColumn) containing the classification
pdv: Updated PDV metadata (adds outputColumn if new)
extras: Passed through unchanged
outputType: 'work'

Example Usage

steps:
  - classify:
      source: rfm_scores
      outputColumn: segment
      rules:
        - when: "rScore == 5 AND fScore == 5 AND mScore == 5"
          then: "Champions"
        - when: "rScore >= 4 AND fScore >= 4 AND mScore >= 4"
          then: "Loyal Customers"
        - when: "rScore <= 2 AND fScore <= 2"
          then: "At Risk"
        - else: "Other"

Explanation:

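Rows where rScore, fScore, and mScore all equal 5 are labeled “Champions.”
Rows where all three scores are at least 4, and that did not already match the first rule, are labeled “Loyal Customers.”
Rows where rScore and fScore are both 2 or lower are labeled “At Risk,” regardless of mScore.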
All other rows default to “Other.”
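
To make the rule order concrete, the sketch below traces a few rows through the rules in the example. The rows are invented for illustration and are not taken from any real dataset; the comment on each row gives the segment that follows from the first-match logic described under How It Works:

```yaml
# Hypothetical input rows (made up for this walk-through).
- {rScore: 5, fScore: 5, mScore: 5}   # rule 1 matches: "Champions"
- {rScore: 4, fScore: 5, mScore: 4}   # rule 1 fails, rule 2 matches: "Loyal Customers"
- {rScore: 1, fScore: 2, mScore: 5}   # rules 1-2 fail, rule 3 matches: "At Risk" (mScore is not checked)
- {rScore: 3, fScore: 3, mScore: 3}   # no rule matches, else applies: "Other"
```

The third row shows that a customer with a high monetary score can still be labeled “At Risk,” because that rule tests only recency and frequency.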

Notes & Best Practices

Ensure all fields referenced in when conditions exist in the dataset.
Write operators in the form the DSL’s expression evaluator expects (for example, AND and OR as used in the example above).
Order matters: the first matching rule is applied; place more specific rules first.
Combine with prior calculation or transformation steps to generate derived fields for classification.
Consider providing an else clause to avoid null classifications.
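
The ordering and else advice can be illustrated with a deliberately misordered variant of the RFM example. This is a sketch of what to avoid, reusing the field names from the example on this page:

```yaml
# Anti-pattern sketch: same rules as the RFM example, but in a problematic order.
steps:
  - classify:
      source: rfm_scores
      outputColumn: segment
      rules:
        # The broad rule comes first, so it also captures rows where all scores are 5.
        - when: "rScore >= 4 AND fScore >= 4 AND mScore >= 4"
          then: "Loyal Customers"
        # Unreachable: every row that satisfies this condition already matched above.
        - when: "rScore == 5 AND fScore == 5 AND mScore == 5"
          then: "Champions"
        # No else clause: rows matching neither rule get a null segment.
```

Moving the “Champions” rule to the top and adding an else entry restores the intended behavior.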
