DAZL Documentation | Data Analytics A-to-Z Processing Language

freq

statistical primitive

Purpose

Generates frequency distributions for one or more columns in a dataset. Useful for understanding the distribution of categorical or discrete values, identifying dominant categories, and performing quick exploratory analysis.

When to Use

- Summarize how often each value appears in selected fields
- Profile datasets to find unique values and their counts
- Prepare frequency tables for reports or dashboards
- Validate data consistency or detect anomalies
- Compare categorical distributions across multiple fields

How It Works

Accepts an input dataset (data) along with optional metadata (pdv) and extra information (extras).
Iterates through the columns specified in the parameters.
For each column, computes a frequency count of unique values.
Assembles a structured output containing:
- Frequency data as an array
- Optional HTML table for visualization
- Metadata and extras passed through unchanged
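
The flow above can be sketched in Python. This is an illustration only, not DAZL's implementation: the function name `freq_step`, its signature, and the heading-only placeholder markup are assumptions. Only the output keys are taken from this page.

```python
from collections import Counter
from html import escape


def freq_step(data, columns, pdv=None, extras=None):
    """Hypothetical stand-in for the freq step; not DAZL's actual code."""
    # One tally per requested column, keyed by the values seen in that column.
    counts = {col: dict(Counter(row[col] for row in data)) for col in columns}

    # Placeholder markup: one heading per column instead of a full table.
    html_out = "".join(f"<h4>{escape(col)} Frequencies</h4>" for col in columns)

    return {
        "data": counts,
        "pdv": pdv,        # passed through unchanged
        "extras": extras,  # passed through unchanged
        "html": html_out,
        "outputType": "html",
    }


rows = [
    {"id": 1, "region": "North", "status": "Active"},
    {"id": 2, "region": "North", "status": "Inactive"},
    {"id": 3, "region": "South", "status": "Active"},
]
result = freq_step(rows, ["region", "status"], pdv={"note": "example"})
print(result["data"])
```

Columns that are not listed (here `id`) are ignored, and `pdv` and `extras` come back exactly as they were supplied.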

Parameters

Required

columns (array) — List or map of column names to analyze. Example:
```
columns:
- region
- status
```

Optional

None currently defined.

Security Features

Sanitizes values when rendering HTML to prevent injection.
Operates only on in-memory arrays — no file or SQL execution.
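
This page does not say how values are sanitized. One common approach is to escape every value before it is placed in the markup, sketched below with Python's standard `html.escape`. The function name `render_freq_table` and the table layout are assumptions; `class="table"` is Bootstrap's base table class.

```python
from html import escape


def render_freq_table(column, counts):
    """Render one column's counts as an HTML table, escaping all text.

    Hypothetical helper; assumes every count is an integer.
    """
    rows = "".join(
        f"<tr><td>{escape(str(value))}</td><td>{count}</td></tr>"
        for value, count in counts.items()
    )
    return (
        f'<table class="table"><caption>{escape(column)} Frequencies</caption>'
        f"<tbody>{rows}</tbody></table>"
    )


# A value containing markup is rendered as inert text, not as a tag.
markup = render_freq_table("status", {"<script>alert(1)</script>": 1, "Active": 3})
print(markup)
```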

Input Requirements

Input must include a data array of associative arrays (rows).
Column names listed in columns must exist in each record.
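
The page does not describe what happens when a listed column is missing from a row, so a caller may want to check the input before running the step. The sketch below is one way to do that in Python; `check_freq_input` and its error messages are hypothetical and not part of DAZL.

```python
def check_freq_input(data, columns):
    """Raise ValueError if the input does not meet the requirements above."""
    if not isinstance(data, list):
        raise ValueError("data must be a list of rows")
    for index, row in enumerate(data):
        if not isinstance(row, dict):
            raise ValueError(f"row {index} is not a mapping")
        # Every requested column must be present in every row.
        missing = [col for col in columns if col not in row]
        if missing:
            raise ValueError(f"row {index} is missing columns: {missing}")


# Valid input passes silently.
check_freq_input([{"region": "North", "status": "Active"}], ["region", "status"])

# A row without the requested column is rejected.
message = None
try:
    check_freq_input([{"region": "North"}], ["region", "status"])
except ValueError as err:
    message = str(err)
print(message)
```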

Calculation Logic

For each specified column:

- Extracts all values from the dataset.
- Counts occurrences of each unique value.
- Returns both structured data and an HTML representation.

Example Formula (conceptual)

freq[value] = count of records where column == value
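
For a single column, the formula can be tried out directly with Python's `collections.Counter`. The sample values below are made up for illustration.

```python
from collections import Counter

# Made-up values for a single hypothetical column.
status_values = ["Active", "Inactive", "Active", "Active"]

# Counter tallies how many times each distinct value occurs.
freq = Counter(status_values)

print(freq["Active"])    # count of records where status == "Active"
print(freq["Inactive"])  # count of records where status == "Inactive"
print(freq["Pending"])   # a value that never occurs has a count of zero
```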

Output

Data

An associative array keyed by column name. Each entry maps every unique value found in that column to the number of times it occurs.

HTML

A <table> representation for visual summaries (Bootstrap-friendly).

PDV and Extras

Passed through unchanged from input for compatibility with subsequent steps.

Output Structure

| Key | Description |
|-----|-------------|
| `data` | Frequency counts by column |
| `pdv` | Metadata about columns |
| `extras` | Any additional contextual data |
| `html` | Rendered frequency table |
| `outputType` | Set to `"html"` |
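
For readers who consume this output from typed code, the keys above can be written down as a Python `TypedDict`. The type name `FreqOutput` and the value types are assumptions inferred from the descriptions on this page, not a schema published by DAZL.

```python
from typing import Any, Dict, TypedDict


class FreqOutput(TypedDict):
    """Hypothetical type describing the freq step's output keys."""
    data: Dict[str, Dict[str, int]]  # column name -> {value: count}
    pdv: Any                         # column metadata, passed through
    extras: Any                      # contextual data, passed through
    html: str                        # rendered frequency table(s)
    outputType: str                  # "html" for this step


example: FreqOutput = {
    "data": {"region": {"North": 3, "South": 2}},
    "pdv": None,
    "extras": None,
    "html": "<table>...</table>",
    "outputType": "html",
}
print(sorted(example))
```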

Example Usage

freq:
  columns:
    - region
    - status

Example Output

Original Data

| id | region | status |
|----|--------|--------|
| 1 | North | Active |
| 2 | North | Inactive |
| 3 | South | Active |
| 4 | South | Active |
| 5 | North | Active |

Frequency Data

{
  "region": {
    "North": 3,
    "South": 2
  },
  "status": {
    "Active": 4,
    "Inactive": 1
  }
}

Frequency Table (HTML)

region Frequencies
North	3
South	2

status Frequencies
Active	4
Inactive	1