DAZL Documentation | Data Analytics A-to-Z Processing Language


freq

statistical primitive

slug: step-freq

Purpose

Generates frequency distributions for one or more columns in a dataset. Useful for understanding the distribution of categorical or discrete values, identifying dominant categories, and performing quick exploratory analysis.

When to Use

  • Summarize how often each value appears in selected fields
  • Profile datasets to find unique values and their counts
  • Prepare frequency tables for reports or dashboards
  • Validate data consistency or detect anomalies
  • Compare categorical distributions across multiple fields

How It Works

  1. Accepts an input dataset (data) along with optional metadata (pdv) and extra information (extras).
  2. Iterates through the columns specified in the parameters.
  3. For each column, computes a frequency count of unique values.
  4. Assembles a structured output containing:

    • Frequency data as an array
    • Optional HTML table for visualization
    • Metadata and extras passed through unchanged
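The steps above can be sketched in Python. This is an illustration only, not DAZL's implementation: the function name `freq_step`, its signature, and the empty HTML placeholder are assumptions made for the example. The output keys mirror the ones documented on this page.

```python
from collections import Counter

def freq_step(data, columns, pdv=None, extras=None):
    """Hypothetical sketch of a freq-style step (not DAZL's actual code)."""
    frequencies = {}
    for column in columns:
        # Count how often each distinct value appears in this column.
        frequencies[column] = dict(Counter(row[column] for row in data))
    return {
        "data": frequencies,
        "pdv": pdv,        # passed through unchanged
        "extras": extras,  # passed through unchanged
        "html": "",        # HTML rendering omitted in this sketch
        "outputType": "html",
    }

rows = [
    {"id": 1, "region": "North"},
    {"id": 2, "region": "South"},
    {"id": 3, "region": "North"},
]
result = freq_step(rows, ["region"], pdv={"region": {"type": "string"}})
print(result["data"])  # {'region': {'North': 2, 'South': 1}}
```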

Parameters

Required

  • columns (array) — List or map of column names to analyze. Example:

    columns:
    - region
    - status

Optional

None currently defined.

Security Features

  • Sanitizes values when rendering HTML to prevent injection.
  • Operates only on in-memory arrays — no file or SQL execution.

Input Requirements

  • Input must include a data array of associative arrays (rows).
  • Column names listed in columns must exist in each record.
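A pre-check along these lines can confirm the second requirement before the step runs. The helper `missing_columns` is hypothetical and written in Python for illustration; how DAZL itself reacts to a missing column is not documented here.

```python
def missing_columns(data, columns):
    """Return (row_index, column) pairs for every required column a row lacks."""
    return [
        (index, column)
        for index, row in enumerate(data)
        for column in columns
        if column not in row
    ]

rows = [
    {"region": "North", "status": "Active"},
    {"region": "South"},  # no "status" key
]
problems = missing_columns(rows, ["region", "status"])
print(problems)  # [(1, 'status')]
```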

Calculation Logic

For each specified column:

  1. Extracts all values from the dataset.
  2. Counts occurrences of each unique value.
  3. Returns both structured data and an HTML representation.

Example Formula (conceptual)

freq[value] = count of records where column == value
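
Written more formally, for a dataset D of records and a column c (notation introduced here for illustration only):

```latex
\mathrm{freq}_c(v) = \bigl|\{\, r \in D : r[c] = v \,\}\bigr|,
\qquad
\sum_{v} \mathrm{freq}_c(v) = |D|
```

The second identity holds when every record carries column c, as the input requirements state. It gives a quick sanity check: the counts for a column should add up to the number of rows.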

Output

Data

An associative array keyed by column name. Each entry maps every unique value found in that column to the number of records that contain it.

HTML

A <table> representation for visual summaries (Bootstrap-friendly).

PDV and Extras

Passed through unchanged from input for compatibility with subsequent steps.

Output Structure

Key         Description
data        Frequency counts by column
pdv         Metadata about columns
extras      Any additional contextual data
html        Rendered frequency table
outputType  Set to "html"

Example Usage

freq:
  columns:
    - region
    - status

Example Output

Original Data

id  region  status
1   North   Active
2   North   Inactive
3   South   Active
4   South   Active
5   North   Active

Frequency Data

{
  "region": {
    "North": 3,
    "South": 2
  },
  "status": {
    "Active": 4,
    "Inactive": 1
  }
}

Frequency Table (HTML)

region    Frequencies
North     3
South     2

status    Frequencies
Active    4
Inactive  1

Related Documentation

  • cube step – Aggregate data with summary statistics
  • calculate step – Derive new fields before frequency analysis
  • filter step – Limit records before generating frequencies
  • chart step – Plot frequency results as charts