DAZL Documentation | Data Analytics A-to-Z Processing Language


univariate

statistical primitive

slug: step-univariate

Purpose

Calculates univariate statistics for numeric columns in a dataset. This step provides descriptive statistics such as mean, standard deviation, minimum, maximum, skewness, and kurtosis, helping analysts understand the distribution and variability of individual numeric fields.

When to Use

  • Explore individual numeric fields for basic statistical properties
  • Check the range and variability of a field, and spot possible outliers from its minimum and maximum
  • Generate summary tables for reporting or dashboards
  • Pre-analyze data before applying further transformations or visualizations

How It Works

  1. Extracts input components from the pipeline: data, pdv, and extras.
  2. Determines the columns to analyze. If the columns parameter is omitted, the keys of the first row are used, so a field that appears only in later rows is not analyzed by default.
  3. Iterates through each column:

    • Only numeric columns are processed.
    • For each numeric column, the univariate() function is applied to calculate statistics: count, sum, mean, min, max, standard deviation, skewness, and kurtosis.
  4. Returns a structured array with results for each numeric column.
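The flow above can be sketched in Python. This is only an illustration under stated assumptions, not DAZL's implementation: run_univariate, is_numeric, and basic_stats are hypothetical names, the input is assumed to arrive as a dict with data, pdv, and extras keys, and a column is assumed to count as numeric when every non-missing value is an int or float.

```python
from numbers import Real


def is_numeric(value):
    # Assumption: numeric means int or float. bool is a subclass of int
    # in Python, so it is excluded explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def run_univariate(step_input, stats_fn, columns=None):
    """Hypothetical outline of the step's flow."""
    # 1. Extract the three input components.
    data = step_input["data"]
    pdv = step_input.get("pdv", {})
    extras = step_input.get("extras", {})

    # 2. Default to the keys of the first row.
    if columns is None:
        columns = list(data[0].keys()) if data else []

    # 3. Process numeric columns only.
    results = []
    for column in columns:
        values = [row[column] for row in data if row.get(column) is not None]
        if not values or not all(is_numeric(v) for v in values):
            continue  # non-numeric (or empty) columns are skipped
        results.append({"column": column, **stats_fn(values)})

    # 4. Return per-column results; pdv and extras pass through unchanged.
    return {"data": results, "pdv": pdv, "extras": extras, "outputType": "array"}


def basic_stats(values):
    # Stand-in for univariate(): only a few of the statistics, for brevity.
    return {"n": len(values), "sum": sum(values), "mean": sum(values) / len(values)}


rows = [
    {"name": "a", "age": 22},
    {"name": "b", "age": 25, "score": 7},
]
out = run_univariate({"data": rows}, basic_stats)
print([r["column"] for r in out["data"]])  # ['age']
```

In this sketch, name is skipped because it is not numeric, and score is never considered because it is absent from the first row. Passing columns explicitly avoids the second effect.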

Parameters

Optional

  • columns (array) — List of column names to analyze.

    • Default: all columns in the first row of the dataset.
    • Example: ["age", "income", "spend"]

Notes

  • Non-numeric columns are automatically skipped.

Input Requirements

  • Input dataset (data) must be an array of associative arrays (rows).
  • Columns to be analyzed must hold numeric values. Columns that are not numeric are skipped rather than reported as errors.
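Because non-numeric columns are skipped silently, it can help to check the input shape before running the step. The sketch below is a hypothetical pre-check written in Python, not part of DAZL: check_rows and non_numeric_values are illustrative names, and it assumes that only int and float values count as numeric. Whether DAZL coerces numeric strings is not stated here, so the check is deliberately strict.

```python
def check_rows(data):
    """Return a list of problems with the input shape (empty if none)."""
    if not isinstance(data, list):
        return ["data must be a list of rows"]
    problems = []
    for index, row in enumerate(data):
        if not isinstance(row, dict):
            problems.append(f"row {index} is not a mapping")
    return problems


def non_numeric_values(data, column):
    """List (row index, value) pairs in `column` that are not int or float."""
    bad = []
    for index, row in enumerate(data):
        value = row.get(column)
        if value is None:
            continue  # treated as missing, not as an error
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            bad.append((index, value))
    return bad


rows = [
    {"age": 22, "income": 38000},
    {"age": 25, "income": "45k"},   # a stray string in a numeric field
    {"age": 29, "income": 56000},
]
print(check_rows(rows))                    # []
print(non_numeric_values(rows, "income"))  # [(1, '45k')]
```

A column that reports any pairs here would likely be treated as non-numeric and left out of the results.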

Output

Data

  • Array of univariate statistics for each numeric column. Each element includes:

    • column — Column name
    • n — Number of non-missing observations
    • sum — Sum of values
    • mean — Average value
    • min — Minimum value
    • max — Maximum value
    • stddev — Standard deviation
    • skew — Skewness
    • kurt — Kurtosis
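The fields above can be computed as in the following Python sketch. The estimators are assumptions rather than confirmed DAZL behavior: stddev uses the sample formula (n − 1 denominator), which matches the 3.51 shown for age in the example output, while skew and kurt use the moment-based definitions, with kurt reported as excess kurtosis (normal distribution = 0). The function name mirrors univariate() but is written here for illustration only.

```python
import math


def univariate(values):
    """Descriptive statistics for one numeric column (illustrative)."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    total = sum(values)
    mean = total / n
    deviations = [v - mean for v in values]

    # Central moments with an n denominator.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n

    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "min": min(values),
        "max": max(values),
        # Assumption: sample standard deviation (n - 1 denominator).
        "stddev": math.sqrt(m2 * n / (n - 1)) if n > 1 else None,
        # Assumption: moment-based skewness and excess kurtosis;
        # both are undefined when all values are equal (m2 == 0).
        "skew": m3 / m2 ** 1.5 if m2 > 0 else None,
        "kurt": m4 / m2 ** 2 - 3 if m2 > 0 else None,
    }


stats = univariate([22, 25, 29])
print(round(stats["mean"], 2), round(stats["stddev"], 2))  # 25.33 3.51
```

With these definitions, any column of exactly three distinct values has an excess kurtosis of -1.5, so kurt carries little information for very small samples.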

PDV

  • Passed through unchanged from input.

Extras

  • Passed through unchanged from input.

Output Structure

Key         Description
data        Array of univariate statistics per numeric column
pdv         Metadata about dataset columns (passed through)
extras      Additional runtime information (passed through)
outputType  "array", indicating structured array output

Example Usage

steps:
  - loadInline:
      data:
        - {age: 22, income: 38000, spend: 800}
        - {age: 25, income: 45000, spend: 1200}
        - {age: 29, income: 56000, spend: 1800}
      output: sampleData

  - univariate:
      dataset: sampleData
      columns: [age, income, spend]
      output: summaryStats

Example Output

[
  {
    "column": "age",
    "n": 3,
    "sum": 76,
    "mean": 25.33,
    "min": 22,
    "max": 29,
    "stddev": 3.51,
    "skew": 0.17,
    "kurt": -1.5
  },
  {
    "column": "income",
    "n": 3,
    "sum": 139000,
    "mean": 46333.33,
    "min": 38000,
    "max": 56000,
    "stddev": 9073.77,
    "skew": 0.26,
    "kurt": -1.5
  },
  {
    "column": "spend",
    "n": 3,
    "sum": 3800,
    "mean": 1266.67,
    "min": 800,
    "max": 1800,
    "stddev": 503.32,
    "skew": 0.24,
    "kurt": -1.5
  }
]
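For reporting, the output array can be turned into a plain-text summary table. The Python sketch below is one possible way to consume it, not a DAZL feature: format_summary is a hypothetical helper, the range column is derived here as max − min, and the input repeats a subset of the fields from the example output.

```python
import json


def format_summary(stats, fields=("n", "mean", "min", "max")):
    """Render univariate results as an aligned plain-text table."""
    header = ["column", *fields, "range"]
    rows = []
    for entry in stats:
        row = [str(entry["column"])]
        row += [str(entry[field]) for field in fields]
        row.append(str(entry["max"] - entry["min"]))  # derived, not from the step
        rows.append(row)

    table = [header, *rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(r, widths)).rstrip()
        for r in table
    ]
    return "\n".join(lines)


stats = json.loads("""
[
  {"column": "age",    "n": 3, "mean": 25.33,    "min": 22,    "max": 29},
  {"column": "income", "n": 3, "mean": 46333.33, "min": 38000, "max": 56000},
  {"column": "spend",  "n": 3, "mean": 1266.67,  "min": 800,   "max": 1800}
]
""")
print(format_summary(stats))
```

The fields argument selects which statistics appear, so the same helper could include stddev or skew when the output contains them.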

Related Documentation

  • freq-step – Generate frequency tables for categorical columns
  • calculate-step – Add or modify fields before univariate analysis
  • summarize-step – Aggregate data with multiple statistics
  • filter-step – Filter records before computing univariate stats