DAZL Documentation | Data Analytics A-to-Z Processing Language

univariate

statistical primitive

slug: step-univariate

Purpose

Calculates univariate statistics for numeric columns in a dataset. This step provides descriptive statistics such as mean, standard deviation, minimum, maximum, skewness, and kurtosis, helping analysts understand the distribution and variability of individual numeric fields.

When to Use

Explore individual numeric fields for basic statistical properties
Check the range and variability of a field, and spot potential outliers from extreme min or max values
Generate summary tables for reporting or dashboards
Profile data before applying further transformations or visualizations

How It Works

Extracts input components from the pipeline: data, pdv, and extras.
Determines the columns to analyze: the columns parameter if provided, otherwise every key of the first row.
Iterates through each column:
- Only numeric columns are processed.
- For each numeric column, the univariate() function is applied to calculate statistics: count, sum, mean, min, max, standard deviation, skewness, and kurtosis.
Returns a structured array with results for each numeric column.
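A minimal Python sketch of the flow above, for illustration only. univariate_step, is_numeric and this Python univariate() are hypothetical names, not DAZL's code. The page does not say how numeric columns or missing values are detected, so the rules used here (every non-missing value must be a number; None or an absent key counts as missing) are assumptions, flagged again in the comments.

```python
import math
from numbers import Real


def is_numeric(value):
    # Assumption: bools are excluded even though Python treats them as ints.
    return isinstance(value, Real) and not isinstance(value, bool)


def univariate(values):
    """Hypothetical stand-in for DAZL's univariate(); formula conventions are assumed."""
    n = len(values)
    total = sum(values)
    mean = total / n
    devs = [v - mean for v in values]
    ss = sum(d ** 2 for d in devs)
    m2 = ss / n  # population second central moment
    stddev = math.sqrt(ss / (n - 1)) if n > 1 else 0.0  # sample standard deviation
    # Moment-based skewness and excess kurtosis; undefined (None) for constant columns.
    skew = (sum(d ** 3 for d in devs) / n) / m2 ** 1.5 if m2 > 0 else None
    kurt = (sum(d ** 4 for d in devs) / n) / m2 ** 2 - 3 if m2 > 0 else None
    return {"n": n, "sum": total, "mean": mean, "min": min(values),
            "max": max(values), "stddev": stddev, "skew": skew, "kurt": kurt}


def univariate_step(data, pdv=None, extras=None, columns=None):
    """Hypothetical sketch: pick columns, skip non-numeric ones, summarize the rest."""
    if columns is None:
        # Documented default: every key of the first row.
        columns = list(data[0].keys()) if data else []
    results = []
    for col in columns:
        # Assumption: None or an absent key is a missing value and is excluded from n.
        present = [row[col] for row in data if row.get(col) is not None]
        # Assumption: a column is numeric when all of its non-missing values are numbers.
        if present and all(is_numeric(v) for v in present):
            results.append({"column": col, **univariate(present)})
    # pdv and extras pass through unchanged; the output is tagged as an array.
    return {"data": results, "pdv": pdv, "extras": extras, "outputType": "array"}


rows = [{"age": 22, "name": "Ann"}, {"age": 25, "name": "Bo"}, {"age": 29, "name": "Cy"}]
out = univariate_step(rows)
print([r["column"] for r in out["data"]])  # ['age']  (the text column "name" is skipped)
```

Returning a dict that mirrors the Output Structure table keeps the pass-through of pdv and extras explicit rather than implied.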

Parameters

Optional

columns (array) — List of column names to analyze.
- Default: all columns in the first row of the dataset.
- Example: ["age", "income", "spend"]

Notes

Non-numeric columns are automatically skipped.
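Because the default column list is taken from the first row only, a field that first appears in a later row is not summarized unless it is named in columns. A small Python sketch of that documented default (default_columns is a hypothetical helper, not part of DAZL):

```python
def default_columns(data):
    # Documented default: the keys of the first row only.
    return list(data[0].keys()) if data else []


rows = [
    {"age": 22, "name": "Ann"},
    {"age": 25, "name": "Bo", "spend": 1200},  # "spend" first appears in row 2
]
print(default_columns(rows))  # ['age', 'name']
```

Here name would then be skipped as non-numeric, and spend would be summarized only if listed explicitly, for example columns: [age, spend].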

Input Requirements

Input dataset (data) must be an array of associative arrays (rows).
Fields to be summarized must hold numeric values; missing values are excluded from the n count.
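In Python terms, an array of associative arrays corresponds to a list of dicts. A hypothetical pre-check that enforces this shape before the step runs (validate_rows is illustrative, not a DAZL function):

```python
from collections.abc import Mapping, Sequence


def validate_rows(data):
    """Hypothetical pre-check for the documented input shape: a list of row mappings."""
    # Strings and bytes are sequences too, so rule them out explicitly.
    if isinstance(data, (str, bytes)) or not isinstance(data, Sequence):
        raise TypeError("data must be a sequence of rows")
    for i, row in enumerate(data):
        if not isinstance(row, Mapping):
            raise TypeError(f"row {i} is not an associative array (mapping)")
    return True


validate_rows([{"age": 22}, {"age": 25}])  # passes
```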

Output

Data

Array of univariate statistics for each numeric column. Each element includes:
- column — Column name
- n — Number of non-missing observations
- sum — Sum of values
- mean — Average value
- min — Minimum value
- max — Maximum value
- stddev — Standard deviation
- skew — Skewness
- kurt — Kurtosis
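For reference, one common way to define these fields, where the x_i are the n non-missing values of a column. The sample (n - 1) standard deviation agrees with the age row of the example below; the moment-based skewness and excess kurtosis are an assumption, since this page does not state which estimators univariate() uses.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}

m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad
\mathrm{skew} = \frac{m_3}{m_2^{3/2}}, \qquad
\mathrm{kurt} = \frac{m_4}{m_2^{2}} - 3
```

Here mean corresponds to \bar{x}, stddev to s, and sum to \sum x_i.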

PDV

Passed through unchanged from input.

Extras

Passed through unchanged from input.

Output Structure

Key	Description
`data`	Array of univariate statistics per numeric column
`pdv`	Metadata about dataset columns (passed through)
`extras`	Additional runtime information (passed through)
`outputType`	`"array"` — Indicates structured array output

Example Usage

steps:
  - loadInline:
      data:
        - {age: 22, income: 38000, spend: 800}
        - {age: 25, income: 45000, spend: 1200}
        - {age: 29, income: 56000, spend: 1800}
      output: sampleData

  - univariate:
      dataset: sampleData
      columns: [age, income, spend]
      output: summaryStats

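Example Output

Values are rounded to two decimal places. stddev is the sample standard deviation (n - 1 denominator); skew and kurt are shown assuming moment-based skewness and excess kurtosis (the exact estimators depend on the univariate() implementation), so with only three rows kurt is -1.5 for every column.

[
  {
    "column": "age",
    "n": 3,
    "sum": 76,
    "mean": 25.33,
    "min": 22,
    "max": 29,
    "stddev": 3.51,
    "skew": 0.17,
    "kurt": -1.5
  },
  {
    "column": "income",
    "n": 3,
    "sum": 139000,
    "mean": 46333.33,
    "min": 38000,
    "max": 56000,
    "stddev": 9073.77,
    "skew": 0.26,
    "kurt": -1.5
  },
  {
    "column": "spend",
    "n": 3,
    "sum": 3800,
    "mean": 1266.67,
    "min": 800,
    "max": 1800,
    "stddev": 503.32,
    "skew": 0.24,
    "kurt": -1.5
  }
]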
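The summary values can be recomputed from the three sample rows with a short script. It assumes the sample standard deviation and moment-based skewness and excess kurtosis (the estimators univariate() actually uses are not documented here) and rounds to two decimals. It also illustrates a property of the moment-based kurtosis: with exactly three observations, m_4 / m_2^2 is always 1.5, so the excess kurtosis is always -1.5 regardless of the data.

```python
import math


def moments_summary(values):
    """Sample stddev plus moment-based skewness and excess kurtosis (assumed conventions)."""
    n = len(values)
    mean = sum(values) / n
    devs = [v - mean for v in values]
    m2 = sum(d ** 2 for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    return {
        "mean": round(mean, 2),
        "stddev": round(math.sqrt(m2 * n / (n - 1)), 2),  # n - 1 denominator
        "skew": round(m3 / m2 ** 1.5, 2),
        "kurt": round(m4 / m2 ** 2 - 3, 2),  # excess kurtosis
    }


sample = {"age": [22, 25, 29],
          "income": [38000, 45000, 56000],
          "spend": [800, 1200, 1800]}
for name, values in sample.items():
    print(name, moments_summary(values))
```

Under these assumptions the script gives a stddev of 3.51, 9073.77 and 503.32 and a skew of 0.17, 0.26 and 0.24 for age, income and spend respectively. An implementation using the bias-adjusted sample skewness would report larger skew values for the same data.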
