DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

cube

business analytics

slug: step-cube

Cube Step

Purpose

Creates multi-dimensional aggregations (data cubes) from tabular data, enabling hierarchical summaries with automatic statistics across dimensions and measures. Transforms detailed data into analytical summaries with comprehensive metrics.

When to Use

  • Create summary reports with hierarchical groupings
  • Generate cross-tabulations with multiple dimensions
  • Calculate comprehensive statistics across measures
  • Perform multi-level analysis of data patterns
  • Build OLAP-style reports with drill-down capabilities
  • Analyze data from multiple perspectives simultaneously
  • Create the foundation for dashboards and visualizations

How It Works

  1. Takes a flat dataset and configuration for dimensions and measures
  2. Builds a hierarchical structure based on the specified dimensions
  3. Calculates aggregated statistics for each specified measure
  4. Adds level indicators and frequency counts for hierarchical navigation
  5. Computes percentage distributions for counts and measures
  6. Returns a structured dataset with all aggregations and statistics
  7. Automatically configures PDV metadata for proper formatting

Parameters

Required

  • dimensions (array) - Columns to use as grouping dimensions, in hierarchical order
  • measures (array) - Numeric columns to aggregate and analyze

Metrics Generated

For each measure, the cube step automatically calculates:

  • .sum - Total sum of values
  • .mean - Average value
  • .min - Minimum value
  • .max - Maximum value
  • .pctSum - Percentage of total sum at each level
  • .stddev - Standard deviation
  • .skew - Skewness (measure of distribution asymmetry)
  • .kurt - Kurtosis (measure of distribution "tailedness")

Additionally, for each level in the hierarchy:

  • level - Hierarchy depth (0 = all data, 1 = first dimension, etc.)
  • freq - Count of records at each level
  • pctFreq - Percentage of total records at each level

Input Requirements

  • Dataset must contain the columns specified in dimensions and measures
  • Measure columns must contain numeric values for aggregation
  • Dimension columns can be any data type suitable for grouping

Output Structure

The output follows a hierarchical format:

  • Level 0: Total (grand totals for all measures)
  • Level 1: First dimension breakdown
  • Level 2: Second dimension breakdown within first
  • ...and so on for each dimension specified

PDV Enhancements

The cube step automatically configures the PDV with:

  • Proper formatting for all calculated metrics
  • Percentage formatting for proportional metrics
  • Decimal precision for statistical measures
  • Inherited formatting from original measure columns
  • Descriptive labels for all generated columns

Example Usage

cube:
  dimensions:
    - region
    - product_category
    - year
  measures:
    - revenue
    - units_sold
    - profit

Example Output

Input Data

region product_category year revenue units_sold profit
North Electronics 2023 125000 500 37500
North Furniture 2023 87500 250 21875
South Electronics 2023 98000 420 29400
South Furniture 2023 112000 300 33600
North Electronics 2024 145000 580 43500
North Furniture 2024 95000 270 28500
South Electronics 2024 118000 490 35400
South Furniture 2024 125000 340 37500

Output Data (Cube)

level region product_category year freq pctFreq revenue.sum revenue.pctSum revenue.mean revenue.stddev units_sold.sum profit.sum
0 All All All 8 100.00% 905500 100.00% 113187.50 19131.75 3150 267275
1 North All All 4 50.00% 452500 50.00% 113125.00 26541.00 1600 131375
2 North Electronics All 2 25.00% 270000 29.82% 135000.00 14142.14 1080 81000
3 North Electronics 2023 1 12.50% 125000 13.80% 125000.00 0.00 500 37500
3 North Electronics 2024 1 12.50% 145000 16.01% 145000.00 0.00 580 43500
2 North Furniture All 2 25.00% 182500 20.15% 91250.00 5303.30 520 50375
3 North Furniture 2023 1 12.50% 87500 9.66% 87500.00 0.00 250 21875
3 North Furniture 2024 1 12.50% 95000 10.49% 95000.00 0.00 270 28500
1 South All All 4 50.00% 453000 50.00% 113250.00 12019.56 1550 135900
... ... ... ... ... ... ... ... ... ... ... ...

Visualization Potential

The cube step output is ideal for:

  • Hierarchical tables with expand/collapse functionality
  • Nested charts with drill-down capabilities
  • Heatmaps showing patterns across dimensions
  • Treemaps displaying relative proportions
  • Interactive dashboards with filtering options

Related Documentation

  • filter step - Pre-filter data before cubing
  • sort step - Sort cube results by specific metrics
  • chart step - Visualize cube data
  • print step - Create formatted reports from cube data