widen

Data Management

slug: step-widen

Widen Step

Purpose

Transforms long-format data into wide-format by pivoting rows into columns, creating a more denormalized data structure for reporting and analysis. Converts vertical data arrangements into horizontal ones.

When to Use

Convert long data (many rows, fewer columns) to wide format (fewer rows, more columns)
Prepare data for reporting and dashboard presentation
Transform data for visualization tools that expect wide format
Create cross-tabulation or pivot table structures
Reshape time series data for comparative analysis
Consolidate repeated measurements into a single row
Prepare data for export to spreadsheet applications

How It Works

Takes a dataset in long format (name-value pairs with identifiers)
Groups the data by specified identifier columns
For each group, creates new columns from unique values in the "names_from" column
Populates these new columns with values from the "values_from" column
Returns a wider dataset with fewer rows but more columns
Updates PDV metadata to reflect the new column structure

Parameters

Required

id_cols (array) - Columns that uniquely identify each output row
names_from (string) - Column containing values to become new column names (default: name)
values_from (string) - Column containing values for the new columns (default: value)

Input Requirements

Dataset must be in long format with name-value pairs
The combination of identifier columns should uniquely define each output row
The "names_from" column should contain valid column name strings

Transformation Details

Multiple rows in the original dataset are combined into a single row in the result
The number of output columns depends on the unique values in the "names_from" column
Identifier columns appear once in the output for each unique combination
Each unique value in the "names_from" column becomes a new column
Values from the "values_from" column populate the cells in the new columns

Output

Data

Wider dataset with original identifier columns plus new columns from the pivot
Original "names_from" and "values_from" columns are removed
Each row represents a unique combination of identifier values

PDV

Updated PDV structure reflecting the new column configuration
Metadata for identifier columns is preserved
New metadata added for the generated columns

Extras

widen_applied - Timestamp when operation was performed
records_before - Number of records in the original dataset
records_after - Number of records in the widened dataset

Example Usage

Basic Widening

widen:
  names_from: month
  values_from: sales
  id_cols: [product_id, category]

Survey Response Transformation

widen:
  names_from: question
  values_from: response
  id_cols: [respondent_id, demographic_group]

Example Output

Input Data (Long Format)

product_id	category	month	sales
P001	Electronics	jan	1250
P001	Electronics	feb	980
P001	Electronics	mar	1340
P001	Electronics	apr	1100
P002	Furniture	jan	850
P002	Furniture	feb	920
P002	Furniture	mar	780
P002	Furniture	apr	1050
P003	Appliances	jan	650
P003	Appliances	feb	720
P003	Appliances	mar	940
P003	Appliances	apr	820

Output Data (Wide Format)

product_id	category	jan	feb	mar	apr
P001	Electronics	1250	980	1340	1100
P002	Furniture	850	920	780	1050
P003	Appliances	650	720	940	820

Analysis Advantages

Tabular Reporting: Ideal for spreadsheet-like reports
Cross-Comparison: Easily compare values across columns
Row-Based Analysis: Perform calculations across multiple measurements
Visualization Ready: Many chart types (like bar charts) prefer wide format
Data Density: Present more information in a compact format
Pattern Recognition: Visually identify trends across columns
Export Friendly: Matches expected format for many external tools

DAZL Documentation | Data Analytics A-to-Z Processing Language

Contents

Quick Index Pages (1)

Steps (34)

Recipes (24)

Topic Maps (18)

Examples (19)

Tutorials (6)

Reference (7)