DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

widen

Data Management

slug: step-widen

Widen Step

Purpose

Transforms long-format data into wide-format by pivoting rows into columns, creating a more denormalized data structure for reporting and analysis. Converts vertical data arrangements into horizontal ones.

When to Use

  • Convert long data (many rows, fewer columns) to wide format (fewer rows, more columns)
  • Prepare data for reporting and dashboard presentation
  • Transform data for visualization tools that expect wide format
  • Create cross-tabulation or pivot table structures
  • Reshape time series data for comparative analysis
  • Consolidate repeated measurements into a single row
  • Prepare data for export to spreadsheet applications

How It Works

  1. Takes a dataset in long format (name-value pairs with identifiers)
  2. Groups the data by specified identifier columns
  3. For each group, creates new columns from unique values in the "names_from" column
  4. Populates these new columns with values from the "values_from" column
  5. Returns a wider dataset with fewer rows but more columns
  6. Updates PDV metadata to reflect the new column structure

Parameters

Required

  • id_cols (array) - Columns that uniquely identify each output row
  • names_from (string) - Column containing values to become new column names (default: name)
  • values_from (string) - Column containing values for the new columns (default: value)

Input Requirements

  • Dataset must be in long format with name-value pairs
  • The combination of identifier columns should uniquely define each output row
  • The "names_from" column should contain valid column name strings

Transformation Details

  • Multiple rows in the original dataset are combined into a single row in the result
  • The number of output columns depends on the unique values in the "names_from" column
  • Identifier columns appear once in the output for each unique combination
  • Each unique value in the "names_from" column becomes a new column
  • Values from the "values_from" column populate the cells in the new columns

Output

Data

  • Wider dataset with original identifier columns plus new columns from the pivot
  • Original "names_from" and "values_from" columns are removed
  • Each row represents a unique combination of identifier values

PDV

  • Updated PDV structure reflecting the new column configuration
  • Metadata for identifier columns is preserved
  • New metadata added for the generated columns

Extras

  • widen_applied - Timestamp when operation was performed
  • records_before - Number of records in the original dataset
  • records_after - Number of records in the widened dataset

Example Usage

Basic Widening

widen:
  names_from: month
  values_from: sales
  id_cols: [product_id, category]

Survey Response Transformation

widen:
  names_from: question
  values_from: response
  id_cols: [respondent_id, demographic_group]

Example Output

Input Data (Long Format)

product_id category month sales
P001 Electronics jan 1250
P001 Electronics feb 980
P001 Electronics mar 1340
P001 Electronics apr 1100
P002 Furniture jan 850
P002 Furniture feb 920
P002 Furniture mar 780
P002 Furniture apr 1050
P003 Appliances jan 650
P003 Appliances feb 720
P003 Appliances mar 940
P003 Appliances apr 820

Output Data (Wide Format)

product_id category jan feb mar apr
P001 Electronics 1250 980 1340 1100
P002 Furniture 850 920 780 1050
P003 Appliances 650 720 940 820

Analysis Advantages

  • Tabular Reporting: Ideal for spreadsheet-like reports
  • Cross-Comparison: Easily compare values across columns
  • Row-Based Analysis: Perform calculations across multiple measurements
  • Visualization Ready: Many chart types (like bar charts) prefer wide format
  • Data Density: Present more information in a compact format
  • Pattern Recognition: Visually identify trends across columns
  • Export Friendly: Matches expected format for many external tools

Related Documentation

  • lengthen-step - Convert wide-format data to long-format (opposite of widen)
  • calculate-step - Perform calculations on widened data
  • sort-step - Sort widened data by columns
  • chart-step - Create visualizations with widened data
  • filter-step - Filter widened data by column values