lengthen

data management

slug: step-lengthen

Lengthen Step

Purpose

Transforms wide-format data into long-format by pivoting specified columns into rows, creating a more normalized data structure for certain types of analysis. Converts horizontal data arrangements into vertical ones.

When to Use

Convert wide data (many columns) to long format (fewer columns, more rows)
Prepare data for time series or longitudinal analysis
Normalize data for visualization tools that require long format
Transform survey data with multiple response columns
Create tidy data for statistical analysis
Melt data for correlation or pattern analysis
Reshape data for hierarchical or faceted visualizations

How It Works

Takes a dataset with multiple related columns (wide format)
Identifies key columns to maintain as identifiers
Pivots specified columns into name-value pairs
Creates a longer dataset with more rows and fewer columns
Maintains data integrity while changing the structure
Updates PDV metadata to reflect the new column structure

Parameters

Required

columns (array) - Columns to pivot into rows
id_cols (array) - Columns to retain as identifiers

Optional

names_to (string) - Name for the new column containing the original column names (default: name)
values_to (string) - Name for the new column containing the values (default: value)

Input Requirements

Dataset must contain the specified columns to pivot
Identifier columns should uniquely identify the non-pivoted dimensions

Transformation Details

Each row in the original dataset generates multiple rows in the result
The number of output rows = (input rows) × (number of pivoted columns)
Identifier columns are repeated for each new row
Original column names become values in the new names_to column
Values from the original columns populate the new values_to column

Output

Data

Longer dataset with original identifier columns plus two new columns
Original columns specified in columns parameter are removed
New column structure allows for easier filtering, grouping, and analysis by column name

PDV

Updated PDV structure reflecting the new column configuration
Metadata for identifier columns is preserved
New metadata added for the names_to and values_to columns

Extras

lengthen_applied - Timestamp when operation was performed
columns_pivoted - List of columns that were pivoted
records_before - Number of records in the original dataset
records_after - Number of records in the lengthened dataset

Example Usage

Basic Column Pivoting

lengthen:
  columns: [jan, feb, mar, apr, may, jun]
  id_cols: [product_id, category]
  names_to: month
  values_to: sales

Survey Response Transformation

lengthen:
  columns: [q1_response, q2_response, q3_response, q4_response]
  id_cols: [respondent_id, demographic_group]
  names_to: question
  values_to: response

Example Output

Input Data (Wide Format)

product_id	category	jan	feb	mar	apr
P001	Electronics	1250	980	1340	1100
P002	Furniture	850	920	780	1050
P003	Appliances	650	720	940	820

Output Data (Long Format)

product_id	category	month	sales
P001	Electronics	jan	1250
P001	Electronics	feb	980
P001	Electronics	mar	1340
P001	Electronics	apr	1100
P002	Furniture	jan	850
P002	Furniture	feb	920
P002	Furniture	mar	780
P002	Furniture	apr	1050
P003	Appliances	jan	650
P003	Appliances	feb	720
P003	Appliances	mar	940
P003	Appliances	apr	820

Analysis Advantages

Consistent Filtering: Filter by column name as a value
Grouping Flexibility: Group by original column names
Visualization Ready: Many chart types require long format
Statistical Analysis: Many statistical functions expect long-format data
Pattern Discovery: Easier to identify patterns across variables
Tidy Data Principles: Follows the "one observation per row" principle

DAZL Documentation | Data Analytics A-to-Z Processing Language

Contents

Quick Index Pages (1)

Steps (34)

Recipes (24)

Topic Maps (18)

Examples (19)

Tutorials (6)

Reference (7)