DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

lengthen

data management

slug: step-lengthen

Lengthen Step

Purpose

Transforms wide-format data into long-format by pivoting specified columns into rows, creating a more normalized data structure for certain types of analysis. Converts horizontal data arrangements into vertical ones.

When to Use

  • Convert wide data (many columns) to long format (fewer columns, more rows)
  • Prepare data for time series or longitudinal analysis
  • Normalize data for visualization tools that require long format
  • Transform survey data with multiple response columns
  • Create tidy data for statistical analysis
  • Melt data for correlation or pattern analysis
  • Reshape data for hierarchical or faceted visualizations

How It Works

  1. Takes a dataset with multiple related columns (wide format)
  2. Identifies key columns to maintain as identifiers
  3. Pivots specified columns into name-value pairs
  4. Creates a longer dataset with more rows and fewer columns
  5. Maintains data integrity while changing the structure
  6. Updates PDV metadata to reflect the new column structure

Parameters

Required

  • columns (array) - Columns to pivot into rows
  • id_cols (array) - Columns to retain as identifiers

Optional

  • names_to (string) - Name for the new column containing the original column names (default: name)
  • values_to (string) - Name for the new column containing the values (default: value)

Input Requirements

  • Dataset must contain the specified columns to pivot
  • Identifier columns should uniquely identify the non-pivoted dimensions

Transformation Details

  • Each row in the original dataset generates multiple rows in the result
  • The number of output rows = (input rows) × (number of pivoted columns)
  • Identifier columns are repeated for each new row
  • Original column names become values in the new names_to column
  • Values from the original columns populate the new values_to column

Output

Data

  • Longer dataset with original identifier columns plus two new columns
  • Original columns specified in columns parameter are removed
  • New column structure allows for easier filtering, grouping, and analysis by column name

PDV

  • Updated PDV structure reflecting the new column configuration
  • Metadata for identifier columns is preserved
  • New metadata added for the names_to and values_to columns

Extras

  • lengthen_applied - Timestamp when operation was performed
  • columns_pivoted - List of columns that were pivoted
  • records_before - Number of records in the original dataset
  • records_after - Number of records in the lengthened dataset

Example Usage

Basic Column Pivoting

lengthen:
  columns: [jan, feb, mar, apr, may, jun]
  id_cols: [product_id, category]
  names_to: month
  values_to: sales

Survey Response Transformation

lengthen:
  columns: [q1_response, q2_response, q3_response, q4_response]
  id_cols: [respondent_id, demographic_group]
  names_to: question
  values_to: response

Example Output

Input Data (Wide Format)

product_id category jan feb mar apr
P001 Electronics 1250 980 1340 1100
P002 Furniture 850 920 780 1050
P003 Appliances 650 720 940 820

Output Data (Long Format)

product_id category month sales
P001 Electronics jan 1250
P001 Electronics feb 980
P001 Electronics mar 1340
P001 Electronics apr 1100
P002 Furniture jan 850
P002 Furniture feb 920
P002 Furniture mar 780
P002 Furniture apr 1050
P003 Appliances jan 650
P003 Appliances feb 720
P003 Appliances mar 940
P003 Appliances apr 820

Analysis Advantages

  • Consistent Filtering: Filter by column name as a value
  • Grouping Flexibility: Group by original column names
  • Visualization Ready: Many chart types require long format
  • Statistical Analysis: Many statistical functions expect long-format data
  • Pattern Discovery: Easier to identify patterns across variables
  • Tidy Data Principles: Follows the "one observation per row" principle

Related Documentation

  • widen step - Convert long-format data to wide-format (opposite of lengthen)

  • transpose step - rotates an entire dataset 90 degrees (rows become columns, each column becomes a row)

  • filter step - Filter comparison results

  • calculate step - Add calculations to comparison results

  • sort step - Sort comparison results

  • group-by-step - Aggregate lengthened data by dimension