Reshape wide datasets into long format
data prep
slug: recipe-data-prep-reshape-wide-datasets-into-long-format
Recipe: Reshape wide datasets into long format
category: data loading and preparation
Problem
Some datasets store repeated measurements or attributes in wide format:
- multiple columns for the same type of measure (e.g., sales_Jan, sales_Feb, sales_Mar)
- difficult to analyze, summarize, or join with other datasets
- many analytics steps require long format data for aggregation or visualization
Solution
Follow these steps to reshape the dataset from wide to long format:
- load the wide dataset
- transform repeated columns into key-value pairs
- standardize new variable names and values
- optionally calculate derived metrics for analysis
Step Sequence
load step -> lengthen step -> calculate step -> keep step
Input Datasets
sales_wide — dataset with columns like product_id, sales_Jan, sales_Feb, sales_Mar
- Notes: some months may have missing values, column naming may not be consistent
Output Dataset
sales_long — long-format dataset with columns product_id, month, sales
- Notes: suitable for time-series analysis, aggregation, and visualization
Step-By-Step Explanation
| Step |
Purpose |
Notes |
| load step |
Load wide-format dataset |
Supports local file, database, or API sources |
| lengthen step |
Convert wide columns into key-value pairs |
Example: columns sales_Jan → variable month=Jan, value=sales |
| calculate step |
Standardize new variable names and compute derived fields |
Optional: create total sales per product or normalize values |
| keep step |
Retain only relevant columns for analysis |
Drop original wide columns after reshaping |
Variations & Extensions
- Apply filter step to remove rows with missing values before reshaping
- Use transpose step after lengthening if analysis requires a different orientation
- Combine with combine step to merge reshaped data with reference tables
Concepts Demonstrated
- Data reshaping and pivoting
- Preparing datasets for time-series or aggregated analysis
- Sequencing multiple data transformation steps
- Derived metrics in long format
Related Recipes
- Load and clean raw transaction data
- Merge multiple input tables for unified dataset
Notes & Best Practices
- Always validate reshaped dataset to ensure no data was lost during the lengthen process
- Standardize column names before reshaping for consistency
- Keep a copy of the original wide dataset for reference
Metadata
title: "Reshape wide datasets into long format"
category: "data loading and preparation"
difficulty: "Intermediate"
tags: [data-prep, reshaping, lengthen, pivot]
inputs: [sales_wide]
outputs: [sales_long]
steps: [step-load, step-lengthen, step-calculate, step-keep]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"