DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

Reshape wide datasets into long format

data prep

slug: recipe-data-prep-reshape-wide-datasets-into-long-format

Recipe: Reshape wide datasets into long format

category: data loading and preparation

Problem

Some datasets store repeated measurements or attributes in wide format:

  • multiple columns for the same type of measure (e.g., sales_Jan, sales_Feb, sales_Mar)
  • difficult to analyze, summarize, or join with other datasets
  • many analytics steps require long format data for aggregation or visualization

Solution

Follow these steps to reshape the dataset from wide to long format:

  • load the wide dataset
  • transform repeated columns into key-value pairs
  • standardize new variable names and values
  • optionally calculate derived metrics for analysis

Step Sequence

load step -> lengthen step -> calculate step -> keep step

Input Datasets

  • sales_wide — dataset with columns like product_id, sales_Jan, sales_Feb, sales_Mar
  • Notes: some months may have missing values, column naming may not be consistent

Output Dataset

  • sales_long — long-format dataset with columns product_id, month, sales
  • Notes: suitable for time-series analysis, aggregation, and visualization

Step-By-Step Explanation

Step Purpose Notes
load step Load wide-format dataset Supports local file, database, or API sources
lengthen step Convert wide columns into key-value pairs Example: columns sales_Jan → variable month=Jan, value=sales
calculate step Standardize new variable names and compute derived fields Optional: create total sales per product or normalize values
keep step Retain only relevant columns for analysis Drop original wide columns after reshaping

Variations & Extensions

  • Apply filter step to remove rows with missing values before reshaping
  • Use transpose step after lengthening if analysis requires a different orientation
  • Combine with combine step to merge reshaped data with reference tables

Concepts Demonstrated

  • Data reshaping and pivoting
  • Preparing datasets for time-series or aggregated analysis
  • Sequencing multiple data transformation steps
  • Derived metrics in long format

Related Recipes

  • Load and clean raw transaction data
  • Merge multiple input tables for unified dataset

Notes & Best Practices

  • Always validate reshaped dataset to ensure no data was lost during the lengthen process
  • Standardize column names before reshaping for consistency
  • Keep a copy of the original wide dataset for reference

Metadata


title: "Reshape wide datasets into long format"
category: "data loading and preparation"
difficulty: "Intermediate"
tags: [data-prep, reshaping, lengthen, pivot]
inputs: [sales_wide]
outputs: [sales_long]
steps: [step-load, step-lengthen, step-calculate, step-keep]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"