Merge multiple input tables for unified dataset
data prep
slug: recipe-data-prep-merge-multiple-input-tables-for-unified-dataset
Recipe: Merge multiple input tables for unified dataset
category: data loading and preparation
Problem
Data for analysis is often split across multiple sources:
- separate tables for transactions, customers, and product info
- inconsistent keys across datasets
- missing or mismatched records that prevent unified analysis
Solution
Follow these steps to create a single, unified dataset:
- load all source tables
- merge datasets using key fields
- keep only relevant columns
- resolve inconsistencies and duplicates
Step Sequence
load step -> combine step -> keep step -> calculate step
Input Datasets
transactions_raw — raw transactional data
customers_ref — customer reference table
products_ref — product reference table
- Notes: keys may not exactly match; some records may be missing in one table
Output Dataset
transactions_full — unified dataset with enriched customer and product info
- Notes: all tables merged, duplicates resolved, key fields standardized
Step-By-Step Explanation
| Step |
Purpose |
Notes |
| load step |
Load all input tables |
Supports local file, database, or API sources |
| combine step |
Merge datasets using key fields |
Join type can vary (inner, left, full) depending on requirements |
| keep step |
Keep only necessary columns |
Drop any irrelevant or redundant fields |
| calculate step |
Standardize and compute derived fields |
Example: compute total_amount per transaction, normalize customer_id |
Variations & Extensions
- Use filter step before combining to exclude invalid records
- Apply transpose step after merging if analysis requires reshaped data
- Add [step-pdv] to document column properties for the merged dataset
Concepts Demonstrated
- Data integration across sources
- Key-based merging and joins
- Sequencing multiple preparation steps
- Handling missing or mismatched records
Related Recipes
- Load and clean raw transaction data
- Reshape wide datasets into long format
Notes & Best Practices
- Verify key field consistency before merging
- Keep track of join types (inner, left, right) to avoid unintentional data loss
- Document assumptions for missing data treatment
- Always validate the merged dataset for duplicates or inconsistencies
Metadata
title: "Merge multiple input tables for unified dataset"
category: "data loading and preparation"
difficulty: "Intermediate"
tags: [data-integration, merging, preprocessing]
inputs: [transactions_raw, customers_ref, products_ref]
outputs: [transactions_full]
steps: [step-load, step-combine, step-keep, step-calculate]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"