DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

Merge multiple input tables for unified dataset

data prep

slug: recipe-data-prep-merge-multiple-input-tables-for-unified-dataset

Recipe: Merge multiple input tables for unified dataset

category: data loading and preparation

Problem

Data for analysis is often split across multiple sources:

  • separate tables for transactions, customers, and product info
  • inconsistent keys across datasets
  • missing or mismatched records that prevent unified analysis

Solution

Follow these steps to create a single, unified dataset:

  • load all source tables
  • merge datasets using key fields
  • keep only relevant columns
  • resolve inconsistencies and duplicates

Step Sequence

load step -> combine step -> keep step -> calculate step

Input Datasets

  • transactions_raw — raw transactional data
  • customers_ref — customer reference table
  • products_ref — product reference table
  • Notes: keys may not exactly match; some records may be missing in one table

Output Dataset

  • transactions_full — unified dataset with enriched customer and product info
  • Notes: all tables merged, duplicates resolved, key fields standardized

Step-By-Step Explanation

Step Purpose Notes
load step Load all input tables Supports local file, database, or API sources
combine step Merge datasets using key fields Join type can vary (inner, left, full) depending on requirements
keep step Keep only necessary columns Drop any irrelevant or redundant fields
calculate step Standardize and compute derived fields Example: compute total_amount per transaction, normalize customer_id

Variations & Extensions

  • Use filter step before combining to exclude invalid records
  • Apply transpose step after merging if analysis requires reshaped data
  • Add [step-pdv] to document column properties for the merged dataset

Concepts Demonstrated

  • Data integration across sources
  • Key-based merging and joins
  • Sequencing multiple preparation steps
  • Handling missing or mismatched records

Related Recipes

  • Load and clean raw transaction data
  • Reshape wide datasets into long format

Notes & Best Practices

  • Verify key field consistency before merging
  • Keep track of join types (inner, left, right) to avoid unintentional data loss
  • Document assumptions for missing data treatment
  • Always validate the merged dataset for duplicates or inconsistencies

Metadata


title: "Merge multiple input tables for unified dataset"
category: "data loading and preparation"
difficulty: "Intermediate"
tags: [data-integration, merging, preprocessing]
inputs: [transactions_raw, customers_ref, products_ref]
outputs: [transactions_full]
steps: [step-load, step-combine, step-keep, step-calculate]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"