Detect and reconcile differences between tables
data cleanup
slug: recipe-data-cleanup-detect-and-reconcile-differences-between-tables
Recipe: Detect and reconcile differences between tables
category: data cleansing and standardization
Problem
Datasets from multiple sources often have inconsistencies:
- records present in one table but missing in another
- mismatched values for the same key
- duplicate or conflicting entries that can compromise analysis
Solution
Follow these steps to identify and resolve differences:
- load both datasets to compare
- use comparison steps to detect missing or mismatched records
- reconcile differences by choosing the authoritative source or applying transformation rules
- optionally merge reconciled datasets into a single clean dataset
Step Sequence
load step -> compare step -> calculate step -> release step
Input Datasets
transactions_standardized — cleaned and standardized transactions
transactions_external — external or reference transaction dataset
- Notes: datasets may have missing rows or inconsistent values
Output Dataset
transactions_reconciled — dataset with reconciled records and corrected inconsistencies
- Notes: conflicts resolved, missing records handled, ready for downstream analysis
Step-By-Step Explanation
| Step |
Purpose |
Notes |
| load step |
Load datasets to be compared |
Supports multiple sources including files, databases, or APIs |
| compare step |
Identify differences between tables |
Detect missing rows, mismatched values, duplicates |
| calculate step |
Apply transformation rules to reconcile differences |
Example: overwrite missing values, normalize mismatched fields |
| release step |
Output the reconciled dataset |
Ensures final dataset is clean and ready for analysis |
Variations & Extensions
- Use filter step to focus on specific key fields or subsets before comparing
- Apply combine step after reconciliation to merge with additional reference tables
- Include [step-pdv] to document column properties of the reconciled dataset
Concepts Demonstrated
- Data comparison across sources
- Conflict detection and resolution
- Data reconciliation workflow
- Sequencing multiple data management steps
Related Recipes
- Remove invalid records and handle missing values
- Standardize customer codes across datasets
Notes & Best Practices
- Always backup both datasets before reconciliation
- Validate reconciliation rules to avoid unintended data overwrites
- Document assumptions and authoritative sources
Metadata
title: "Detect and reconcile differences between tables"
category: "data cleansing and standardization"
difficulty: "Intermediate"
tags: [data-cleaning, reconciliation, comparison]
inputs: [transactions_standardized, transactions_external]
outputs: [transactions_reconciled]
steps: [step-load, step-compare, step-calculate, step-release]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"