DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

Detect and reconcile differences between tables

data cleanup

slug: recipe-data-cleanup-detect-and-reconcile-differences-between-tables

Recipe: Detect and reconcile differences between tables

category: data cleansing and standardization

Problem

Datasets from multiple sources often have inconsistencies:

  • records present in one table but missing in another
  • mismatched values for the same key
  • duplicate or conflicting entries that can compromise analysis

Solution

Follow these steps to identify and resolve differences:

  • load both datasets to compare
  • use comparison steps to detect missing or mismatched records
  • reconcile differences by choosing the authoritative source or applying transformation rules
  • optionally merge reconciled datasets into a single clean dataset

Step Sequence

load step -> compare step -> calculate step -> release step

Input Datasets

  • transactions_standardized — cleaned and standardized transactions
  • transactions_external — external or reference transaction dataset
  • Notes: datasets may have missing rows or inconsistent values

Output Dataset

  • transactions_reconciled — dataset with reconciled records and corrected inconsistencies
  • Notes: conflicts resolved, missing records handled, ready for downstream analysis

Step-By-Step Explanation

Step Purpose Notes
load step Load datasets to be compared Supports multiple sources including files, databases, or APIs
compare step Identify differences between tables Detect missing rows, mismatched values, duplicates
calculate step Apply transformation rules to reconcile differences Example: overwrite missing values, normalize mismatched fields
release step Output the reconciled dataset Ensures final dataset is clean and ready for analysis

Variations & Extensions

  • Use filter step to focus on specific key fields or subsets before comparing
  • Apply combine step after reconciliation to merge with additional reference tables
  • Include [step-pdv] to document column properties of the reconciled dataset

Concepts Demonstrated

  • Data comparison across sources
  • Conflict detection and resolution
  • Data reconciliation workflow
  • Sequencing multiple data management steps

Related Recipes

  • Remove invalid records and handle missing values
  • Standardize customer codes across datasets

Notes & Best Practices

  • Always backup both datasets before reconciliation
  • Validate reconciliation rules to avoid unintended data overwrites
  • Document assumptions and authoritative sources

Metadata


title: "Detect and reconcile differences between tables"
category: "data cleansing and standardization"
difficulty: "Intermediate"
tags: [data-cleaning, reconciliation, comparison]
inputs: [transactions_standardized, transactions_external]
outputs: [transactions_reconciled]
steps: [step-load, step-compare, step-calculate, step-release]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"