DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

compare

data management

slug: step-compare

Compare Step

Purpose

Compares two or more datasets to identify differences, similarities, changes, or patterns across them. Provides multiple comparison modes for different analytical needs.

When to Use

  • Identify records added or removed between versions of a dataset
  • Find matching records across multiple sources
  • Analyze field-level changes between datasets
  • Create comprehensive datasets by merging multiple sources
  • Validate data consistency across different systems
  • Perform set operations (union, intersection) on multiple datasets

How It Works

  1. Takes two or more datasets and a key field for record identification
  2. Applies the specified comparison mode to analyze relationships between datasets
  3. Utilizes the CompareArrays class to perform specialized comparison operations
  4. Returns the result dataset based on the comparison mode
  5. Tracks detailed statistics about the comparison operation

Parameters

Required

  • datasets (array) - Names of datasets to compare
  • on (string) - Key field used to identify and match records across datasets

Optional

  • mode (string) - Comparison operation to perform:
    • added - Records in second dataset not in first
    • removed - Records in first dataset not in second
    • intersect - Records common to all datasets
    • union - Records present in any dataset
    • analyze - Detailed field-level comparison between datasets
    • merge - Combine datasets with field-level resolution
  • compare_fields (array) - Specific fields to compare (for analyze and merge modes)

Input Requirements

  • All referenced datasets must exist
  • Key field must exist in all datasets being compared
  • For diff, added, removed, and analyze modes, exactly 2 datasets are required
  • For union and intersect modes, at least 2 datasets are required

Comparison Modes

Added

Returns records that exist in the second dataset but not in the first.

Removed

Returns records that exist in the first dataset but not in the second.

Intersect

Returns only records that exist in all of the specified datasets.

Union

Returns all records from all datasets, with duplicates removed based on the key field.

Analyze

Performs a detailed field-by-field comparison between two datasets, identifying:

  • Which fields have changed
  • Old and new values
  • Type of change (added, modified, removed)
  • Statistics on field change frequencies

Merge

Combines two datasets with conflict resolution, prioritizing the most recent or valid values.

Output

Data

  • Result records based on the comparison mode
  • For analyze mode, includes detailed change information

Extras

  • comparison - Metadata about the comparison operation:
    • mode - Comparison mode used
    • datasets - Names of datasets compared
    • key_field - Field used for record matching
    • stats - Statistics about the comparison results:
    • For added/removed: counts of matching and different records
    • For analyze: counts of changed fields and types of changes
    • For union/intersect: record counts by source

Example Usage

Find Added Records

compare:
  datasets:
    - customers_2023
    - customers_2024
  mode: added
  on: customer_id

Analyze Changes

compare:
  datasets:
    - inventory_before
    - inventory_after
  mode: analyze
  on: sku
  compare_fields:
    - price
    - stock_level
    - category

Example Output

Added Mode

Input Datasets

Dataset 1 (customers_2023) customer_id name status
1001 John Smith Active
1002 Mary Jones Inactive
1003 David Lee Active
Dataset 2 (customers_2024) customer_id name status
1001 John Smith Active
1003 David Lee Inactive
1004 Sarah Wilson Active

Output (New Records in 2024)

customer_id name status
1004 Sarah Wilson Active

Analyze Mode

Input Datasets

Dataset 1 (inventory_before) sku product_name price stock_level
ABC123 Widget Pro 29.99 150
DEF456 Gadget Plus 49.99 75
GHI789 Tech Tool 19.99 200
Dataset 2 (inventory_after) sku product_name price stock_level
ABC123 Widget Pro 34.99 120
DEF456 Gadget Plus 49.99 50
JKL012 Smart Device 39.99 100

Output (Change Analysis)

sku status field old_value new_value change_type
ABC123 Modified price 29.99 34.99 modified
ABC123 Modified stock_level 150 120 modified
DEF456 Modified stock_level 75 50 modified
GHI789 Removed - - - removed
JKL012 Added - - - added

Related Documentation

  • filter step - Filter comparison results
  • calculate step - Add calculations to comparison results
  • sort step - Sort comparison results