Quick Index Pages (1)

Welcome to DAZL

Recipes (24)

Topic Maps (18)

Tutorials (6)

Reference (7)

combine

data management

slug: step-combine

Combine Step

Purpose

Integrates multiple datasets into a single unified dataset using either vertical stacking (append) or relational joins. Provides flexible options for merging data from disparate sources.

When to Use

Consolidate data from multiple sources
Integrate data across different time periods
Combine complementary datasets for comprehensive analysis
Enrich base data with additional attributes from lookup tables
Create master datasets from specialized extracts
Implement SQL-like joins in the data pipeline

How It Works

Takes the base dataset and one or more additional datasets
Applies one of two combination methods:
- Append: Stacks datasets vertically (adding rows)
- Join: Merges datasets horizontally (adding columns) based on key relationships
Handles metadata merging for PDV (Physical Data View)
Tracks detailed statistics about the combination process
Returns a unified dataset with consolidated structure

Parameters

Required

datasets (array) - Datasets to combine with the base dataset

Optional

method (string) - Combination method: append (default) or join
join_type (string) - For join method: inner, left, right, or full (default: inner)
join_on (string|array) - Join key(s) or conditions:
- Simple string: "customer_id" (assumes same column name in both datasets)
- Explicit mapping: {"left": "id", "right": "customer_id"}
- Multiple conditions: [{"left": "id", "right": "customer_id"}, {"left": "region", "right": "territory"}]
join_prefix (boolean) - Add dataset name prefix to columns to prevent conflicts (default: true)
keep_duplicates (boolean) - For append method: keep duplicate records (default: true)
strict_mode (boolean) - Throw exceptions on errors (default: false)
preserve_types (boolean) - Maintain data types during combination (default: true)

Input Requirements

Base dataset must be properly structured
Additional datasets must be available and compatible for the chosen combination method
For joins, specified join keys must exist in respective datasets
At least one additional dataset must be specified in datasets parameter

Output

Data

For append: Vertically stacked records from all datasets
For join: Base dataset enriched with columns from other datasets based on join relationships

PDV

Combined metadata for all columns in the result dataset
For joins with join_prefix: true, column names include dataset prefixes
Latest metadata definition wins for overlapping column names

Extras

combineData_method - Method used (append or join)
combineData_applied - Timestamp when operation was performed
records_in - Number of records in the base dataset
records_out - Number of records in the result dataset
datasets_combined - Number of datasets combined

For append operations:

append_stats - Detailed statistics about the append operation:
- base_records - Count of records in base dataset
- datasets_appended - Array of appended datasets with record counts

For join operations:

join_type - Type of join performed
join_stats - Detailed statistics about the join operation:
- base_records - Count of records in base dataset
- joins_performed - Array of join operations with before/after counts

Example Usage

Append Example

combine:
  method: append
  datasets:
    - q1_sales
    - q2_sales
    - q3_sales
    - q4_sales
  keep_duplicates: false

Join Example

combine:
  method: join
  datasets:
    customers: customers_dataset
    orders: orders_dataset
    products: products_lookup
  join_type: left
  join_on:
    - {left: "customer_id", right: "id"}
    - {left: "product_id", right: "sku"}
  join_prefix: true

Example Output

Append Example

Input Datasets

Base Dataset (q1_sales)	quarter	region	product	revenue
Q1	North	Widgets	12500
Q1	South	Gadgets	8750

Additional Dataset (q2_sales)	quarter	region	product	revenue
Q2	North	Widgets	15000
Q2	East	Gadgets	11250

Appended Output

quarter	region	product	revenue
Q1	North	Widgets	12500
Q1	South	Gadgets	8750
Q2	North	Widgets	15000
Q2	East	Gadgets	11250

Join Example

Input Datasets

Base Dataset (orders)	order_id	customer_id	product_id	quantity
1001	C123	P456	2
1002	C789	P345	1

Additional Dataset (customers)	id	name	segment
C123	John Smith	Premium
C789	Sarah Jones	Standard

Joined Output (with `join_prefix: true`)

order_id	customer_id	product_id	quantity	customers_name	customers_segment
1001	C123	P456	2	John Smith	Premium
1002	C789	P345	1	Sarah Jones	Standard

DAZL Documentation | Data Analytics A-to-Z Processing Language

Contents

Quick Index Pages (1)

Steps (34)

Recipes (24)

Topic Maps (18)

Examples (19)

Tutorials (6)

Reference (7)

combine

Combine Step

Purpose

When to Use

How It Works

Parameters

Required

Optional

Input Requirements

Output

Data

PDV

Extras

Example Usage

Append Example

Join Example

Example Output

Append Example

Input Datasets

Appended Output

Join Example

Input Datasets

Joined Output (with `join_prefix: true`)

Related Documentation

DAZL Documentation | Data Analytics A-to-Z Processing Language

Contents

Quick Index Pages (1)

Steps (34)

Recipes (24)

Topic Maps (18)

Examples (19)

Tutorials (6)

Reference (7)

combine

Combine Step

Purpose

When to Use

How It Works

Parameters

Required

Optional

Input Requirements

Output

Data

PDV

Extras

Example Usage

Append Example

Join Example

Example Output

Append Example

Input Datasets

Appended Output

Join Example

Input Datasets

Joined Output (with join_prefix: true)

Related Documentation

Joined Output (with `join_prefix: true`)