Combines a base dataset with one or more additional datasets into a single dataset, using either vertical stacking (append) or a relational join, so that data from disparate sources can be merged in one step.
When to Use
Consolidate data from multiple sources
Integrate data across different time periods
Combine complementary datasets for comprehensive analysis
Enrich base data with additional attributes from lookup tables
Create master datasets from specialized extracts
Implement SQL-like joins in the data pipeline
How It Works
Takes the base dataset and one or more additional datasets
Applies one of two combination methods:
Append: Stacks datasets vertically (adding rows)
Join: Merges datasets horizontally (adding columns) based on key relationships
Handles metadata merging for PDV (Physical Data View)
Tracks detailed statistics about the combination process
Returns a unified dataset with consolidated structure
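The two combination methods above can be illustrated with a minimal, dependency-free sketch. This is not the component's actual implementation: the helper names append_rows and join_rows, the list-of-dicts row format, the None fill for missing values, and the sample data are all assumptions made for illustration. The sketch also assumes a single join key with the same name in both datasets and no other overlapping column names; metadata merging and statistics tracking are left out.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def _columns(rows: list[Row]) -> list[str]:
    """Collect column names in first-seen order."""
    seen: dict[str, None] = {}
    for row in rows:
        for col in row:
            seen.setdefault(col, None)
    return list(seen)


def append_rows(base: list[Row], others: list[list[Row]]) -> list[Row]:
    """Append: stack datasets vertically; absent columns become None."""
    combined = list(base)
    for dataset in others:
        combined.extend(dataset)
    cols = _columns(combined)
    return [{col: row.get(col) for col in cols} for row in combined]


def join_rows(left: list[Row], right: list[Row], key: str,
              join_type: str = "inner") -> list[Row]:
    """Join: merge datasets horizontally on a shared key column."""
    if join_type not in {"inner", "left", "right", "full"}:
        raise ValueError(f"unsupported join_type: {join_type!r}")

    left_cols = _columns(left)
    right_cols = [c for c in _columns(right) if c != key]

    # Index the right-hand rows by key value for fast lookup.
    index: dict[Any, list[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    result: list[Row] = []
    matched: set[Any] = set()
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            matched.add(lrow[key])
            for rrow in matches:
                merged = {c: lrow.get(c) for c in left_cols}
                merged.update({c: rrow.get(c) for c in right_cols})
                result.append(merged)
        elif join_type in {"left", "full"}:
            # Keep the unmatched left row; right-hand columns are empty.
            merged = {c: lrow.get(c) for c in left_cols}
            merged.update({c: None for c in right_cols})
            result.append(merged)

    if join_type in {"right", "full"}:
        # Keep unmatched right rows; left-hand columns are empty.
        for rrow in right:
            if rrow[key] not in matched:
                merged = {c: None for c in left_cols}
                merged[key] = rrow[key]
                merged.update({c: rrow.get(c) for c in right_cols})
                result.append(merged)
    return result


# Hypothetical sample data, for illustration only.
customers = [{"customer_id": 1, "name": "Ada"},
             {"customer_id": 2, "name": "Grace"}]
orders = [{"customer_id": 1, "total": 30},
          {"customer_id": 3, "total": 12}]

stacked = append_rows(customers, [orders])
inner = join_rows(customers, orders, "customer_id", "inner")
full = join_rows(customers, orders, "customer_id", "full")
```

With this sample data, append adds rows (the result has as many rows as both inputs combined), while join adds columns: the inner join keeps only keys present in both datasets, and the full join keeps every key from either side.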
Parameters
Required
datasets (array) - Datasets to combine with the base dataset
Optional
method (string) - Combination method: append (default) or join
join_type (string) - Join type, used only when method is join: inner (default), left, right, or full
join_on (string|array) - Join key(s) or conditions:
Simple string: "customer_id" (assumes same column name in both datasets)