data management
slug: step-load

Initializes the data pipeline by loading a dataset into memory. This step acts as the entry point for all downstream transformations and analyses in a workflow.

1. `executeStep` resolves the dataset reference and fetches the corresponding data.
2. `handle_load` receives the dataset, wraps it into a consistent structure (`data`, `pdv`, `extras`), and returns it to the orchestrator.

- `dataset` (object): Defines the data source to load.
  - `source` (string): Name or identifier of the dataset (e.g., table name or virtual dataset).
  - `type` (string): Type of source (e.g., `sql`, `api`, `array`).
- `output` (string): Name of the dataset alias for downstream reference.

`handle_load()` function.

`$params['data']` when applicable.

`$inArray['data']`.

| Key | Description |
|---|---|
| `data` | Array of dataset records |
| `pdv` | Column metadata (if provided) |
| `extras` | Record count and other optional info |
| `outputType` | `"work"`, which signals that the step produced an in-memory dataset |

    steps:
      - load:
          dataset:
            source: freqTest
            type: sql
          output: dataToAnalyze
      - freq:
          dataset: dataToAnalyze
          columns: [priority, region]
          output: Two Column Summary

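
The workflow above depends on the `output` alias of one step matching the `dataset` reference of a later step. The Python sketch below illustrates that dependency check. It is not the orchestrator's actual code: `check_aliases` is a hypothetical helper, and the nested structure is an assumed parse of the YAML in which `output` sits at the step level.

```python
from __future__ import annotations

from typing import Any

# Assumed in-memory form of the workflow after YAML parsing.
steps: list[dict[str, Any]] = [
    {"load": {"dataset": {"source": "freqTest", "type": "sql"},
              "output": "dataToAnalyze"}},
    {"freq": {"dataset": "dataToAnalyze",
              "columns": ["priority", "region"],
              "output": "Two Column Summary"}},
]


def check_aliases(steps: list[dict[str, Any]]) -> list[str]:
    """Return the names of steps that reference an alias no earlier step produced."""
    known: set[str] = set()
    unresolved: list[str] = []
    for step in steps:
        # Each list entry is assumed to hold exactly one step name.
        name, config = next(iter(step.items()))
        dataset = config.get("dataset")
        # A string dataset is an alias; an object describes an external source.
        if isinstance(dataset, str) and dataset not in known:
            unresolved.append(name)
        output = config.get("output")
        if output:
            known.add(output)
    return unresolved


unresolved = check_aliases(steps)
unresolved_reversed = check_aliases(list(reversed(steps)))
```

With the steps in the order shown, every alias resolves. Reversing the order makes `freq` run before `dataToAnalyze` exists, so the helper reports it.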

    {
      "data": [
        {"priority": "High", "region": "East"},
        {"priority": "Medium", "region": "West"}
      ],
      "pdv": {},
      "extras": {"record_count": 2},
      "outputType": "work"
    }
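
To illustrate how a handler might produce the result above, here is a minimal Python sketch of the wrapping step. The real `handle_load` is presumably PHP, since the source refers to `$params['data']`, so this is a translation under assumptions. The signature, the `records` and `pdv` arguments, and the empty-dict default are hypothetical; the key names are taken from the example output.

```python
from __future__ import annotations

from typing import Any, Optional


def handle_load(records: list[dict[str, Any]],
                pdv: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Wrap raw records in the data / pdv / extras / outputType structure."""
    return {
        "data": records,
        # Column metadata is optional; default to an empty mapping (assumption).
        "pdv": pdv if pdv is not None else {},
        "extras": {"record_count": len(records)},
        # "work" marks the result as an in-memory dataset.
        "outputType": "work",
    }


result = handle_load([
    {"priority": "High", "region": "East"},
    {"priority": "Medium", "region": "West"},
])
```

Deriving `record_count` from the records themselves, rather than accepting it as an argument, keeps the count and the data from drifting apart.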