DAZL Documentation | Data Analytics A-to-Z Processing Language


load

Category: data management

Purpose

Initializes the data pipeline by loading a dataset into memory. This step acts as the entry point for all downstream transformations and analyses in a workflow.

When to Use

  • Start a new workflow with a dataset reference (SQL, API, or virtual source)
  • Pass already loaded data into subsequent steps
  • Standardize input structure for the pipeline

How It Works

  1. The orchestrator (executeStep) resolves the dataset reference and fetches the corresponding data.
  2. The step handler (handle_load) receives the dataset, wraps it into a consistent structure (data, pdv, extras, plus an outputType marker), and returns it to the orchestrator.
  3. The resulting structure becomes the working dataset for subsequent steps.
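
The handler side of this contract is small enough to sketch. The following is an illustrative plain-PHP version, not the actual implementation: only the names handle_load, $params['data'], and the output keys (data, pdv, extras, outputType) come from this page; everything else is an assumption.

// Illustrative sketch only; the real handle_load() may differ.
// By this point the orchestrator has already fetched the data.
function handle_load(array $params): array
{
    $data = $params['data'] ?? [];

    return [
        'data'       => $data,                            // records, passed through untouched
        'pdv'        => $params['pdv'] ?? [],             // column metadata, if provided
        'extras'     => ['record_count' => count($data)], // basic diagnostics
        'outputType' => 'work',                           // in-memory dataset
    ];
}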

Parameters

Required

  • dataset (object) — Defines the data source to load.

    • source (string) — Name or identifier of the dataset (e.g., table name or virtual dataset).
    • type (string) — Type of source (e.g., sql, api, or array).

Optional

  • output (string) — Name of the dataset alias for downstream reference.

Security Features

  • All external access (SQL, API, etc.) is handled by the orchestrator, not directly within the handle_load() function.
  • The handler itself performs no data fetching or evaluation — it only returns normalized structures.
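
Concretely, the boundary can be pictured as follows. Only the names executeStep and handle_load come from this page; resolveDataset() is a hypothetical stand-in for whatever SQL/API resolution the orchestrator actually performs.

// Hypothetical stand-in for the orchestrator's fetch logic (SQL, API, ...).
function resolveDataset(array $dataset): array
{
    // e.g., dispatch on $dataset['type'] ('sql', 'api', 'array', ...)
    return [['priority' => 'High', 'region' => 'East']];
}

// Assumed shape of the orchestrator: external access happens here,
// never inside the handler (see the handle_load sketch above).
function executeStep(array $step): array
{
    $records = resolveDataset($step['dataset']);
    return handle_load(['data' => $records]);
}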

Input Requirements

  • Input may be empty or contain pre-loaded data.
  • The orchestrator provides $params['data'] when applicable.

Output

Data

  • Returns the dataset provided by the orchestrator in $inArray['data'].

PDV

  • Metadata about the dataset’s columns (if available).

Extras

  • Includes basic diagnostics such as record counts.

Output Structure

Key          Description
data         Array of dataset records
pdv          Column metadata (if provided)
extras       Record count and other optional info
outputType   "work" — signals the step produced an in-memory dataset

Example Usage

steps:
  - load:
      dataset:
        source: freqTest
        type: sql
      output: dataToAnalyze

  - freq:
      dataset: dataToAnalyze
      columns: [priority, region]
      output: "Two Column Summary"

Example Output

{
  "data": [
    {"priority": "High", "region": "East"},
    {"priority": "Medium", "region": "West"}
  ],
  "pdv": {},
  "extras": {"record_count": 2},
  "outputType": "work"
}

Related Documentation

  • loadInline step – Load small inline datasets
  • filter step – Filter records after loading
  • calculate step – Add or modify columns after loading
  • sort step – Arrange the dataset
  • keep step – Specify which columns to keep
  • drop step – Specify which columns to remove