DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

lag

data management

slug: step-lag

Lag Step

Purpose

Creates lagged (historical) versions of specified columns, enabling time series analysis, trend comparisons, and change detection. Supports both row-based and date-based lagging methods.

When to Use

  • Time series analysis across multiple periods
  • Calculate period-over-period changes
  • Detect trends and patterns over time
  • Compare current values to historical values
  • Create moving averages or rolling calculations
  • Analyze sequential events or transactions
  • Build predictive models using historical features

How It Works

  1. Takes one or more columns and creates lagged versions based on specified periods
  2. Supports two lagging methods:
    • Row-based: Looks back a specific number of rows in the dataset
    • Date-based: Looks back based on date/time values (days, weeks, months)
  3. Optionally groups data to create lags within specific segments
  4. Creates new columns with naming pattern {original_column}_lag{period}
  5. Handles missing lag values with configurable fill behavior

Parameters

Required

  • columns (string|array) - Column(s) to create lagged versions for

Optional

  • periods (int|array) - Number of periods to lag back (default: [1])
  • groupBy (string) - Column to group by before calculating lags (default: null)
  • fillValue (mixed) - Value to use when lag value is unavailable (default: null)
  • dateColumn (string) - Date column for date-based lagging (default: null)
  • dateUnit (string) - Unit for date-based lag: days, weeks, or months (default: days)

Input Requirements

  • Dataset must be sorted appropriately for row-based lagging
  • For date-based lagging, the specified date column must contain valid dates
  • For grouped lagging, the group column must exist

Output

Data

  • Original dataset with additional lag columns added
  • New column names follow the pattern: {original_column}_lag{period}

PDV

  • Updated with metadata for all new lag columns
  • Labels for lag columns include the lag period/unit information

Extras

  • lag_applied - Timestamp when lag was performed
  • lag_type - Method used (row-based or date-based)
  • For date-based lags:
    • lag_date_column - Date column used
    • lag_date_unit - Date unit used

Example Usage

Row-based Lag Example

lag:
  columns:
    - revenue
    - units
  periods: [1, 2, 3]
  fillValue: 0

Date-based Lag Example

lag:
  columns: sales_amount
  periods: [7, 14, 30]
  dateColumn: transaction_date
  dateUnit: days
  groupBy: store_id
  fillValue: 0

Example Output

Row-based Lag

Input Data

date store_id sales_amount
2023-10-01 101 1200
2023-10-02 101 1350
2023-10-03 101 980
2023-10-04 101 1440

Output (with columns: sales_amount, periods: [1, 2])

date store_id sales_amount sales_amount_lag1 sales_amount_lag2
2023-10-01 101 1200 null null
2023-10-02 101 1350 1200 null
2023-10-03 101 980 1350 1200
2023-10-04 101 1440 980 1350

Date-based Lag with Groups

Input Data

transaction_date store_id sales_amount
2023-10-01 101 1200
2023-10-08 101 1350
2023-10-15 101 980
2023-10-01 102 950
2023-10-08 102 1050
2023-10-15 102 1100

Output (with columns: sales_amount, periods: [7], dateColumn: transaction_date, dateUnit: days, groupBy: store_id)

transaction_date store_id sales_amount sales_amount_lag7
2023-10-01 101 1200 null
2023-10-08 101 1350 1200
2023-10-15 101 980 1350
2023-10-01 102 950 null
2023-10-08 102 1050 950
2023-10-15 102 1100 1050

Related Documentation

  • calculate-step - Use lag columns in calculations
  • diff-step - Calculate differences between current and lag values
  • growth-step - Calculate growth rates using lag values
  • moving-average-step - Create smoothed values using lag values
  • time-window-step - Create time-based aggregations