Creates lagged (historical) versions of specified columns, enabling time series analysis, trend comparisons, and change detection. Supports both row-based and date-based lagging methods.
When to Use
Analyze time series across multiple periods
Calculate period-over-period changes
Detect trends and patterns over time
Compare current values to historical values
Create moving averages or rolling calculations
Analyze sequential events or transactions
Build predictive models using historical features
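As an illustration of the period-over-period use case above, here is a minimal pandas sketch. It assumes a lag column named revenue_lag1 has already been produced for a hypothetical revenue column; the data values are made up for the example.

```python
import pandas as pd

# Hypothetical data: "revenue" plus a 1-period lag column that already exists.
df = pd.DataFrame({
    "revenue": [100.0, 120.0, 90.0],
    "revenue_lag1": [None, 100.0, 120.0],
})

# Absolute change versus the previous period.
df["revenue_change"] = df["revenue"] - df["revenue_lag1"]

# Percentage change versus the previous period.
df["revenue_pct_change"] = df["revenue_change"] / df["revenue_lag1"] * 100
```

The first row has no previous period, so both derived columns stay empty (NaN) there.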
How It Works
Takes one or more columns and creates lagged versions based on specified periods
Supports two lagging methods:
Row-based: Looks back a specific number of rows in the dataset
Date-based: Looks back based on date/time values (days, weeks, months)
Optionally groups data to create lags within specific segments
Creates new columns with naming pattern {original_column}_lag{period}
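Fills lag values that are unavailable (for example, the first rows of the dataset or of each group) with the configured fillValue; by default they are left null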
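The row-based method, grouping, and fill behavior described above can be sketched in Python with pandas. This is an illustration only, not the tool's actual implementation: the function add_row_lags and its argument names are hypothetical, chosen to mirror the columns, periods, groupBy, and fillValue parameters.

```python
import pandas as pd

def add_row_lags(df, columns, periods=(1,), group_by=None, fill_value=None):
    """Hypothetical helper: add {column}_lag{period} columns by looking back N rows."""
    out = df.copy()
    for col in columns:
        for p in periods:
            # Shift within each group when group_by is given, else over the whole frame.
            source = out.groupby(group_by)[col] if group_by else out[col]
            # Only pass fill_value when one was configured; otherwise gaps stay null.
            kwargs = {} if fill_value is None else {"fill_value": fill_value}
            out[f"{col}_lag{p}"] = source.shift(p, **kwargs)
    return out

# Made-up example data, already sorted in the intended order within each store.
sales = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B"],
    "revenue": [100, 120, 90, 50, 70],
})
result = add_row_lags(sales, ["revenue"], periods=[1, 2],
                      group_by="store", fill_value=0)
```

Because the shift happens inside each store, the first row of store B gets the fill value rather than the last revenue of store A.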
Parameters
Required
columns (string|array) - Column(s) to create lagged versions for
Optional
periods (int|array) - Number of periods to look back; pass an array to create one lag column per period (default: [1])
groupBy (string) - Column to group by before calculating lags (default: null)
fillValue (mixed) - Value to use when lag value is unavailable (default: null)
dateColumn (string) - Date column for date-based lagging (default: null)
dateUnit (string) - Unit for date-based lag: days, weeks, or months (default: days)
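To show how dateColumn and dateUnit differ from a row-based lag, here is a rough pandas sketch of a date-based lookup. It rests on an assumption the parameter descriptions do not confirm: that the lag value is taken from the row whose date is exactly one offset earlier. The function add_date_lag is hypothetical.

```python
import pandas as pd

def add_date_lag(df, column, date_column, period=1, date_unit="days"):
    """Hypothetical helper: look up the value recorded `period` units earlier."""
    if date_unit not in ("days", "weeks", "months"):
        raise ValueError(f"unsupported date unit: {date_unit}")
    offset = pd.DateOffset(**{date_unit: period})

    # Move each observation forward by the offset so it lines up with the
    # date on which it counts as the lagged value.
    lookup = df[[date_column, column]].copy()
    lookup[date_column] = lookup[date_column] + offset
    lookup = lookup.rename(columns={column: f"{column}_lag{period}"})

    # Assumption: exact date match; rows with no matching earlier date stay null.
    return df.merge(lookup, on=date_column, how="left")

# Made-up data with a gap on 2024-01-03.
readings = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
    "value": [10, 20, 40],
})
result = add_date_lag(readings, "value", "date", period=1, date_unit="days")
```

In this sketch the row for 2024-01-04 gets no lag value because nothing was recorded on 2024-01-03, whereas a row-based lag would have returned 20. The sketch also assumes one row per date; duplicate dates would multiply rows in the merge.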
Input Requirements
For row-based lagging, the dataset must already be sorted in the intended order (typically chronological), because each lag is taken from the preceding rows
For date-based lagging, the specified date column must contain valid dates
For grouped lagging, the groupBy column must exist in the dataset
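These requirements could be checked before lagging with something like the following pandas sketch. The function check_lag_inputs and its message wording are hypothetical and are not part of the tool.

```python
import pandas as pd

def check_lag_inputs(df, columns, group_by=None, date_column=None):
    """Hypothetical pre-check: return a list of problems found in the inputs."""
    problems = []

    # Every referenced column must be present in the dataset.
    required = list(columns) + [c for c in (group_by, date_column) if c]
    for name in required:
        if name not in df.columns:
            problems.append(f"missing column: {name}")

    # The date column must contain values that parse as dates.
    if date_column and date_column in df.columns:
        parsed = pd.to_datetime(df[date_column], errors="coerce")
        bad = int((parsed.isna() & df[date_column].notna()).sum())
        if bad:
            problems.append(f"{date_column}: {bad} invalid date value(s)")

    return problems

# Made-up data: one unparseable date and no "region" column.
data = pd.DataFrame({"date": ["2024-01-01", "not a date"], "value": [1, 2]})
issues = check_lag_inputs(data, ["value"], group_by="region", date_column="date")
```

Sort order is not checked here; for row-based lagging, sorting by the relevant date or sequence column beforehand remains the caller's responsibility.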
Output
Data
Original dataset with additional lag columns added
New column names follow the pattern: {original_column}_lag{period}
PDV
Updated with metadata for all new lag columns
Labels for lag columns include the lag period/unit information