statistical primitive
slug: step-univariate

Calculates univariate statistics for numeric columns in a dataset. This step provides descriptive statistics such as mean, standard deviation, minimum, maximum, skewness, and kurtosis, helping analysts understand the distribution and variability of individual numeric fields.
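The exact estimators used by `univariate()` are not specified here. Assuming common textbook definitions, the statistics for a column with non-missing values $x_1, \dots, x_n$ can be written as shown below. The sample standard deviation uses divisor $n-1$. Skewness and excess kurtosis are built from population central moments $m_k$ (divisor $n$). The `age` figures in the example output (`stddev` 3.51, `kurt` -1.5) are consistent with these choices.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k \\
\text{stddev} &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\text{skew} &= \frac{m_3}{m_2^{3/2}}, \qquad
\text{kurt} = \frac{m_4}{m_2^{2}} - 3
\end{aligned}
```

Some tools report bias-adjusted skewness and kurtosis instead. The adjusted kurtosis is undefined for $n < 4$, so small-sample results can differ between implementations.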
The step receives `data`, `pdv`, and `extras`. It iterates through each column to analyze and applies the `univariate()` function to calculate the count, sum, mean, min, max, standard deviation, skewness, and kurtosis.

Parameters:

- `columns` (array): list of column names to analyze, for example `["age", "income", "spend"]`.

The input dataset (`data`) must be an array of associative arrays (rows).

Output: an array of univariate statistics, one element per numeric column. Each element includes:
- `column`: column name
- `n`: number of non-missing observations
- `sum`: sum of values
- `mean`: average value
- `min`: minimum value
- `max`: maximum value
- `stddev`: standard deviation
- `skew`: skewness
- `kurt`: kurtosis

The step returns:

| Key | Description |
|---|---|
| `data` | Array of univariate statistics per numeric column |
| `pdv` | Metadata about dataset columns (passed through) |
| `extras` | Additional runtime information (passed through) |
| `outputType` | `"array"`, indicating structured array output |
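
The per-column loop and the return shape in the table can be sketched in Python. This is a hypothetical illustration, not the step's implementation. `univariate_step`, `describe`, and `is_present` are made-up names, and the following behaviours are all assumptions:

- `None`, NaN, booleans, and non-numeric entries are treated as missing.
- Columns with no numeric values are skipped.
- The estimators are the sample standard deviation plus population-moment skewness and excess kurtosis.

```python
import math
from typing import Any


def describe(values: list[float]) -> dict[str, Any]:
    """Summary statistics for a non-empty list of numbers (assumed conventions)."""
    n = len(values)
    total = math.fsum(values)
    mean = total / n
    lo, hi = min(values), max(values)
    varies = hi > lo  # constant data has no defined skewness or kurtosis
    devs = [v - mean for v in values]
    # Population central moments m2, m3, m4 (divisor n).
    m2 = math.fsum(d ** 2 for d in devs) / n
    m3 = math.fsum(d ** 3 for d in devs) / n
    m4 = math.fsum(d ** 4 for d in devs) / n
    # Assumption: sample standard deviation (divisor n - 1); NaN when n < 2.
    if n < 2:
        stddev = math.nan
    elif varies:
        stddev = math.sqrt(m2 * n / (n - 1))
    else:
        stddev = 0.0
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "min": lo,
        "max": hi,
        "stddev": stddev,
        # Assumption: population-moment skewness and excess kurtosis.
        "skew": m3 / m2 ** 1.5 if varies else math.nan,
        "kurt": m4 / m2 ** 2 - 3 if varies else math.nan,
    }


def is_present(value: Any) -> bool:
    """Treat None, NaN, bools, and non-numeric values as missing (an assumption)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return not math.isnan(value)


def univariate_step(data: list[dict[str, Any]], pdv: Any, extras: Any,
                    columns: list[str]) -> dict[str, Any]:
    """Hypothetical step: one statistics record per requested numeric column."""
    results = []
    for column in columns:
        values = [float(row[column]) for row in data if is_present(row.get(column))]
        if not values:
            continue  # Assumption: columns without numeric values are skipped.
        results.append({"column": column, **describe(values)})
    # pdv and extras are passed through unchanged, as in the return table.
    return {"data": results, "pdv": pdv, "extras": extras, "outputType": "array"}


rows = [
    {"age": 22, "income": 38000, "spend": 800},
    {"age": 25, "income": 45000, "spend": 1200},
    {"age": 29, "income": 56000, "spend": 1800},
]
result = univariate_step(rows, pdv=None, extras=None, columns=["age", "income", "spend"])
for stats in result["data"]:
    print(stats["column"], stats["n"], round(stats["stddev"], 2),
          round(stats["skew"], 2), round(stats["kurt"], 2))
```

With the three example rows, this sketch gives the following values, rounded to two decimals:

- `stddev`: 3.51, 9073.77, and 503.32
- `skew`: 0.17, 0.26, and 0.24
- `kurt`: -1.5 for every column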

Example pipeline:

```yaml
steps:
  - loadInline:
      data:
        - {age: 22, income: 38000, spend: 800}
        - {age: 25, income: 45000, spend: 1200}
        - {age: 29, income: 56000, spend: 1800}
      output: sampleData
  - univariate:
      dataset: sampleData
      columns: [age, income, spend]
      output: summaryStats
```

Resulting `summaryStats`:

```json
[
  {
    "column": "age",
    "n": 3,
    "sum": 76,
    "mean": 25.33,
    "min": 22,
    "max": 29,
    "stddev": 3.51,
    "skew": 0.17,
    "kurt": -1.5
  },
  {
    "column": "income",
    "n": 3,
    "sum": 139000,
    "mean": 46333.33,
    "min": 38000,
    "max": 56000,
    "stddev": 9073.77,
    "skew": 0.26,
    "kurt": -1.5
  },
  {
    "column": "spend",
    "n": 3,
    "sum": 3800,
    "mean": 1266.67,
    "min": 800,
    "max": 1800,
    "stddev": 503.32,
    "skew": 0.24,
    "kurt": -1.5
  }
]
```
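
As a hand check of the `age` entry, the assumed conventions are a sample standard deviation plus skewness and excess kurtosis built from the central moments $m_k = \frac{1}{n}\sum (x_i - \bar{x})^k$:

```latex
\begin{aligned}
\bar{x} &= \frac{22 + 25 + 29}{3} = \frac{76}{3} \approx 25.33 \\
x_i - \bar{x} &= -\frac{10}{3},\; -\frac{1}{3},\; \frac{11}{3} \\
\text{stddev} &= \sqrt{\frac{1}{3-1}\cdot\frac{100 + 1 + 121}{9}} = \sqrt{\frac{37}{3}} \approx 3.51 \\
m_2 &= \frac{1}{3}\cdot\frac{222}{9} = \frac{74}{9}, \quad
m_3 = \frac{1}{3}\cdot\frac{-1000 - 1 + 1331}{27} = \frac{110}{27}, \quad
m_4 = \frac{1}{3}\cdot\frac{10000 + 1 + 14641}{81} = \frac{8214}{81} \\
\text{skew} &= \frac{m_3}{m_2^{3/2}} = \frac{110/27}{(74/9)^{3/2}} \approx 0.17 \\
\text{kurt} &= \frac{m_4}{m_2^{2}} - 3 = \frac{8214/81}{5476/81} - 3 = 1.5 - 3 = -1.5
\end{aligned}
```

For any three values that are not all equal, the ratio $m_4/m_2^2$ is exactly 1.5. Under this definition, `kurt` is therefore -1.5 for every column in a three-row example.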