DAZL Documentation | Data Analytics A-to-Z Processing Language


Recipe: Analysis of numeric variables

category: exploratory statistics

Problem

You need to understand the distribution and characteristics of numeric variables so that you can:

  • detect outliers or extreme values
  • summarize key statistics (mean, median, standard deviation)
  • identify patterns for further analysis or transformation

Solution

Follow these steps to perform univariate analysis:

  • load the dataset
  • apply univariate analysis to the target numeric fields
  • review summary statistics and visualizations
  • optionally filter or flag outliers for cleaning

Step Sequence

load step -> univariate step -> filter step
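
The sequence above can be sketched in plain Python. This is an illustrative stand-in, not DAZL syntax: the function names `load_step`, `univariate_step`, and `filter_step`, the inline CSV data, and the 3 SD rule passed to the filter are all assumptions made for the example.

```python
import csv
import io
import statistics

# Hypothetical stand-in for transactions_clean (column names follow the recipe).
RAW = """\
amount,quantity,discount
12.5,1,0.0
15.0,2,0.1
9.99,1,0.0
14.25,3,0.05
11.0,1,0.0
13.75,2,0.1
10.5,1,0.0
16.0,4,0.15
12.0,2,0.0
14.0,1,0.05
11.5,3,0.0
13.0,2,0.1
15.5,1,0.0
950.0,1,0.0
"""


def load_step(text):
    """Load step: parse CSV text into a list of rows with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def univariate_step(rows, field):
    """Univariate step: summary statistics for one numeric field."""
    values = [row[field] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


def filter_step(rows, keep):
    """Filter step: keep only the rows for which keep(row) is true."""
    return [row for row in rows if keep(row)]


# Chain the steps in the order given by the recipe.
rows = load_step(RAW)
stats = univariate_step(rows, "amount")
kept = filter_step(
    rows, lambda row: abs(row["amount"] - stats["mean"]) <= 3 * stats["sd"]
)
print(f"kept {len(kept)} of {len(rows)} rows")
```

Each step takes the previous step's output as input, which mirrors the left-to-right order of the sequence. The filter rule is passed in as a function so that the rule itself can be changed without touching the other steps.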

Input Datasets

  • transactions_clean — cleaned transactional data
  • Notes: focus on numeric fields like amount, quantity, discount

Output Dataset

  • numeric_summary — table summarizing each numeric variable
  • Notes: includes count, mean, median, min, max, standard deviation, and optionally flagged outliers
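
As a rough picture of what numeric_summary might contain, the Python sketch below builds one row per numeric variable with the statistics listed above. The sample values and the `summarize` helper are hypothetical, and the actual layout of the DAZL output may differ.

```python
import statistics

# Hypothetical sample of transactions_clean, stored column by column.
transactions_clean = {
    "amount": [12.0, 15.0, 9.0, 14.0, 10.0],
    "quantity": [1, 2, 1, 3, 3],
    "discount": [0.0, 0.1, 0.0, 0.05, 0.1],
}


def summarize(name, values):
    """Return one summary row for a single numeric variable."""
    return {
        "variable": name,
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


# One row per numeric variable, in the order the columns are listed.
numeric_summary = [
    summarize(name, values) for name, values in transactions_clean.items()
]

for row in numeric_summary:
    print(
        f'{row["variable"]:<10} n={row["count"]} mean={row["mean"]:.2f} '
        f'median={row["median"]:.2f} min={row["min"]} max={row["max"]} '
        f'sd={row["sd"]:.2f}'
    )
```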

Step-By-Step Explanation

Step             | Purpose                                          | Notes
load step        | Load the dataset                                 | Supports local file, database, or API sources
univariate step  | Compute summary statistics for numeric variables | Example: calculate mean, median, and SD for amount
filter step      | Optionally flag or remove outliers               | Example: filter transactions whose amount is more than 3 SDs from the mean
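
The difference between flagging and removing outliers can be illustrated with a small Python sketch of the 3 SD rule. The amounts, the `is_outlier` field name, and the threshold constant are assumptions for the example, not DAZL output.

```python
import statistics

# Hypothetical amounts: 19 ordinary values and one extreme value.
amounts = [
    18.0, 19.5, 20.0, 21.0, 22.5, 19.0, 20.5, 21.5, 18.5, 20.0,
    22.0, 19.5, 20.5, 21.0, 19.0, 20.0, 21.5, 18.0, 22.0, 500.0,
]

THRESHOLD = 3.0  # number of standard deviations, as in the recipe's example

mean = statistics.mean(amounts)
sd = statistics.stdev(amounts)  # sample standard deviation (n - 1 denominator)


def z_score(value):
    """Distance from the mean, measured in standard deviations."""
    return (value - mean) / sd


# Flagging keeps every row and adds a marker column.
flagged = [
    {"amount": x, "z": z_score(x), "is_outlier": abs(z_score(x)) > THRESHOLD}
    for x in amounts
]

# Removing drops the marked rows instead.
outliers = [row["amount"] for row in flagged if row["is_outlier"]]
cleaned = [row["amount"] for row in flagged if not row["is_outlier"]]

print("flagged as outliers:", outliers)
print("rows remaining after removal:", len(cleaned))
```

One caveat worth knowing: when the sample standard deviation is used, no value among n observations can lie more than (n - 1) / sqrt(n) standard deviations from the mean. With 10 or fewer rows that bound is below 3, so a 3 SD rule cannot flag anything, however extreme a value looks.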

Variations & Extensions

  • Add a chart step to visualize distributions (histogram, boxplot)
  • Combine with a calculate step to create normalized or transformed fields
  • Use a compare step to compare numeric distributions across different datasets or periods
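
The three extensions above can be approximated in Python as follows. The helpers `histogram`, `min_max_normalize`, and `compare`, along with the two sample periods, are hypothetical and only indicate the kind of computation each step performs. Min-max scaling is one possible normalization among several.

```python
import statistics

# Hypothetical amounts for two periods.
q1 = [10.0, 12.0, 14.0, 16.0, 18.0]
q2 = [20.0, 22.0, 24.0, 26.0, 28.0]


def histogram(values, bin_width):
    """Chart-style binning: count values per bin, keyed by the bin's lower edge."""
    counts = {}
    for value in values:
        edge = (value // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


def min_max_normalize(values):
    """Calculate-style transform: rescale to the range 0..1.

    Assumes the values are not all identical (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(value - lo) / (hi - lo) for value in values]


def compare(a, b):
    """Compare-style summary: shift in mean and median from a to b."""
    return {
        "mean_diff": statistics.mean(b) - statistics.mean(a),
        "median_diff": statistics.median(b) - statistics.median(a),
    }


print(histogram(q1, 5.0))
print(min_max_normalize(q1))
print(compare(q1, q2))
```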

Concepts Demonstrated

  • Univariate statistical analysis
  • Outlier detection
  • Data summarization for numeric fields
  • Sequencing statistical steps

Related Recipes

  • Frequency analysis of categorical data
  • Correlation analysis between numeric variables

Notes & Best Practices

  • Always inspect extreme values before filtering or transforming
  • Document every transformation applied so the analysis can be reproduced
  • Visualize distributions (histogram, boxplot) to spot skew and outliers that summary statistics alone can hide
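
One simple way to inspect extreme values before filtering is to list the few smallest and largest values of a field. The Python sketch below does this with the standard library. The sample amounts and the `extremes` helper are hypothetical.

```python
import heapq

# Hypothetical amounts, including a negative value and a very large one.
amounts = [12.5, 15.0, 9.99, 950.0, 14.25, -3.0, 11.0]


def extremes(values, k=3):
    """Return the k lowest and k highest values for manual review."""
    return {
        "lowest": heapq.nsmallest(k, values),
        "highest": heapq.nlargest(k, values),
    }


review = extremes(amounts, k=2)
print("lowest:", review["lowest"])
print("highest:", review["highest"])
```

Looking at the values first helps distinguish data-entry errors (such as a negative amount) from legitimate but unusual transactions, which a blanket filter would treat the same way.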