DAZL Documentation | Data Analytics A-to-Z Processing Language


slug: recipe-exploratory-statistics-crosstab-analysis-of-two-categorical-variables

Recipe: Crosstab analysis of two categorical variables

Category: exploratory statistics

Problem

You need to examine relationships between two categorical variables:

  • understand joint distributions (e.g., product category vs. customer segment)
  • detect patterns or dependencies
  • identify unexpected combinations or data quality issues

Solution

Follow these steps to perform a two-way frequency analysis:

  • load the dataset
  • select two categorical variables
  • compute a cross-tabulation of counts or percentages
  • optionally visualize as a heatmap or table
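The counting at the heart of these steps can be sketched in plain Python. This is an illustration of the idea only, not DAZL syntax; the sample rows and the crosstab_counts helper are hypothetical stand-ins for transactions_clean and the xfreq step.

```python
from collections import Counter

# Hypothetical sample rows standing in for transactions_clean.
rows = [
    {"product_category": "Books", "customer_segment": "Retail"},
    {"product_category": "Books", "customer_segment": "Retail"},
    {"product_category": "Books", "customer_segment": "Wholesale"},
    {"product_category": "Toys", "customer_segment": "Retail"},
]

def crosstab_counts(rows, row_var, col_var):
    """Count how often each (row_var, col_var) value pair occurs."""
    return Counter((r[row_var], r[col_var]) for r in rows)

counts = crosstab_counts(rows, "product_category", "customer_segment")

# Lay the counts out as a grid: one printed line per row category.
row_levels = sorted({pair[0] for pair in counts})
col_levels = sorted({pair[1] for pair in counts})
print("columns:", col_levels)
for row_level in row_levels:
    # A Counter returns 0 for combinations that never occur.
    print(row_level, [counts[(row_level, col_level)] for col_level in col_levels])
```

Keying the counts by (row value, column value) pairs keeps combinations that never occur visible as zeros when the grid is printed, which is often where unexpected gaps show up.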

Step Sequence

load step -> xfreq step -> filter step -> chart step
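To show how the sequence composes, here is a rough Python analogue in which each step is a small function that feeds the next. The function names, the inline CSV text, and the text-bar "chart" are all assumptions made for illustration; they do not reflect how DAZL steps are actually written or configured.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV text standing in for the transactions_clean source.
CSV_TEXT = """product_category,customer_segment
Books,Retail
Books,Retail
Toys,Retail
Toys,Wholesale
Books,Retail
"""

def load_step(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def xfreq_step(rows, row_var, col_var):
    """Count each combination of the two categorical variables."""
    return Counter((r[row_var], r[col_var]) for r in rows)

def filter_step(counts, min_count):
    """Keep only combinations seen at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}

def chart_step(counts):
    """Render each combination as a text bar, a stand-in for a real chart."""
    return [f"{row} x {col}: {'#' * n}" for (row, col), n in sorted(counts.items())]

rows = load_step(CSV_TEXT)
counts = xfreq_step(rows, "product_category", "customer_segment")
kept = filter_step(counts, min_count=2)
lines = chart_step(kept)
for line in lines:
    print(line)
```

Each function takes the previous step's output as its input, which mirrors the left-to-right flow of the step sequence.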

Input Datasets

  • transactions_clean — cleaned transactional data
  • Notes: select categorical fields such as product_category and customer_segment

Output Dataset

  • crosstab_table — table showing counts or percentages for each combination of the two variables
  • Notes: can be used to detect trends, dependencies, or anomalies

Step-By-Step Explanation

Step        | Purpose                              | Notes
------------+--------------------------------------+------------------------------------------------------------------------
load step   | Load dataset                         | Supports local file, database, or API sources
xfreq step  | Compute cross-tabulation             | Example: count transactions per product_category × customer_segment
filter step | Optionally exclude rare combinations | Example: combinations with fewer than 5 occurrences
chart step  | Visualize cross-tabulation           | Optional heatmap, stacked bar chart, or table
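The optional filter can be sketched as a split rather than a plain drop, so that the rare combinations remain available for inspection. The counts below are made-up values, split_by_frequency is a hypothetical helper, and the threshold of 5 simply follows the example in the table.

```python
# Made-up crosstab counts keyed by (product_category, customer_segment).
counts = {
    ("Books", "Retail"): 120,
    ("Books", "Wholesale"): 3,
    ("Toys", "Retail"): 48,
    ("Toys", "Wholesale"): 7,
}

def split_by_frequency(counts, min_count=5):
    """Return (kept, rare): combinations at or above the threshold, and those below it."""
    kept = {pair: n for pair, n in counts.items() if n >= min_count}
    rare = {pair: n for pair, n in counts.items() if n < min_count}
    return kept, rare

kept, rare = split_by_frequency(counts, min_count=5)
print("kept:", kept)
print("rare, worth a closer look:", rare)
```

Returning the rare set instead of discarding it makes it easier to check whether those combinations are genuine edge cases or signs of a data quality problem.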

Variations & Extensions

  • Add a calculate step to convert counts to percentages or other normalized values (for example, row-wise, column-wise, or share of total)
  • Combine with a compare step to compare crosstabs across datasets or time periods
  • Add a dashboard step to create interactive views of joint distributions
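As a sketch of what the percentage calculation might involve, the Python below normalizes a table of counts three ways: by grand total, by row total, or by column total. The to_percentages helper and its by parameter are hypothetical names, and the counts are made-up values.

```python
from collections import defaultdict

# Made-up crosstab counts keyed by (product_category, customer_segment).
counts = {
    ("Books", "Retail"): 30,
    ("Books", "Wholesale"): 10,
    ("Toys", "Retail"): 50,
    ("Toys", "Wholesale"): 10,
}

def to_percentages(counts, by="all"):
    """Convert counts to percentages of the grand total, row total, or column total."""
    if by not in ("all", "row", "column"):
        raise ValueError("by must be 'all', 'row', or 'column'")
    if by == "all":
        total = sum(counts.values())
        return {pair: 100.0 * n / total for pair, n in counts.items()}
    idx = 0 if by == "row" else 1  # position of the grouping value within the pair
    totals = defaultdict(int)
    for pair, n in counts.items():
        totals[pair[idx]] += n
    return {pair: 100.0 * n / totals[pair[idx]] for pair, n in counts.items()}

overall_pct = to_percentages(counts, by="all")
row_pct = to_percentages(counts, by="row")
col_pct = to_percentages(counts, by="column")
print(row_pct[("Books", "Retail")])
```

Row percentages answer "within this product category, how are segments distributed?", while column percentages answer the reverse question, so the choice depends on which variable is treated as the grouping.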

Concepts Demonstrated

  • Crosstab / two-way frequency analysis
  • Identifying dependencies between categorical variables
  • Highlighting rare or unexpected combinations
  • Sequencing statistics and visualization steps
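One common way to look for a dependency between two categorical variables is to compare each observed count with the count expected if the variables were independent (row total × column total / grand total). The sketch below assumes made-up counts and a hypothetical expected_counts helper; it is not something the recipe itself prescribes.

```python
from collections import defaultdict
from itertools import product

# Made-up crosstab counts keyed by (product_category, customer_segment).
counts = {
    ("Books", "Retail"): 30,
    ("Books", "Wholesale"): 10,
    ("Toys", "Retail"): 50,
    ("Toys", "Wholesale"): 10,
}

def expected_counts(counts):
    """Expected count per cell if the two variables were independent."""
    row_totals = defaultdict(int)
    col_totals = defaultdict(int)
    for (row, col), n in counts.items():
        row_totals[row] += n
        col_totals[col] += n
    total = sum(counts.values())
    # Cover every row/column combination, including ones never observed.
    return {
        (row, col): row_totals[row] * col_totals[col] / total
        for row, col in product(row_totals, col_totals)
    }

expected = expected_counts(counts)
for pair in sorted(expected):
    observed = counts.get(pair, 0)
    print(pair, "observed:", observed, "expected:", round(expected[pair], 1))
```

Cells where the observed count sits far from the expected one are candidates for a real association, though small tables and small counts call for caution before drawing conclusions.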

Related Recipes

  • Frequency analysis of categorical data
  • Univariate analysis of numeric variables

Notes & Best Practices

  • Inspect low-frequency combinations for potential data issues
  • Use percentages rather than raw counts when comparing crosstabs built from datasets of different sizes
  • Document the variables used and any filtering applied
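To illustrate why percentages help when dataset sizes differ, the sketch below compares two made-up crosstabs, one with 100 transactions and one with 200, by the change in each combination's share of the total. The shares and share_change helpers and the period names are hypothetical.

```python
# Made-up crosstabs for two periods with different transaction volumes.
period_a = {("Books", "Retail"): 30, ("Toys", "Retail"): 70}
period_b = {("Books", "Retail"): 120, ("Toys", "Retail"): 80}

def shares(counts):
    """Each combination's percentage share of the table's grand total."""
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}

def share_change(before, after):
    """Percentage-point change in share for every combination seen in either table."""
    s_before, s_after = shares(before), shares(after)
    pairs = set(s_before) | set(s_after)
    return {pair: s_after.get(pair, 0.0) - s_before.get(pair, 0.0) for pair in pairs}

change = share_change(period_a, period_b)
for pair in sorted(change):
    print(pair, f"{change[pair]:+.1f} percentage points")
```

In raw counts both combinations grew between the two periods, yet the Toys share of the total fell, which is the kind of shift that only shows up once the tables are put on a common scale.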

Metadata


title: "Crosstab analysis of two categorical variables"
category: "exploratory statistics"
difficulty: "Intermediate"
tags: [crosstab, xfreq, categorical, EDA]
inputs: [transactions_clean]
outputs: [crosstab_table]
steps: [step-load, step-xfreq, step-filter, step-chart]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"