Crosstab analysis of two categorical variables
exploratory statistics
slug: recipe-exploratory-statistics-crosstab-analysis-of-two-categorical-variables
## Recipe: Crosstab analysis of two categorical variables
category: exploratory statistics
Problem
You need to examine relationships between two categorical variables:
- understand joint distributions (e.g., product category vs. customer segment)
- detect patterns or dependencies
- identify unexpected combinations or data quality issues
Solution
Follow these steps to perform a two-way frequency analysis:
- load the dataset
- select two categorical variables
- compute a cross-tabulation of counts or percentages
- optionally visualize as a heatmap or table
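The steps above can be sketched in plain Python. This is a minimal illustration, not the implementation of the xfreq step itself: the sample rows are hypothetical stand-ins for transactions_clean, and the helper names `crosstab` and `print_table` are invented for this example.

```python
from collections import Counter

# Hypothetical sample rows standing in for transactions_clean.
transactions = [
    {"product_category": "books", "customer_segment": "retail"},
    {"product_category": "books", "customer_segment": "retail"},
    {"product_category": "books", "customer_segment": "wholesale"},
    {"product_category": "toys", "customer_segment": "retail"},
    {"product_category": "toys", "customer_segment": "wholesale"},
    {"product_category": "toys", "customer_segment": "wholesale"},
]


def crosstab(rows, row_var, col_var):
    """Count how many rows fall into each (row_var, col_var) combination."""
    return Counter((r[row_var], r[col_var]) for r in rows)


def print_table(counts):
    """Print the counts as a wide table, one line per row value."""
    row_values = sorted({r for r, _ in counts})
    col_values = sorted({c for _, c in counts})
    print("\t".join([""] + col_values))
    for r in row_values:
        # A Counter returns 0 for combinations that never occurred.
        print("\t".join([r] + [str(counts[(r, c)]) for c in col_values]))


counts = crosstab(transactions, "product_category", "customer_segment")
print_table(counts)
```

Keying the counts by a (row value, column value) pair keeps the long form, which is easy to filter, and the wide form, which is easy to read, derivable from the same structure.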
Step Sequence
load step -> [step-xfreq] -> filter step -> chart step
Input Datasets
transactions_clean — cleaned transactional data
- Notes: select categorical fields such as product_category and customer_segment
Output Dataset
crosstab_table — table showing counts or percentages for each combination of the two variables
- Notes: can be used to detect trends, dependencies, or anomalies
Step-By-Step Explanation
| Step | Purpose | Notes |
|---|---|---|
| load step | Load dataset | Supports local file, database, or API sources |
| [step-xfreq] | Compute cross-tabulation | Example: count transactions per product_category × customer_segment |
| filter step | Optionally exclude rare combinations | Example: combinations with <5 occurrences |
| chart step | Visualize cross-tabulation | Optional heatmap, stacked bar chart, or table |
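The optional filter on rare combinations might look like the sketch below. The counts are hypothetical, the threshold of 5 comes from the "<5 occurrences" example, and `split_rare` is an illustrative helper rather than part of the filter step. It returns the rare combinations instead of silently dropping them, so they can still be reviewed for data quality issues.

```python
from collections import Counter

# Hypothetical crosstab counts keyed by (product_category, customer_segment).
counts = Counter({
    ("books", "retail"): 120,
    ("books", "wholesale"): 3,
    ("toys", "retail"): 45,
    ("toys", "wholesale"): 1,
})

MIN_COUNT = 5  # assumed threshold, taken from the "<5 occurrences" example


def split_rare(counts, min_count):
    """Separate combinations with at least min_count rows from rarer ones."""
    kept = {k: v for k, v in counts.items() if v >= min_count}
    rare = {k: v for k, v in counts.items() if v < min_count}
    return kept, rare


kept, rare = split_rare(counts, MIN_COUNT)
print("kept:", kept)
print("rare (review before discarding):", rare)
```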
Variations & Extensions
- Apply a calculate step to compute percentages or normalized values
- Combine with a compare step to compare crosstabs across datasets or periods
- Include a dashboard step to create interactive views of joint distributions
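As a rough illustration of the first two variations, the sketch below converts counts to row percentages and then compares two periods in percentage points. The period counts (`q1`, `q2`) and the helper names are hypothetical; the calculate and compare steps may normalize differently (for example by column or by grand total).

```python
from collections import Counter, defaultdict


def row_percentages(counts):
    """Convert counts to percentages of each row total."""
    row_totals = defaultdict(int)
    for (row, _col), n in counts.items():
        row_totals[row] += n
    return {
        (row, col): 100.0 * n / row_totals[row]
        for (row, col), n in counts.items()
        if row_totals[row] > 0  # skip rows with no observations
    }


def compare(pct_a, pct_b):
    """Percentage-point change from a to b for combinations seen in either."""
    keys = set(pct_a) | set(pct_b)
    return {k: pct_b.get(k, 0.0) - pct_a.get(k, 0.0) for k in keys}


# Hypothetical counts for two periods of different size.
q1 = Counter({
    ("books", "retail"): 30, ("books", "wholesale"): 10,
    ("toys", "retail"): 20, ("toys", "wholesale"): 20,
})
q2 = Counter({
    ("books", "retail"): 60, ("books", "wholesale"): 60,
    ("toys", "retail"): 10, ("toys", "wholesale"): 30,
})

pct_q1 = row_percentages(q1)
pct_q2 = row_percentages(q2)
change = compare(pct_q1, pct_q2)

for key in sorted(change):
    print(key, f"{change[key]:+.1f} points")
```

Comparing percentages rather than raw counts keeps the result meaningful even though the second period contains twice as many transactions.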
Concepts Demonstrated
- Crosstab / two-way frequency analysis
- Identifying dependencies between categorical variables
- Highlighting rare or unexpected combinations
- Sequencing statistics and visualization steps
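One common way to quantify a dependency between two categorical variables is Pearson's chi-square statistic, which compares observed counts with the counts expected if the variables were independent. The recipe does not name a specific test, so treat this as an assumed technique; the counts and the `chi_square` helper are hypothetical.

```python
from collections import defaultdict


def chi_square(counts):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = defaultdict(int)
    col_totals = defaultdict(int)
    for (row, col), n in counts.items():
        row_totals[row] += n
        col_totals[col] += n
    total = sum(row_totals.values())
    if total == 0:
        return 0.0, 0

    stat = 0.0
    for row, r_total in row_totals.items():
        for col, c_total in col_totals.items():
            # Expected count if row and column variables were independent.
            expected = r_total * c_total / total
            if expected == 0:
                continue  # empty row or column contributes nothing
            observed = counts.get((row, col), 0)
            stat += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts keyed by (product_category, customer_segment).
counts = {
    ("books", "retail"): 30, ("books", "wholesale"): 10,
    ("toys", "retail"): 10, ("toys", "wholesale"): 30,
}

stat, dof = chi_square(counts)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

A statistic that is large relative to its degrees of freedom suggests the variables are not independent. Turning it into a p-value requires the chi-square distribution, and the approximation is usually considered unreliable when expected counts are very small.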
Related Recipes
- Frequency analysis of categorical data
- Univariate analysis of numeric variables
Notes & Best Practices
- Inspect low-frequency combinations for potential data issues
- Use percentages rather than raw counts when comparing datasets of different sizes
- Document the variables used and any filtering applied
Metadata
title: "Crosstab analysis of two categorical variables"
category: "exploratory statistics"
difficulty: "Intermediate"
tags: [crosstab, xfreq, categorical, EDA]
inputs: [transactions_clean]
outputs: [crosstab_table]
steps: [step-load, step-xfreq, step-filter, step-chart]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"