slug: recipe-exploratory-statistics-frequency-analysis-of-categorical-data
Recipe: Frequency analysis of categorical data
category: exploratory statistics
Problem
You need to understand the distribution of categorical variables:
- count how often each category occurs
- detect imbalances or unexpected values
- identify data quality issues such as typos or rare categories
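One simple way to surface likely typos is to look for raw values that collapse to the same key once case and surrounding whitespace are ignored. The sketch below assumes every value is a string; the function name `find_spelling_variants` and the sample values are hypothetical. It will not catch genuine misspellings such as a swapped letter, which would need fuzzy matching.

```python
from collections import Counter, defaultdict


def find_spelling_variants(values):
    """Group raw values that differ only by case or surrounding whitespace."""
    counts = Counter(values)
    groups = defaultdict(dict)
    for raw, n in counts.items():
        # Normalized key: trimmed and case-folded version of the raw value
        groups[raw.strip().casefold()][raw] = n
    # Keep only keys that were written in more than one way
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Hypothetical observed values for a field such as product_category
observed = ["Books", "books ", "Toys", "Books", "Garden"]
suspects = find_spelling_variants(observed)
print(suspects)
```

Each entry in `suspects` maps a normalized key to the raw spellings found and their counts, which makes it easy to decide which spelling to keep.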
Solution
Follow these steps to generate frequency tables:
- load the dataset
- apply frequency analysis to the desired categorical columns
- review counts and proportions for each category
- optionally filter or flag rare or invalid categories
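The steps above can be sketched in plain Python. This is a minimal illustration, not the recipe's actual step implementation: it assumes the dataset is already loaded as a list of dictionaries, and the helper `frequency_table` and the sample rows are hypothetical.

```python
from collections import Counter


def frequency_table(rows, column):
    """Return one record per category with its count and percentage share."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    # most_common() orders categories from most to least frequent
    return [
        {"category": cat, "count": n, "percent": round(100 * n / total, 1)}
        for cat, n in counts.most_common()
    ]


# Hypothetical sample standing in for transactions_clean
transactions_clean = [
    {"product_category": "Books", "customer_segment": "Retail"},
    {"product_category": "Books", "customer_segment": "Retail"},
    {"product_category": "Toys", "customer_segment": "Wholesale"},
    {"product_category": "Books", "customer_segment": "Retail"},
]

category_frequency = frequency_table(transactions_clean, "product_category")
for record in category_frequency:
    print(record)
```

Reporting counts and percentages side by side helps when reviewing: the count shows how much evidence a category has, and the percentage shows how dominant it is.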
Step Sequence
load step -> freq step -> filter step
Input Datasets
transactions_clean — cleaned transactional data
- Notes: focus on categorical fields such as product_category or customer_segment
Output Dataset
category_frequency — table showing counts and percentages of each category
- Notes: can be used to inform filtering, aggregation, or reporting
Step-By-Step Explanation
| Step | Purpose | Notes |
| load step | Load the dataset | Supports local file, database, or API sources |
| freq step | Compute frequency counts for categorical variables | Example: count product_category occurrences |
| filter step | Optionally remove rare or invalid categories | Example: exclude categories with <5 occurrences |
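The filter step's example threshold can be illustrated as follows. This is a sketch under assumptions: rows are dictionaries, `filter_rare` is a hypothetical helper, and the cutoff of 5 is taken from the example in the table rather than being a recommended value.

```python
from collections import Counter


def filter_rare(rows, column, min_count=5):
    """Split rows into those in frequent categories and a report of rare ones."""
    counts = Counter(row[column] for row in rows)
    kept = [row for row in rows if counts[row[column]] >= min_count]
    # Report what was excluded so the decision can be reviewed and documented
    dropped = {cat: n for cat, n in counts.items() if n < min_count}
    return kept, dropped


# Hypothetical rows: six valid records and two with a suspicious category
rows = [{"product_category": "Books"}] * 6 + [{"product_category": "Toyz"}] * 2
kept, dropped = filter_rare(rows, "product_category", min_count=5)
print(len(kept), dropped)
```

Returning the dropped categories alongside the kept rows keeps the filtering auditable instead of silently discarding data.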
Variations & Extensions
- Apply freq step to multiple categorical fields in one dataset
- Combine with chart step to visualize distributions
- Use compare step to compare frequencies across different datasets or time periods
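Comparing frequencies across datasets or time periods usually works better on shares than on raw counts, because the datasets may differ in size. The sketch below is one possible approach; `shares`, `compare_shares`, and the two sample periods are hypothetical and do not describe how the compare step itself is implemented.

```python
from collections import Counter


def shares(values):
    """Return each category's share of the total as a fraction between 0 and 1."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def compare_shares(before, after):
    """Return the change in share (after minus before) for every category seen."""
    a, b = shares(before), shares(after)
    # Categories missing from one side are treated as having a share of 0
    return {cat: b.get(cat, 0.0) - a.get(cat, 0.0) for cat in sorted(set(a) | set(b))}


# Hypothetical customer_segment values for two periods
q1 = ["Retail", "Retail", "Wholesale", "Retail"]
q2 = ["Retail", "Wholesale", "Wholesale", "Online"]
change = compare_shares(q1, q2)
print(change)
```

A category that appears in only one period shows up with a nonzero change, which is a quick way to spot new or discontinued values.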
Concepts Demonstrated
- Frequency analysis
- Identifying data imbalances
- Detecting data quality issues
- Sequencing statistics steps
Related Recipes
- Univariate analysis of numeric variables
- Crosstab analysis of two categorical fields
Notes & Best Practices
- Always inspect rare categories to confirm they are not errors
- Consider grouping small categories into “Other” for reporting
- Document the variables analyzed and any filtering applied
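Grouping small categories for reporting can be sketched as a relabeling pass that leaves the row count unchanged. The helper `group_rare`, its default threshold, and the sample segments are assumptions made for illustration.

```python
from collections import Counter


def group_rare(values, min_count=5, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical customer_segment values with two small categories
segments = ["Retail"] * 5 + ["Wholesale"] * 5 + ["Gov"] + ["NGO"] * 2
grouped = Counter(group_rare(segments, min_count=5))
print(grouped)
```

Unlike filtering, this keeps every record in the totals, so percentages in a report still sum to 100.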
Metadata
title: "Frequency analysis of categorical data"
category: "exploratory statistics"
difficulty: "Beginner"
tags: [frequency, categorical, EDA]
inputs: [transactions_clean]
outputs: [category_frequency]
steps: [step-load, step-freq, step-filter]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"