DAZL Documentation | Data Analytics A-to-Z Processing Language


slug: recipe-exploratory-statistics-frequency-analysis-of-categorical-data

Recipe: Frequency analysis of categorical data

category: exploratory statistics

Problem

You need to understand the distribution of categorical variables:

  • count how often each category occurs
  • detect imbalances or unexpected values
  • identify data quality issues such as typos or rare categories

Solution

Follow these steps to generate frequency tables:

  • load the dataset
  • apply frequency analysis to the desired categorical columns
  • review counts and proportions for each category
  • optionally filter or flag rare or invalid categories
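
The steps above can be sketched outside DAZL as follows. This is a minimal Python illustration, not DAZL syntax: the function name `frequency_table` and the sample rows are hypothetical, and the in-memory list merely stands in for the transactions_clean dataset.

```python
from collections import Counter

def frequency_table(rows, column):
    """Return (category, count, percent) tuples, most frequent first."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return [
        (category, count, round(100 * count / total, 1))
        for category, count in counts.most_common()
    ]

# Hypothetical sample rows standing in for transactions_clean.
transactions_clean = [
    {"product_category": "grocery"},
    {"product_category": "grocery"},
    {"product_category": "electronics"},
    {"product_category": "grocey"},  # a likely typo that shows up as a rare category
]

category_frequency = frequency_table(transactions_clean, "product_category")
for category, count, percent in category_frequency:
    print(f"{category:<12} {count:>3} {percent:>5}%")
```

Sorting by descending count puts the dominant categories first and pushes rare or misspelled values to the bottom, where they are easy to review.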

Step Sequence

load step -> freq step -> filter step

Input Datasets

  • transactions_clean — cleaned transactional data
  • Notes: focus on categorical fields such as product_category or customer_segment

Output Dataset

  • category_frequency — table showing counts and percentages of each category
  • Notes: can be used to inform filtering, aggregation, or reporting

Step-By-Step Explanation

Step         Purpose                                             Notes
-----------  --------------------------------------------------  -----------------------------------------------
load step    Load the dataset                                    Supports local file, database, or API sources
freq step    Compute frequency counts for categorical variables  Example: count product_category occurrences
filter step  Optionally remove rare or invalid categories        Example: exclude categories with <5 occurrences
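
The filter step's example threshold (exclude categories with fewer than 5 occurrences) can be illustrated with a short Python sketch. It is not DAZL syntax; `split_rare`, its `min_count` parameter, and the sample rows are assumptions made for illustration.

```python
from collections import Counter

def split_rare(rows, column, min_count=5):
    """Return (kept_rows, rare_categories) for the given column."""
    counts = Counter(row[column] for row in rows)
    # Categories seen fewer than min_count times are treated as rare.
    rare = {category for category, n in counts.items() if n < min_count}
    kept = [row for row in rows if row[column] not in rare]
    return kept, rare

# Hypothetical sample: 6 "retail" rows and 2 "wholesale" rows.
rows = [{"customer_segment": "retail"}] * 6 + [{"customer_segment": "wholesale"}] * 2

kept_rows, rare_categories = split_rare(rows, "customer_segment")
print(len(kept_rows), sorted(rare_categories))
```

Returning the rare categories alongside the kept rows lets you inspect what was dropped before committing to the filter.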

Variations & Extensions

  • Apply freq step to multiple categorical fields in one dataset
  • Combine with chart step to visualize distributions
  • Use compare step to compare frequencies across different datasets or time periods
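
To make the comparison idea concrete, the sketch below computes how each category's share changes between two periods. It is written in Python rather than DAZL, and the helper names `proportions` and `compare_frequencies`, along with the sample values, are hypothetical.

```python
from collections import Counter

def proportions(values):
    """Map each category to its share of all values (0.0 to 1.0)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

def compare_frequencies(before, after):
    """Return the change in share per category (after minus before)."""
    p_before, p_after = proportions(before), proportions(after)
    categories = sorted(set(p_before) | set(p_after))
    # A category missing from one period counts as a share of 0.0 there.
    return {c: p_after.get(c, 0.0) - p_before.get(c, 0.0) for c in categories}

# Hypothetical product_category values for two time periods.
period_1 = ["grocery", "grocery", "toys", "toys"]
period_2 = ["grocery", "grocery", "grocery", "garden"]

shift = compare_frequencies(period_1, period_2)
for category, delta in shift.items():
    print(f"{category:<8} {delta:+.2f}")
```

Comparing shares rather than raw counts keeps the result meaningful when the two datasets differ in size.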

Concepts Demonstrated

  • Frequency analysis
  • Identifying data imbalances
  • Detecting data quality issues
  • Sequencing statistics steps

Related Recipes

  • Univariate analysis of numeric variables
  • Crosstab analysis of two categorical fields

Notes & Best Practices

  • Always inspect rare categories to confirm they are not errors
  • Consider grouping small categories into “Other” for reporting
  • Document the variables analyzed and any filtering applied
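
Grouping small categories into "Other" could look like the following Python sketch. It is an illustration under stated assumptions, not DAZL syntax: `group_small`, the `min_count` threshold of 5, and the sample segments are all hypothetical.

```python
from collections import Counter

def group_small(values, min_count=5, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical customer_segment values: one large and two small categories.
segments = ["retail"] * 5 + ["vip"] * 2 + ["staff"]

grouped = group_small(segments)
print(Counter(grouped))
```

Inspect the small categories before grouping them, since relabeling hides typos and other errors that the frequency table would otherwise reveal.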

Metadata


title: "Frequency analysis of categorical data"
category: "exploratory statistics"
difficulty: "Beginner"
tags: [frequency, categorical, EDA]
inputs: [transactions_clean]
outputs: [category_frequency]
steps: [step-load, step-freq, step-filter]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"