Remove invalid records and handle missing values
data cleanup
slug: recipe-data-cleanup-remove-invalid-records-and-handle-missing-values
Recipe: Remove invalid records and handle missing values
category: data cleansing and standardization
Problem
Datasets often contain:
- missing values in critical fields
- duplicate or inconsistent records
- incorrect or out-of-range values that can distort analysis
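Before cleaning, it can help to measure how often each of these problems occurs. The following is a minimal Python sketch, not the recipe tool's own syntax. The sample rows, the `transaction_id` field, and the valid amount range of 0 to 10,000 are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample standing in for transactions_raw; field names are assumptions.
rows = [
    {"transaction_id": "T1", "customer_id": "C001", "amount": 25.0},
    {"transaction_id": "T2", "customer_id": None, "amount": 40.0},
    {"transaction_id": "T3", "customer_id": "C002", "amount": None},
    {"transaction_id": "T3", "customer_id": "C002", "amount": None},
    {"transaction_id": "T4", "customer_id": "C003", "amount": -500.0},
]

def profile(rows, critical_fields, amount_range):
    """Count missing values, exact duplicate rows, and out-of-range amounts."""
    low, high = amount_range
    missing = {f: sum(1 for r in rows if r.get(f) is None) for f in critical_fields}
    # A row counts as a duplicate when an identical row has already been seen.
    counts = Counter(tuple(sorted(r.items(), key=lambda kv: kv[0])) for r in rows)
    duplicates = sum(n - 1 for n in counts.values())
    out_of_range = sum(
        1 for r in rows
        if r.get("amount") is not None and not (low <= r["amount"] <= high)
    )
    return {"missing": missing, "duplicates": duplicates, "out_of_range": out_of_range}

report = profile(rows, ["customer_id", "amount"], (0.0, 10_000.0))
print(report)
```

A report like this gives a baseline to compare against once the dataset has been cleaned.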
Solution
Follow these steps to clean the dataset:
- load the dataset
- filter out invalid or incomplete rows
- handle missing values (e.g., impute, default, or remove)
- optionally standardize or normalize key fields
Step Sequence
load step -> filter step -> calculate step
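The sequence above can be sketched in plain Python as three functions applied in order. This is an illustrative sketch only: the function names, the inline CSV sample, and the `transaction_id` column are assumptions, and the recipe's actual steps may be configured differently.

```python
import csv
import io

# Hypothetical raw data standing in for transactions_raw.
RAW_CSV = """transaction_id,customer_id,amount
T1,C001,25.00
T2,,40.00
T3,C002,
"""

def load_step(text):
    """Load: parse CSV text into dict rows; empty strings become None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]

def filter_step(rows):
    """Filter: drop rows that have no customer_id."""
    return [r for r in rows if r["customer_id"] is not None]

def calculate_step(rows):
    """Calculate: convert amount to float, defaulting missing amounts to 0.0."""
    return [
        {**r, "amount": float(r["amount"]) if r["amount"] is not None else 0.0}
        for r in rows
    ]

def run_pipeline(source, steps):
    """Feed the output of each step into the next one, in order."""
    data = source
    for step in steps:
        data = step(data)
    return data

transactions_clean = run_pipeline(RAW_CSV, [load_step, filter_step, calculate_step])
for row in transactions_clean:
    print(row)
```

Keeping each step as a separate function makes the order explicit and lets a step be swapped or tested on its own.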
Input Datasets
transactions_raw — raw transactional data
- Notes: may contain null values, duplicates, or inconsistent identifiers
Output Dataset
transactions_clean — cleaned dataset ready for analysis
- Notes: invalid rows removed, missing values handled, identifiers standardized
Step-By-Step Explanation
| Step | Purpose | Notes |
|---|---|---|
| load step | Load raw dataset | Supports local file, database, or API sources |
| filter step | Remove invalid or incomplete rows | Example: drop rows with an invalid customer_id, or with a missing amount if it will not be imputed |
| calculate step | Handle missing values and standardize fields | Example: fill missing amounts with 0 or the median; normalize customer_id |
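The filter and calculate examples in the table could look like the sketch below. It assumes that an invalid `customer_id` means a missing or blank value and that normalizing means trimming whitespace and upper-casing; the recipe does not define either rule, so adjust them to your data. The helper names are hypothetical.

```python
from statistics import median

# Hypothetical rows after the load step.
rows = [
    {"customer_id": " c001 ", "amount": 10.0},
    {"customer_id": "C002", "amount": None},
    {"customer_id": None, "amount": 30.0},
    {"customer_id": "c003", "amount": 50.0},
]

def is_valid(row):
    """Assumed rule: a customer_id is valid when it is present and not blank."""
    cid = row["customer_id"]
    return cid is not None and cid.strip() != ""

def fill_and_normalize(rows, strategy="median"):
    """Fill missing amounts (median of present values, or 0.0) and normalize IDs."""
    present = [r["amount"] for r in rows if r["amount"] is not None]
    fill = median(present) if strategy == "median" and present else 0.0
    return [
        {
            "customer_id": r["customer_id"].strip().upper(),
            "amount": r["amount"] if r["amount"] is not None else fill,
        }
        for r in rows
    ]

kept = [r for r in rows if is_valid(r)]             # filter step
cleaned = fill_and_normalize(kept)                  # calculate step, median fill
zero_filled = fill_and_normalize(kept, "zero")      # calculate step, default of 0.0
print(cleaned)
```

Note that the median is computed after filtering, so rows removed as invalid do not influence the fill value.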
Variations & Extensions
- Use keep step or drop step to retain only relevant columns
- Apply transpose step or lengthen step if reshaping is required after cleaning
- Include combine step to merge with reference tables during the cleaning process
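As a rough illustration of the keep and combine variations, the sketch below retains a subset of columns and then attaches a field from a reference table. The `customers_ref` lookup, the `region` and `note` fields, and both helper functions are hypothetical; reshaping (transpose or lengthen) is not shown.

```python
# Hypothetical cleaned rows and reference table.
rows = [
    {"transaction_id": "T1", "customer_id": "C001", "amount": 25.0, "note": "x"},
    {"transaction_id": "T2", "customer_id": "C009", "amount": 40.0, "note": ""},
]
customers_ref = {
    "C001": {"region": "North"},
    "C002": {"region": "South"},
}

def keep_columns(rows, columns):
    """Keep: retain only the listed columns, in the given order."""
    return [{c: r.get(c) for c in columns} for r in rows]

def combine(rows, reference, key, fields):
    """Combine: left-join reference fields onto each row; unmatched rows get None."""
    out = []
    for r in rows:
        match = reference.get(r[key], {})
        out.append({**r, **{f: match.get(f) for f in fields}})
    return out

slim = keep_columns(rows, ["transaction_id", "customer_id", "amount"])
enriched = combine(slim, customers_ref, "customer_id", ["region"])
print(enriched)
```

Rows whose key has no match in the reference table are kept with a `None` value, which makes unmatched identifiers easy to review afterwards.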
Concepts Demonstrated
- Data cleansing
- Missing value handling
- Standardization of key fields
- Sequencing multiple cleaning steps
Related Recipes
- Standardize customer codes across datasets
- Detect and reconcile differences between tables
Notes & Best Practices
- Always back up raw datasets before cleaning
- Document assumptions for missing value handling
- Validate results to ensure no critical data was removed
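One way to act on these practices is to keep an untouched copy of the raw rows and compare it with the cleaned output. The sketch below is an assumption-laden example: the 50% removal threshold, the sample rows, and the `validate_cleaning` helper are all hypothetical.

```python
import copy

def validate_cleaning(raw_rows, clean_rows, critical_fields, max_removed_fraction=0.5):
    """Summarize how many rows were removed and whether critical fields are complete."""
    removed = len(raw_rows) - len(clean_rows)
    fraction = removed / len(raw_rows) if raw_rows else 0.0
    remaining_missing = sum(
        1 for r in clean_rows for f in critical_fields if r.get(f) is None
    )
    return {
        "removed": removed,
        "removed_fraction": fraction,
        "remaining_missing": remaining_missing,
        "ok": fraction <= max_removed_fraction and remaining_missing == 0,
    }

# Hypothetical raw rows.
raw = [
    {"customer_id": "C001", "amount": 25.0},
    {"customer_id": None, "amount": 40.0},
    {"customer_id": "C002", "amount": 12.5},
    {"customer_id": "C003", "amount": 8.0},
]

backup = copy.deepcopy(raw)  # in-memory backup; a real backup would be a stored copy
clean = [r for r in raw if r["customer_id"] is not None]
summary = validate_cleaning(backup, clean, ["customer_id", "amount"])
print(summary)
```

If `ok` is false, review the filter rules and the documented assumptions before using the cleaned dataset.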
Metadata
title: "Remove invalid records and handle missing values"
category: "data cleansing and standardization"
difficulty: "Beginner"
tags: [data-cleaning, missing-values, preprocessing]
inputs: [transactions_raw]
outputs: [transactions_clean]
steps: [step-load, step-filter, step-calculate]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"