DAZL Documentation | Data Analytics A-to-Z Processing Language


slug: recipe-data-cleanup-remove-invalid-records-and-handle-missing-values

Recipe: Remove invalid records and handle missing values

category: data cleansing and standardization

Problem

Datasets often contain:

  • missing values in critical fields
  • duplicate or inconsistent records
  • incorrect or out-of-range values that can distort analysis
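Before cleaning, it can help to measure how common each of these problems is. The recipe does not show DAZL syntax, so the following is a plain Python sketch rather than DAZL code. The column names (transaction_id, customer_id, amount), the sample rows, and the accepted amount range are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows; the column names are assumptions.
rows = [
    {"transaction_id": "t1", "customer_id": "C001", "amount": 10.0},
    {"transaction_id": "t2", "customer_id": "", "amount": 5.0},
    {"transaction_id": "t2", "customer_id": "", "amount": 5.0},
    {"transaction_id": "t3", "customer_id": "C002", "amount": -3.0},
]

def profile(rows, critical_fields, amount_range):
    """Count missing values, exact duplicate rows, and out-of-range amounts."""
    low, high = amount_range
    # A value counts as missing when it is None or an empty string.
    missing = {
        field: sum(1 for r in rows if r[field] in (None, ""))
        for field in critical_fields
    }
    # Two rows are duplicates when every field matches exactly.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    out_of_range = sum(
        1 for r in rows
        if r["amount"] is not None and not (low <= r["amount"] <= high)
    )
    return {
        "missing": missing,
        "duplicates": duplicates,
        "out_of_range": out_of_range,
    }

# The accepted range 0 to 10000 is an assumed business rule.
report = profile(rows, ["customer_id", "amount"], (0, 10000))
print(report)
```

A report like this gives a baseline to compare against once the cleaning steps have run.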

Solution

Follow these steps to clean the dataset:

  • load the dataset
  • filter out invalid or incomplete rows
  • handle missing values (e.g., impute, default, or remove)
  • optionally standardize or normalize key fields

Step Sequence

load step -> filter step -> calculate step
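The sequence above can be read as three functions applied in order. The sketch below mirrors that order in plain Python. It is not DAZL syntax, which this recipe does not show. The function names, the CSV sample, and the specific rules (drop rows with no customer_id or a negative amount, default missing amounts to 0) are assumptions chosen to keep the example small.

```python
import csv
import io

# Hypothetical sample standing in for transactions_raw.
RAW_CSV = """customer_id,amount
c001,10.50
,7.00
c002,
c003,-5.00
"""

def load_step(text):
    """Load step: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def filter_step(rows):
    """Filter step: drop rows with no customer_id or a negative amount."""
    kept = []
    for row in rows:
        if not row["customer_id"].strip():
            continue
        amount = row["amount"].strip()
        if amount and float(amount) < 0:
            continue
        kept.append(row)
    return kept

def calculate_step(rows):
    """Calculate step: default missing amounts to 0.0, upper-case customer_id."""
    return [
        {
            "customer_id": row["customer_id"].strip().upper(),
            "amount": float(row["amount"]) if row["amount"].strip() else 0.0,
        }
        for row in rows
    ]

# Apply the steps in the same order as the recipe.
transactions_clean = calculate_step(filter_step(load_step(RAW_CSV)))
print(transactions_clean)
```

Keeping each step as a separate function makes it easy to inspect the intermediate result after the filter, before any values are filled in.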

Input Datasets

  • transactions_raw — raw transactional data
  • Notes: may contain null values, duplicates, or inconsistent identifiers

Output Dataset

  • transactions_clean — cleaned dataset ready for analysis
  • Notes: invalid rows removed, missing values handled, identifiers standardized

Step-By-Step Explanation

Step           | Purpose                                      | Notes
load step      | Load raw dataset                             | Supports local file, database, or API sources
filter step    | Remove invalid or incomplete rows            | Example: filter out missing amount or invalid customer_id
calculate step | Handle missing values and standardize fields | Example: fill missing amounts with 0 or median; normalize customer_id
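The filter and calculate examples treat a missing amount in two different ways: remove the row, or fill the value. The Python sketch below takes the second route. It drops rows whose customer_id is invalid, then fills missing amounts with the median of the remaining amounts. It illustrates the logic only and is not DAZL code. The ID format ("C" followed by digits), the normalization rule, and the sample rows are assumptions.

```python
from statistics import median

# Hypothetical rows as they might look after the load step.
rows = [
    {"customer_id": " c-001 ", "amount": 20.0},
    {"customer_id": "C002", "amount": None},
    {"customer_id": "bad id!", "amount": 5.0},
    {"customer_id": "c003", "amount": 10.0},
    {"customer_id": "C004", "amount": 40.0},
]

def normalize_customer_id(raw):
    """Trim, upper-case, and remove hyphens (an assumed canonical form)."""
    return raw.strip().upper().replace("-", "")

def is_valid_customer_id(raw):
    """Assumed rule: the normalized ID is 'C' followed by digits only."""
    cid = normalize_customer_id(raw)
    return cid.startswith("C") and cid[1:].isdigit()

# Filter step: keep only rows with a valid customer_id.
filtered = [r for r in rows if is_valid_customer_id(r["customer_id"])]

# Calculate step: compute the median from known amounts, then fill the gaps.
known_amounts = [r["amount"] for r in filtered if r["amount"] is not None]
fill_value = median(known_amounts) if known_amounts else 0.0

cleaned = [
    {
        "customer_id": normalize_customer_id(r["customer_id"]),
        "amount": r["amount"] if r["amount"] is not None else fill_value,
    }
    for r in filtered
]
print(cleaned)
```

The median is computed after filtering so that amounts from invalid rows do not influence the fill value. Filling with 0 instead only requires replacing fill_value with 0.0.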

Variations & Extensions

  • Use a keep step or drop step to retain only the relevant columns
  • Apply a transpose step or lengthen step if the data must be reshaped after cleaning
  • Include a combine step to merge with reference tables during cleaning
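Two of these variations, keeping selected columns and combining with a reference table, can be sketched in plain Python as follows. This is not DAZL syntax. The helper names, the customer_reference table, and its region field are hypothetical.

```python
# Hypothetical cleaned rows with one column that is not needed downstream.
transactions = [
    {"customer_id": "C001", "amount": 10.5, "internal_note": "x"},
    {"customer_id": "C002", "amount": 0.0, "internal_note": "y"},
]

# Hypothetical reference table keyed by customer_id.
customer_reference = {"C001": {"region": "EMEA"}}

def keep_columns(rows, columns):
    """Retain only the named columns, in the given order."""
    return [{c: row[c] for c in columns} for row in rows]

def combine_with_reference(rows, reference, key):
    """Left join: every row is kept; unmatched rows get None for reference fields."""
    ref_fields = sorted({f for rec in reference.values() for f in rec})
    combined = []
    for row in rows:
        match = reference.get(row[key], {})
        combined.append({**row, **{f: match.get(f) for f in ref_fields}})
    return combined

slim = keep_columns(transactions, ["customer_id", "amount"])
enriched = combine_with_reference(slim, customer_reference, "customer_id")
print(enriched)
```

A left join is used here so that transactions without a reference match stay visible for review instead of being dropped silently.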

Concepts Demonstrated

  • Data cleansing
  • Missing value handling
  • Standardization of key fields
  • Sequencing multiple cleaning steps

Related Recipes

  • Standardize customer codes across datasets
  • Detect and reconcile differences between tables

Notes & Best Practices

  • Always back up raw datasets before cleaning
  • Document assumptions for missing value handling
  • Validate results to ensure no critical data was removed
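One way to validate a cleaning run is to compare the raw and cleaned datasets and report how many rows were removed and whether any critical field is still empty. The Python sketch below shows such a check. The function name, the field names, and the sample rows are assumptions, and what share of removed rows is acceptable remains a judgment call for each dataset.

```python
def validation_report(raw_rows, clean_rows, critical_fields):
    """Summarize how many rows were removed and what is still missing."""
    removed = len(raw_rows) - len(clean_rows)
    remaining_missing = {
        field: sum(1 for r in clean_rows if r.get(field) in (None, ""))
        for field in critical_fields
    }
    return {
        "rows_in": len(raw_rows),
        "rows_out": len(clean_rows),
        "rows_removed": removed,
        "removed_share": removed / len(raw_rows) if raw_rows else 0.0,
        "remaining_missing": remaining_missing,
    }

# Hypothetical before and after datasets.
raw_rows = [
    {"customer_id": "C001", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "C002", "amount": None},
    {"customer_id": "C003", "amount": 8.0},
]
clean_rows = [
    {"customer_id": "C001", "amount": 10.0},
    {"customer_id": "C002", "amount": 9.0},
    {"customer_id": "C003", "amount": 8.0},
]

summary = validation_report(raw_rows, clean_rows, ["customer_id", "amount"])
print(summary)
```

Recording this summary alongside the documented assumptions makes it easier to spot a run that removed far more rows than expected.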

Metadata


title: "Remove invalid records and handle missing values"
category: "data cleansing and standardization"
difficulty: "Beginner"
tags: [data-cleaning, missing-values, preprocessing]
inputs: [transactions_raw]
outputs: [transactions_clean]
steps: [step-load, step-filter, step-calculate]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"