DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

load and clean raw transaction data

data prep

slug: recipe-data-prep-load-and-clean-raw-transaction-data

Recipe: Load and clean raw transaction data

category: data loading and preparation

Problem

Raw transaction data often contains:

  • missing values in critical fields like amount or customer_id
  • duplicate records
  • inconsistent formatting of key identifiers

Solution

Follow these steps to clean up the data:

  • load the dataset
  • remove invalid or incomplete rows
  • standardize key fields
  • calculate derived metrics as needed

Step Sequence

load step -> filter step -> calculate step

Input Datasets

  • transactions_raw — raw transactional data with fields like transaction_id, customer_id, product_id, date, amount
  • Notes: may contain missing values, duplicates, or inconsistent identifiers

Output Dataset

  • transactions_clean — cleaned dataset ready for analysis
  • Notes: all invalid rows removed, missing values addressed, key fields standardized

Step-By-Step Explanation

Step Purpose Notes
load step Load raw transaction dataset Supports local file, database, or API sources
filter step Remove invalid or incomplete rows Filter on missing amount or invalid customer_id
calculate step Standardize identifiers and compute derived fields Example: normalize customer_id, compute total_amount per transaction

Variations & Extensions

  • Add transpose step if dataset needs to pivot for analysis
  • Include combine step to merge with customer reference table before cleaning

Concepts Demonstrated

  • Data cleansing
  • Derived fields
  • Sequencing

Related Recipes

  • Merge multiple input tables for unified dataset
  • Reshape wide datasets into long format

Notes & Best Practices

  • Always validate the dataset after filtering to ensure no important records were removed
  • Document assumptions for handling missing values
  • Keep a copy of the raw dataset for audit purposes

Metadata


title: "Load and clean raw transaction data"
category: "data loading and preparation"
difficulty: "Beginner"
tags: [data-cleaning, preprocessing, transactions]
inputs: [transactions_raw]
outputs: [transactions_clean]
steps: [step-load, step-filter, step-calculate]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"