DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

Standardize customer codes across datasets

data cleanup

slug: recipe-data-cleanup-standardize-customer-codes-across-datasets

Recipe: Standardize customer codes across datasets

category: data cleansing and standardization

Problem

Customer identifiers often differ between tables:

  • inconsistent formatting (e.g., C123 vs c-123)
  • missing prefixes or leading zeros
  • duplicates due to inconsistent naming

Solution

Follow these steps to standardize customer codes:

  • load relevant datasets
  • apply transformation rules to normalize identifiers
  • optionally merge with a reference table to validate codes
  • remove duplicates caused by inconsistent formatting

Step Sequence

load step -> calculate step -> keep step -> drop step

Input Datasets

  • transactions_clean — cleaned transactional data
  • customers_ref — reference table of valid customer codes
  • Notes: customer codes may have inconsistent formatting or missing entries

Output Dataset

  • transactions_standardized — dataset with uniform customer codes
  • Notes: duplicates removed, all codes normalized, ready for analysis or merging

Step-By-Step Explanation

Step Purpose Notes
load step Load datasets Includes transaction and reference tables
calculate step Normalize customer codes Apply rules: uppercase, remove special characters, pad numbers
keep step Retain relevant columns Keep only necessary fields for analysis
drop step Remove redundant or invalid columns Optional: drop original raw code columns

Variations & Extensions

  • Combine with compare step to identify mismatched codes between datasets
  • Apply lag step or calculate step to compute derived customer metrics
  • Include combine step to merge standardized dataset with other data sources

Concepts Demonstrated

  • Data standardization
  • Identifier normalization
  • Duplicate removal
  • Sequencing transformation steps

Related Recipes

  • Remove invalid records and handle missing values
  • Detect and reconcile differences between tables

Notes & Best Practices

  • Document the normalization rules applied
  • Always validate against a reference table to catch anomalies
  • Keep original identifiers in a separate column for traceability

Metadata


title: "Standardize customer codes across datasets"
category: "data cleansing and standardization"
difficulty: "Intermediate"
tags: [data-cleaning, standardization, customer-codes]
inputs: [transactions_clean, customers_ref]
outputs: [transactions_standardized]
steps: [step-load, step-calculate, step-keep, step-drop]
author: "Tom Argiro"
last_updated: "2025-10-25"
doc_type: "recipe"