DAZL Documentation | Data Analytics A-to-Z Processing Language


Drop Step

Purpose

Removes specified columns from a dataset while preserving all other columns and data. Streamlines datasets by eliminating unnecessary fields.

When to Use

  • Remove sensitive or personally identifiable information (PII)
  • Eliminate redundant or unused columns
  • Clean up intermediate calculation columns
  • Improve performance by reducing dataset size
  • Remove columns with poor data quality
  • Prepare data for export by excluding internal fields

How It Works

  1. Takes a list of columns to remove
  2. Filters the dataset to exclude only the specified columns
  3. Updates the PDV (Physical Data View) metadata to remove dropped column definitions
  4. Preserves the row count and all other columns
  5. Tracks metadata about the column removal process
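Steps 1 to 4 above can be sketched in Python as follows. This is only an illustrative model, not DAZL's actual implementation: the function name `drop_columns`, the list-of-dicts row layout, and the PDV shape (a dict keyed by column name with a `type` entry) are all assumptions made for the example.

```python
from typing import Any

Row = dict[str, Any]


def drop_columns(
    rows: list[Row],
    pdv: dict[str, dict[str, Any]],
    columns: list[str],
) -> tuple[list[Row], dict[str, dict[str, Any]]]:
    """Hypothetical model of the drop step (not DAZL's own code)."""
    # Step 1: take the list of columns to remove.
    to_drop = set(columns)
    # Step 2: keep every field that is not in the drop set.
    # Dict comprehension preserves the original column order.
    new_rows = [
        {name: value for name, value in row.items() if name not in to_drop}
        for row in rows
    ]
    # Step 3: remove the dropped column definitions from the PDV.
    new_pdv = {name: meta for name, meta in pdv.items() if name not in to_drop}
    # Step 4: rows are only narrowed, never removed, so the row count is unchanged.
    return new_rows, new_pdv


rows = [
    {"id": 1, "email": "john@example.com", "ssn": "123-45-6789"},
    {"id": 2, "email": "sarah@example.com", "ssn": "987-65-4321"},
]
# Assumed PDV shape: column name -> metadata dict.
pdv = {
    "id": {"type": "integer"},
    "email": {"type": "string"},
    "ssn": {"type": "string"},
}

# "no_such_column" does not exist and is silently skipped.
out_rows, out_pdv = drop_columns(rows, pdv, ["ssn", "no_such_column"])
print(out_rows[0])    # {'id': 1, 'email': 'john@example.com'}
print(list(out_pdv))  # ['id', 'email']
```

Because the sketch filters by name rather than looking each column up, a column that is not present is skipped without an error, and an empty column list leaves the data unchanged.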

Parameters

Required

  • columns - Specifies which columns to remove, using either:
    • String: Single column name (e.g., "internal_id")
    • Array: Multiple column names (e.g., ["internal_id", "created_by", "last_modified"])
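Since `columns` accepts either a string or an array, a consumer of the parameter would typically normalize both forms to a list first. The helper below is a minimal sketch of that idea; `normalize_columns` is a hypothetical name and is not part of DAZL.

```python
from typing import Union


def normalize_columns(columns: Union[str, list[str]]) -> list[str]:
    """Hypothetical helper: turn either accepted form into a list of names."""
    if isinstance(columns, str):
        # String form: a single column name.
        return [columns]
    # Array form: copy so the caller's list is not modified later.
    return list(columns)


print(normalize_columns("internal_id"))
# ['internal_id']
print(normalize_columns(["internal_id", "created_by", "last_modified"]))
# ['internal_id', 'created_by', 'last_modified']
```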

Input Requirements

  • Any dataset with columns
  • Specified columns that don't exist in the dataset are ignored; no error is raised
  • If no columns are specified, the dataset is returned unchanged

Output

Data

  • Same number of rows as input but without the dropped columns
  • Original column order is preserved for remaining columns

PDV

  • Contains metadata for all columns except the dropped ones
  • Original column metadata structure is preserved for remaining columns

Extras

  • drop_applied - Timestamp when the operation was performed
  • columns_before - Number of columns before the operation
  • columns_after - Number of columns after the operation
  • columns_dropped - List of column names that were dropped
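The four extras fields could be assembled as in the sketch below. The function name `build_extras`, the ISO 8601 UTC timestamp format, and the choice to list only columns that were actually present are assumptions for illustration; the documentation above does not specify them.

```python
from datetime import datetime, timezone


def build_extras(before: list[str], after: list[str]) -> dict:
    """Hypothetical construction of the drop step's extras."""
    # Assumption: only columns that really existed are reported as dropped.
    dropped = [name for name in before if name not in after]
    return {
        # Assumption: timestamp stored as an ISO 8601 string in UTC.
        "drop_applied": datetime.now(timezone.utc).isoformat(),
        "columns_before": len(before),
        "columns_after": len(after),
        "columns_dropped": dropped,
    }


before = ["id", "customer_name", "email", "ssn",
          "purchase_total", "internal_notes", "created_date"]
after = ["id", "customer_name", "email", "purchase_total", "created_date"]

extras = build_extras(before, after)
print(extras["columns_before"], extras["columns_after"], extras["columns_dropped"])
# 7 5 ['ssn', 'internal_notes']
```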

Example Usage

# Remove sensitive personal information
drop:
  columns:
    - "ssn"
    - "credit_card_number"
    - "home_address"

# Clean up after calculations
drop:
  columns:
    - "temp_calculation"
    - "interim_value"
    - "debug_flag"

Example Output

Input Data

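id | customer_name | email             | ssn         | purchase_total | internal_notes | created_date
---|---------------|-------------------|-------------|----------------|----------------|-------------
1  | John Smith    | john@example.com  | 123-45-6789 | 125.99         | VIP customer   | 2023-10-15
2  | Sarah Jones   | sarah@example.com | 987-65-4321 | 89.50          | Frequent buyer | 2023-10-16
3  | Michael Brown | mike@example.com  | 456-78-9012 | 215.75         | New customer   | 2023-10-17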

Output After Drop (Using columns: ["ssn", "internal_notes"])

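id | customer_name | email             | purchase_total | created_date
---|---------------|-------------------|----------------|-------------
1  | John Smith    | john@example.com  | 125.99         | 2023-10-15
2  | Sarah Jones   | sarah@example.com | 89.50          | 2023-10-16
3  | Michael Brown | mike@example.com  | 215.75         | 2023-10-17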

Related Documentation

  • [[step-keep]] - Retain only specified columns (opposite of drop)