drop
data management
slug: step-drop
Drop Step
Purpose
Removes specified columns from a dataset while preserving all other columns and data. Streamlines datasets by eliminating unnecessary fields.
When to Use
- Remove sensitive or personally identifiable information (PII)
- Eliminate redundant or unused columns
- Clean up intermediate calculation columns
- Improve performance by reducing dataset size
- Remove columns with poor data quality
- Prepare data for export by excluding internal fields
How It Works
- Takes a list of columns to remove
- Filters the dataset to exclude only the specified columns
- Updates the PDV (Physical Data View) metadata to remove dropped column definitions
- Preserves the row count and all other columns
- Tracks metadata about the column removal process
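The steps above can be sketched in Python. This is a minimal illustration, not the step's actual implementation: it assumes rows are held as a list of dicts and the PDV as a dict keyed by column name, and `drop_columns` is a hypothetical helper name.

```python
def drop_columns(rows, pdv, columns):
    """Return (rows, pdv) without the named columns (hypothetical helper)."""
    to_drop = set(columns)
    # Keep every key that is not in the drop set; dict order is preserved,
    # so the remaining columns stay in their original order.
    new_rows = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]
    # Remove the matching column definitions from the PDV metadata.
    new_pdv = {name: meta for name, meta in pdv.items() if name not in to_drop}
    return new_rows, new_pdv


rows = [
    {"id": 1, "email": "john@example.com", "ssn": "123-45-6789"},
    {"id": 2, "email": "sarah@example.com", "ssn": "987-65-4321"},
]
# Assumed PDV shape, for illustration only.
pdv = {"id": {"type": "integer"}, "email": {"type": "string"}, "ssn": {"type": "string"}}

out_rows, out_pdv = drop_columns(rows, pdv, ["ssn"])
```

The row count is unchanged, and only the `ssn` key disappears from both the rows and the PDV.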
Parameters
Required
columns - Specifies which columns to remove, using either:
- String: Single column name (e.g., "internal_id")
- Array: Multiple column names (e.g., ["internal_id", "created_by", "last_modified"])
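Because `columns` accepts either form, a consumer of this parameter would likely normalize it to a list first. The sketch below shows one way to do that; `normalize_columns` is a hypothetical name and not part of the documented step.

```python
def normalize_columns(columns):
    """Accept a single name or a list of names; always return a list."""
    if columns is None:
        return []
    if isinstance(columns, str):
        # A bare string is one column name, not a sequence of characters.
        return [columns]
    return list(columns)


single = normalize_columns("internal_id")
multiple = normalize_columns(["internal_id", "created_by", "last_modified"])
```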
Input Requirements
- Any dataset with columns
- If specified columns don't exist, they're simply ignored (no error)
- If no columns are specified, the dataset is returned unchanged
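The two edge cases above can be checked with a small sketch that works on column names only. `remaining_columns` is a hypothetical helper used for illustration.

```python
def remaining_columns(existing, requested):
    """Column names left after a drop; unknown names are skipped silently."""
    requested = set(requested)
    return [name for name in existing if name not in requested]


existing = ["id", "email", "ssn"]

# A name that is not in the dataset is ignored rather than raising an error.
with_unknown = remaining_columns(existing, ["ssn", "no_such_column"])

# An empty request leaves the column list unchanged.
unchanged = remaining_columns(existing, [])
```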
Output
Data
- Same number of rows as input but without the dropped columns
- Original column order is preserved for remaining columns
PDV
- Contains metadata for all columns except the dropped ones
- Original column metadata structure is preserved for remaining columns
Extras
drop_applied - Timestamp when the operation was performed
columns_before - Number of columns before the operation
columns_after - Number of columns after the operation
columns_dropped - List of column names that were dropped
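As a rough sketch, the extras could be assembled as follows. The timestamp format (ISO 8601, UTC) and the choice to list only columns that were actually present are assumptions, and `build_extras` is a hypothetical name.

```python
from datetime import datetime, timezone


def build_extras(columns_in, requested):
    """Assemble the extras described above (hypothetical helper)."""
    requested = set(requested)
    # Assumption: only names that existed in the input count as dropped.
    dropped = [name for name in columns_in if name in requested]
    return {
        "drop_applied": datetime.now(timezone.utc).isoformat(),  # assumed format
        "columns_before": len(columns_in),
        "columns_after": len(columns_in) - len(dropped),
        "columns_dropped": dropped,
    }


extras = build_extras(
    ["id", "customer_name", "email", "ssn", "purchase_total", "internal_notes", "created_date"],
    ["ssn", "internal_notes"],
)
```

With these seven input columns, `columns_before` is 7 and `columns_after` is 5.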
Example Usage
# Remove sensitive personal information
drop:
  columns:
    - "ssn"
    - "credit_card_number"
    - "home_address"

# Clean up after calculations
drop:
  columns:
    - "temp_calculation"
    - "interim_value"
    - "debug_flag"
Example Output
Input Data
| id | customer_name | email | ssn | purchase_total | internal_notes | created_date |
|----|---------------|-------|-----|----------------|----------------|--------------|
| 1 | John Smith | john@example.com | 123-45-6789 | 125.99 | VIP customer | 2023-10-15 |
| 2 | Sarah Jones | sarah@example.com | 987-65-4321 | 89.50 | Frequent buyer | 2023-10-16 |
| 3 | Michael Brown | mike@example.com | 456-78-9012 | 215.75 | New customer | 2023-10-17 |
Output After Drop (Using columns: ["ssn", "internal_notes"])
| id | customer_name | email | purchase_total | created_date |
|----|---------------|-------|----------------|--------------|
| 1 | John Smith | john@example.com | 125.99 | 2023-10-15 |
| 2 | Sarah Jones | sarah@example.com | 89.50 | 2023-10-16 |
| 3 | Michael Brown | mike@example.com | 215.75 | 2023-10-17 |
Related Documentation
- [[step-keep]] - Retain only specified columns (opposite of drop)