data management
slug: step-filter

Filters a dataset based on specified conditions, removing rows that don't meet the criteria. Provides detailed tracking of how many records were removed by each filter.
- where (array) - Filter conditions to apply. Each condition should be a string expression.
- strict_mode (boolean) - When true, throws exceptions for invalid filter expressions. Default: false

Expressions follow the pattern: column operator value
- = - Equals
- != - Not equals
- < - Less than
- > - Greater than
- <= - Less than or equal to
- >= - Greater than or equal to
- contains - String contains
- in - Value is in list (e.g., status in [active, pending])
- not in - Value is not in list

waterfall - Detailed statistics tracking the impact of each filter:
- initial_count - Number of rows before filtering
- filters - Array with details for each filter:
  - expression - The filter condition applied
  - before_count - Row count before this filter
  - after_count - Row count after this filter
  - removed_count - Number of rows removed by this filter
- final_count - Number of rows in the final dataset

filter:
  where:
    - "age > 18"
    - "status = active"
    - "region in [North, South, East]"
    - "name contains Smith"
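
To make the `column operator value` pattern concrete, the following is a minimal Python sketch of how such an expression could be split into its three parts. It is not the step's actual parser: `Condition` and `parse_condition` are hypothetical names, the value is kept as a plain string (no numeric coercion), and raising `ValueError` on bad input only approximates what `strict_mode: true` is described as doing.

```python
import re
from typing import List, NamedTuple, Union

# Symbol operators may touch their operands ("age>18"); word operators
# ("in", "not in", "contains") must be surrounded by whitespace.
# Two-character symbols are listed first so "<=" is not read as "<".
_PATTERN = re.compile(
    r"^\s*(?P<column>\w+)"
    r"(?:\s*(?P<sym><=|>=|!=|=|<|>)\s*|\s+(?P<word>not\s+in|in|contains)\s+)"
    r"(?P<value>.+?)\s*$"
)


class Condition(NamedTuple):
    column: str
    operator: str
    value: Union[str, List[str]]


def parse_condition(expression: str) -> Condition:
    """Split 'column operator value' into parts (hypothetical helper)."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError(f"invalid filter expression: {expression!r}")

    # Normalise "not   in" to "not in"; symbol operators are used as-is.
    operator = match["sym"] or " ".join(match["word"].split())
    raw = match["value"]

    if operator in ("in", "not in"):
        # List values are written as [a, b, c]; items are kept as strings.
        if not (raw.startswith("[") and raw.endswith("]")):
            raise ValueError(f"expected a [list] after {operator!r}: {expression!r}")
        value: Union[str, List[str]] = [item.strip() for item in raw[1:-1].split(",")]
    else:
        value = raw

    return Condition(match["column"], operator, value)
```

With this sketch, `parse_condition("region in [North, South, East]")` yields the column `region`, the operator `in`, and a three-item list, while a string with no recognised operator raises an error instead of being silently skipped.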
Filter conditions can also be passed as a direct array:
filter:
  - "age > 18"
  - "status = active"
  - "region in [North, South, East]"
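
Because the step accepts two configuration shapes, code that consumes the configuration has to reduce both to the same list of conditions. The sketch below shows one way to do that in Python. `normalize_filter_config` is a hypothetical helper, and it assumes that the direct-array form falls back to the documented default of `strict_mode: false`, since that form has no place to set the flag.

```python
from typing import Any, List, Tuple


def normalize_filter_config(config: Any) -> Tuple[List[str], bool]:
    """Return (conditions, strict_mode) for either config shape (hypothetical)."""
    if isinstance(config, dict):
        # Mapping form: conditions live under "where".
        conditions = config.get("where", [])
        strict_mode = bool(config.get("strict_mode", False))
    else:
        # Direct-array form: the value itself is the list of conditions.
        # Assumption: strict_mode takes its documented default here.
        conditions = config
        strict_mode = False

    if not isinstance(conditions, list) or not all(
        isinstance(item, str) for item in conditions
    ):
        raise TypeError("filter conditions must be a list of string expressions")

    return list(conditions), strict_mode
```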
| id | name | age | status | region |
|---|---|---|---|---|
| 1 | John Smith | 35 | active | North |
| 4 | Jane Smith | 42 | active | East |
extras.waterfall = {
  "initial_count": 100,
  "filters": [
    {
      "expression": "age > 18",
      "before_count": 100,
      "after_count": 82,
      "removed_count": 18
    },
    {
      "expression": "status = active",
      "before_count": 82,
      "after_count": 45,
      "removed_count": 37
    },
    {
      "expression": "region in [North, South, East]",
      "before_count": 45,
      "after_count": 38,
      "removed_count": 7
    },
    {
      "expression": "name contains Smith",
      "before_count": 38,
      "after_count": 2,
      "removed_count": 36
    }
  ],
  "final_count": 2
}
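
The counts in the waterfall above chain together: each filter's `before_count` equals the previous filter's `after_count`, and `removed_count` is the difference between the two. A minimal Python sketch of that bookkeeping follows. It assumes filters are applied one after another in the order listed, which is what the example output suggests. `apply_with_waterfall` is a hypothetical name, the predicates are hand-written lambdas rather than parsed expressions, and the four sample rows are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def apply_with_waterfall(
    rows: List[Row], filters: List[Tuple[str, Predicate]]
) -> Tuple[List[Row], dict]:
    """Apply filters in order and record row counts per filter (sketch)."""
    waterfall = {"initial_count": len(rows), "filters": []}
    current = list(rows)
    for expression, keep in filters:
        before = len(current)
        current = [row for row in current if keep(row)]
        waterfall["filters"].append(
            {
                "expression": expression,
                "before_count": before,
                "after_count": len(current),
                "removed_count": before - len(current),
            }
        )
    waterfall["final_count"] = len(current)
    return current, waterfall


# Illustrative data only; not taken from the example dataset.
sample_rows = [
    {"name": "John Smith", "age": 35, "status": "active"},
    {"name": "Ann Lee", "age": 17, "status": "active"},
    {"name": "Bob Ray", "age": 50, "status": "inactive"},
    {"name": "Jane Smith", "age": 42, "status": "active"},
]
sample_filters = [
    ("age > 18", lambda row: row["age"] > 18),
    ("status = active", lambda row: row["status"] == "active"),
    ("name contains Smith", lambda row: "Smith" in row["name"]),
]

kept, waterfall = apply_with_waterfall(sample_rows, sample_filters)
```

Because each filter only sees the rows that survived the earlier ones, reordering the conditions changes the per-filter `removed_count` values but not `final_count`.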