DAZL Documentation | Data Analytics A-to-Z Processing Language

Anomaly Detection

business analytics

slug: topic-map-business-analytics-anomoly-detection

Vocabulary:

anomaly: Observation that deviates significantly from expected pattern
outlier: Data point that lies outside normal range
z_score: Number of standard deviations from the mean
threshold: Boundary beyond which observations are flagged as anomalous
expected_value: Predicted or typical value for a segment
deviation: Difference between observed and expected
control_limits: Upper and lower bounds for normal variation (typically ±3σ)
false_positive: Flagging normal variation as anomalous
false_negative: Missing true anomalies
statistical_process_control: Monitoring data over time for out-of-control signals
interquartile_range: IQR = Q3 - Q1, used for outlier detection
mad: Median Absolute Deviation - robust measure of variability
isolation_score: Machine learning score (e.g., from an isolation forest) of how easily an observation is separated from the rest of the data

Concepts:

expected_vs_observed: Comparing what happened to what should have happened
contextual_anomaly: Value that is anomalous only in a given context (e.g., high sales are normal on Black Friday but anomalous on an ordinary weekday)
collective_anomaly: Set of points that together are anomalous
point_anomaly: Single data point that is anomalous
level_specific_detection: What's anomalous at level 2 might be normal at level 0
temporal_anomaly: Deviation from historical patterns
cross_sectional_anomaly: Deviation from peer segments
multivariate_anomaly: Unusual combination of values across multiple measures
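
As an illustration of cross_sectional_anomaly and contextual_anomaly, the sketch below scores each segment against its own peer group rather than against all segments. The function name, the record layout, and the store data are hypothetical and chosen only for this example; it is a minimal sketch, not a DAZL built-in.

```python
import statistics
from collections import defaultdict


def cross_sectional_z_scores(records):
    """Return {segment: z_score}, where each z-score is computed
    against the mean and sample std dev of the segment's peer group.

    records: iterable of (segment, peer_group, value) tuples.
    """
    records = list(records)
    by_group = defaultdict(list)
    for _, group, value in records:
        by_group[group].append(value)

    scores = {}
    for segment, group, value in records:
        peers = by_group[group]
        if len(peers) < 2:
            continue  # not enough peers to estimate variation
        mean = statistics.fmean(peers)
        std_dev = statistics.stdev(peers)
        scores[segment] = 0.0 if std_dev == 0 else (value - mean) / std_dev
    return scores


# Hypothetical data: weekly sales by store, grouped by location type.
stores = [
    ("store_1", "urban", 500),
    ("store_2", "urban", 520),
    ("store_3", "urban", 480),
    ("store_4", "rural", 100),
    ("store_5", "rural", 110),
    ("store_6", "rural", 90),
    ("store_7", "rural", 480),
]
scores = cross_sectional_z_scores(stores)
# The same value (480) sits below the urban mean for store_3
# but far above the rural mean for store_7.
print(scores["store_3"], scores["store_7"])
```

Note that very small peer groups limit how large a z-score can become, so a fixed threshold of 2 or 3 may never trigger; ranking by score or using larger groups may be more practical.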

Concepts_advanced:

adaptive_thresholds: Boundaries that adjust based on recent patterns
seasonal_adjustment: Accounting for expected seasonal patterns before detecting anomalies
robust_estimation: Using median/MAD instead of mean/SD to avoid outlier contamination
ensemble_detection: Combining multiple anomaly detection methods
confidence_intervals: Probabilistic bounds rather than fixed thresholds
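
One possible reading of adaptive_thresholds is a rolling window: each new observation is compared with the mean and standard deviation of the most recent values instead of a fixed boundary. The sketch below assumes that interpretation; the function name, window size, and sample series are illustrative assumptions.

```python
import statistics
from collections import deque


def adaptive_flags(series, window=5, threshold=3.0):
    """Flag each point whose distance from the mean of the preceding
    `window` values exceeds threshold * (sample std dev of that window).
    Points seen before the window fills are never flagged."""
    recent = deque(maxlen=window)
    flags = []
    for x in series:
        if len(recent) == window:
            mean = statistics.fmean(recent)
            std_dev = statistics.stdev(recent)
            flags.append(std_dev > 0 and abs(x - mean) > threshold * std_dev)
        else:
            flags.append(False)
        recent.append(x)  # the window slides forward, so limits adapt
    return flags


# Hypothetical series with one spike at index 5.
series = [10, 11, 9, 10, 11, 30, 10, 11]
flags = adaptive_flags(series)
print(flags)
```

In this sketch a flagged value still enters the window and widens the limits for the next few points. Excluding flagged values, or switching to median/MAD as described under robust_estimation, are two ways to reduce that effect.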

Procedures:

calculate_expected_value: Use historical average, level parent, or predictive model
calculate_standard_deviation: Measure of normal variation
calculate_z_score: (observed - expected) / std_dev
flag_by_threshold: Mark observations where |z_score| > threshold (typically 2 or 3)
calculate_iqr: Q3 - Q1 from distribution
flag_by_iqr: Mark observations outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
calculate_mad: median(|x - median(x)|)
flag_by_mad: Mark observations where |x - median| > threshold × MAD
rank_by_severity: Order anomalies by degree of deviation
classify_anomaly_type: High vs low, point vs collective, temporal vs cross-sectional
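
The three flagging procedures above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the function names mirror the procedure names but are not DAZL functions, the expected value is taken to be the sample mean, quartiles use the default method of statistics.quantiles, and the sales figures are made up.

```python
import statistics


def flag_by_z_score(values, threshold=3.0):
    """Indices where |(observed - mean) / std_dev| > threshold."""
    mean = statistics.fmean(values)
    std_dev = statistics.stdev(values)
    if std_dev == 0:
        return []
    return [i for i, x in enumerate(values)
            if abs((x - mean) / std_dev) > threshold]


def flag_by_iqr(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < lower or x > upper]


def flag_by_mad(values, threshold=3.0):
    """Indices where |x - median| > threshold * MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        return []  # design choice for this sketch: no spread, no flags
    return [i for i, x in enumerate(values) if abs(x - med) > threshold * mad]


# Hypothetical daily sales with one extreme value at index 8.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 100, 250]
z_flags_3 = flag_by_z_score(daily_sales, threshold=3.0)
z_flags_2 = flag_by_z_score(daily_sales, threshold=2.0)
iqr_flags = flag_by_iqr(daily_sales)
mad_flags = flag_by_mad(daily_sales)
print(z_flags_3, z_flags_2, iqr_flags, mad_flags)
```

On this sample the extreme value inflates the mean and standard deviation enough that the z-score rule misses it at a threshold of 3 and catches it only at 2, while the IQR and MAD rules flag it directly. That is the outlier contamination that robust_estimation is meant to avoid.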

Procedures_detailed:

Topics:

Categories:

Themes:

Trends:

ml_based_anomaly_detection: Using models such as isolation forests or autoencoders for detection
real_time_streaming_detection: Identifying anomalies as data arrives
contextual_learning: Systems that learn what's normal for each segment
automated_root_cause: AI suggesting why an anomaly occurred
federated_anomaly_detection: Detecting anomalies across distributed data sources

Use_cases:

retail: "Store #47's sales 3.2σ below expected - investigate operational issue"
saas: "Enterprise customer's API usage spiked 10x - potential integration problem or abuse"
manufacturing: "Production line 3 defect rate jumped to 8% (control limit: 3%) - stop and inspect"
finance: "Transaction pattern for account XYZ anomalous - flag for fraud review"
healthcare: "Patient readmission rate 2.5σ above expected - quality of care concern"
marketing: "Campaign CTR 5x higher than baseline - investigate creative for scalability"
logistics: "Delivery times to region Z spiked - check for weather/route issues"
ecommerce: "Product return rate jumped from 2% to 12% - possible defective batch"