Anomaly Detection
business analytics
slug: topic-map-business-analytics-anomoly-detection
Vocabulary:
- anomaly: Observation that deviates significantly from expected pattern
- outlier: Data point that lies outside normal range
- z_score: Number of standard deviations from the mean
- threshold: Boundary beyond which observations are flagged as anomalous
- expected_value: Predicted or typical value for a segment
- deviation: Difference between observed and expected
- control_limits: Upper and lower bounds for normal variation (typically ±3σ)
- false_positive: Flagging normal variation as anomalous
- false_negative: Missing true anomalies
- statistical_process_control: Monitoring data over time for out-of-control signals
- interquartile_range: IQR = Q3 - Q1, used for outlier detection
- mad: Median Absolute Deviation - robust measure of variability
- isolation_score: Machine learning score (e.g., from an isolation forest) indicating how easily an observation is separated from the rest of the data
Concepts:
- expected_vs_observed: Comparing what happened to what should have happened
- contextual_anomaly: Value that is normal in one context but anomalous in another (e.g., a sales spike is expected on Black Friday but anomalous on an ordinary weekday)
- collective_anomaly: Set of points that together are anomalous
- point_anomaly: Single data point that is anomalous
- level_specific_detection: What's anomalous at level 2 might be normal at level 0
- temporal_anomaly: Deviation from historical patterns
- cross_sectional_anomaly: Deviation from peer segments
- multivariate_anomaly: Unusual combination of values across multiple measures
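
The contextual_anomaly idea above can be sketched by scoring an observation only against the history of its own context. This is a minimal illustration, not a prescribed method: the function name `contextual_z_score`, the context labels, and the sales figures are all hypothetical.

```python
import statistics

def contextual_z_score(observed, context, history_by_context):
    """Score an observation against the history of its own context only."""
    history = history_by_context[context]
    expected = statistics.fmean(history)
    spread = statistics.stdev(history)  # sample standard deviation
    if spread == 0:
        return 0.0  # no recorded variation; treated as "not anomalous" in this sketch
    return (observed - expected) / spread

# Hypothetical daily sales history, grouped by context.
history_by_context = {
    "black_friday": [480, 510, 520, 490],
    "weekday": [100, 105, 95, 100],
}

# The same observed value (500) is unremarkable in one context
# and far outside the norm in the other.
z_black_friday = contextual_z_score(500, "black_friday", history_by_context)
z_weekday = contextual_z_score(500, "weekday", history_by_context)
```

The same pattern could be applied to cross_sectional_anomaly by grouping on peer segment instead of calendar context.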
Concepts_advanced:
- adaptive_thresholds: Boundaries that adjust based on recent patterns
- seasonal_adjustment: Accounting for expected seasonal patterns before detecting anomalies
- robust_estimation: Using median/MAD instead of mean/SD to avoid outlier contamination
- ensemble_detection: Combining multiple anomaly detection methods
- confidence_intervals: Probabilistic bounds rather than fixed thresholds
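
As a rough sketch of adaptive_thresholds, robust_estimation, and seasonal_adjustment, the functions below use a trailing window with median/MAD and a simple per-phase mean. The names `adaptive_flags` and `seasonal_residuals`, the window size, and the sample series are assumptions made for illustration. The factor 1.4826 rescales MAD so it is comparable to a standard deviation for normally distributed data.

```python
import statistics

def adaptive_flags(series, window=7, threshold=3.0):
    """Flag points that deviate from the median of the preceding `window` points.

    The baseline is recomputed at every step, so the threshold adapts to
    recent behaviour. Median and MAD are used so that an earlier outlier
    inside the window does not inflate the spread estimate.
    """
    flags = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        med = statistics.median(recent)
        mad = statistics.median([abs(x - med) for x in recent])
        scale = 1.4826 * mad  # MAD rescaled to be comparable to a std dev
        if scale > 0 and abs(series[i] - med) > threshold * scale:
            flags.append(i)
    return flags

def seasonal_residuals(series, period):
    """Subtract the mean of each seasonal phase (e.g., each weekday) from the series."""
    phase_means = [statistics.fmean(series[p::period]) for p in range(period)]
    return [x - phase_means[i % period] for i, x in enumerate(series)]

# Hypothetical series with one spike at index 7.
flags = adaptive_flags([10, 11, 10, 12, 11, 10, 11, 30, 11, 10], window=7)

# Hypothetical series alternating with period 2; only the last point is unusual.
residuals = seasonal_residuals([10, 20, 10, 20, 10, 26], period=2)
```

In practice the seasonal residuals would then be passed to a detector such as `adaptive_flags`, so that expected seasonal swings are not flagged.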
Procedures:
- calculate_expected_value: Use historical average, level parent, or predictive model
- calculate_standard_deviation: Measure of normal variation
- calculate_z_score: (observed - expected) / std_dev
- flag_by_threshold: Mark observations where |z_score| > threshold (typically 2 or 3)
- calculate_iqr: Q3 - Q1 from distribution
- flag_by_iqr: Mark observations outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
- calculate_mad: median(|x - median(x)|)
- flag_by_mad: Mark observations where |x - median| > threshold × MAD (multiply MAD by 1.4826 to make it comparable to σ for normally distributed data)
- rank_by_severity: Order anomalies by degree of deviation
- classify_anomaly_type: High vs low, point vs collective, temporal vs cross-sectional
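
The flagging procedures above can be sketched with the Python standard library. Function names mirror the list entries; the sample data and the quartile method (`method="inclusive"`) are assumptions, and `flag_by_mad` follows the unscaled formula as listed. Treat this as an illustration rather than a reference implementation.

```python
import statistics

def flag_by_z_score(values, threshold=3.0):
    """Indices where |(x - mean) / std_dev| > threshold."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    if std == 0:
        return []
    return [i for i, x in enumerate(values) if abs((x - mean) / std) > threshold]

def flag_by_iqr(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < low or x > high]

def flag_by_mad(values, threshold=3.0):
    """Indices where |x - median| > threshold * MAD (MAD left unscaled)."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        return []
    return [i for i, x in enumerate(values) if abs(x - med) > threshold * mad]

def rank_by_severity(values, flagged):
    """Order flagged indices by distance from the median, largest first."""
    med = statistics.median(values)
    return sorted(flagged, key=lambda i: abs(values[i] - med), reverse=True)

# Hypothetical daily sales with one obvious spike at index 8.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]

z_flags_3 = flag_by_z_score(daily_sales, threshold=3.0)  # spike inflates std_dev
z_flags_2 = flag_by_z_score(daily_sales, threshold=2.0)
iqr_flags = flag_by_iqr(daily_sales)
mad_flags = flag_by_mad(daily_sales)
```

On this ten-point sample the 3σ rule flags nothing, because the spike itself inflates the standard deviation, while the IQR and MAD rules both flag index 8. This is the outlier contamination that robust estimation is meant to avoid.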
Procedures_detailed:
- expected_from_parent: Use level-1 value as expected for level-2 segments
- expected_from_history: Use average of past N periods
- expected_from_regression: Predict based on relationship with other variables
- calculate_control_limits: μ ± 3σ for upper and lower control limits
- calculate_confidence_interval: x̄ ± t_critical × (s/√n), an interval for the mean based on the sample standard deviation s
- residual_analysis: observed - fitted, then analyze residuals for patterns
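
A minimal sketch of expected_from_history, calculate_control_limits, and residual_analysis follows. The defect-rate figures are hypothetical, and the sample standard deviation is used as an estimate of σ, which is an assumption of this example.

```python
import statistics

def expected_from_history(history, n_periods):
    """Expected value as the average of the most recent n_periods observations."""
    return statistics.fmean(history[-n_periods:])

def control_limits(history, sigmas=3.0):
    """Lower and upper control limits: mean ± sigmas × standard deviation."""
    mu = statistics.fmean(history)
    sigma = statistics.stdev(history)  # sample estimate of σ
    return mu - sigmas * sigma, mu + sigmas * sigma

def residuals(observed, fitted):
    """Element-wise observed - fitted."""
    return [o - f for o, f in zip(observed, fitted)]

# Hypothetical weekly defect rates (percent) for one production line.
defect_history = [2.0, 3.0, 2.5, 2.5, 3.0, 2.0]

expected = expected_from_history(defect_history, n_periods=3)
lower, upper = control_limits(defect_history)

# A new reading is out of control if it falls outside the limits.
new_reading = 8.0
out_of_control = new_reading < lower or new_reading > upper
```

The limits should be computed from a period known to be in control. Including anomalous readings in `defect_history` would widen the limits and hide later signals.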
Topics:
- fraud_detection
- quality_control_monitoring
- sudden_change_detection
- performance_degradation_alerts
- unusual_pattern_identification
- data_quality_validation
- early_warning_systems
- exception_reporting
- root_cause_investigation
- predictive_maintenance
Categories:
- statistical_monitoring
- deviation_analysis
- exception_detection
- pattern_recognition
- quality_assurance
Themes:
- signal_vs_noise: Distinguishing meaningful deviations from random variation
- early_detection: Catching problems before they escalate
- actionable_alerts: Focusing attention on what truly matters
- continuous_monitoring: Ongoing surveillance for changes
Trends:
- ml_based_anomaly_detection: Using models such as isolation forests and autoencoders for detection
- real_time_streaming_detection: Identifying anomalies as data arrives
- contextual_learning: Systems that learn what's normal for each segment
- automated_root_cause: AI suggesting why an anomaly occurred
- federated_anomaly_detection: Detecting anomalies across distributed data sources
Use_cases:
- retail: "Store #47's sales 3.2σ below expected - investigate operational issue"
- saas: "Enterprise customer's API usage spiked 10x - potential integration problem or abuse"
- manufacturing: "Production line 3 defect rate jumped to 8% (control limit: 3%) - stop and inspect"
- finance: "Transaction pattern for account XYZ anomalous - flag for fraud review"
- healthcare: "Patient readmission rate 2.5σ above expected - quality of care concern"
- marketing: "Campaign CTR 5x higher than baseline - investigate creative for scalability"
- logistics: "Delivery times to region Z spiked - check for weather/route issues"
- ecommerce: "Product return rate jumped from 2% to 12% - possible defective batch"