DAZL Documentation | Data Analytics A-to-Z Processing Language


Anomaly Detection

business analytics

slug: topic-map-business-analytics-anomoly-detection

Vocabulary:

  • anomaly: Observation that deviates significantly from expected pattern
  • outlier: Data point that lies outside normal range
  • z_score: Number of standard deviations from the mean
  • threshold: Boundary beyond which observations are flagged as anomalous
  • expected_value: Predicted or typical value for a segment
  • deviation: Difference between observed and expected
  • control_limits: Upper and lower bounds for normal variation (typically ±3σ)
  • false_positive: Flagging normal variation as anomalous
  • false_negative: Missing true anomalies
  • statistical_process_control: Monitoring data over time for out-of-control signals
  • interquartile_range: IQR = Q3 - Q1, used for outlier detection
  • mad: Median Absolute Deviation - robust measure of variability
  • isolation_score: Machine learning measure of how "different" an observation is

Concepts:

  • expected_vs_observed: Comparing what happened to what should have happened
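  • contextual_anomaly: Observation whose status depends on context: the same value can be normal in one setting and anomalous in another (e.g., high sales are normal on Black Friday but anomalous on an ordinary day)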
  • collective_anomaly: Set of points that together are anomalous
  • point_anomaly: Single data point that is anomalous
  • level_specific_detection: What's anomalous at level 2 might be normal at level 0
  • temporal_anomaly: Deviation from historical patterns
  • cross_sectional_anomaly: Deviation from peer segments
  • multivariate_anomaly: Unusual combination of values across multiple measures
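
The contextual_anomaly idea can be sketched in Python by scoring a value against the history of its own context rather than against all data. This is only an illustration, not DAZL syntax: the function name contextual_z_score, the context labels, and the sales figures are all hypothetical.

```python
from statistics import mean, stdev

def contextual_z_score(history, context, value):
    """Z-score of `value` relative to past observations from the same context.

    `history` maps a context label to a list of past values (assumed layout).
    """
    past = history[context]
    return (value - mean(past)) / stdev(past)

# Hypothetical daily sales, grouped by the kind of day they were observed on.
history = {
    "ordinary_day": [100, 104, 96, 100],
    "black_friday": [300, 310, 290],
}

# The same observed value is scored very differently in each context.
z_ordinary = contextual_z_score(history, "ordinary_day", 300)
z_black_friday = contextual_z_score(history, "black_friday", 300)
print(round(z_ordinary, 1), round(z_black_friday, 1))
```

With these made-up numbers, sales of 300 sit far outside the ordinary-day history but exactly at the Black Friday average, so only the first case would be flagged.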

Concepts_advanced:

  • adaptive_thresholds: Boundaries that adjust based on recent patterns
  • seasonal_adjustment: Accounting for expected seasonal patterns before detecting anomalies
  • robust_estimation: Using median/MAD instead of mean/SD to avoid outlier contamination
  • ensemble_detection: Combining multiple anomaly detection methods
  • confidence_intervals: Probabilistic bounds rather than fixed thresholds
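
One possible reading of adaptive_thresholds, sketched in Python under stated assumptions: each point is compared against the mean and standard deviation of a trailing window, so the boundary moves with recent data. The function name adaptive_flags, the window size, the multiplier k, and the sample series are hypothetical choices, not part of DAZL.

```python
from statistics import mean, stdev

def adaptive_flags(series, window=5, k=3.0):
    """Return indices whose value lies more than k standard deviations
    from the mean of the preceding `window` points."""
    flags = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        # Skip flat windows, where sigma == 0 makes the comparison meaningless.
        if sigma > 0 and abs(series[i] - mu) > k * sigma:
            flags.append(i)
    return flags

# Hypothetical series with a level shift at index 5.
series = [10, 11, 10, 12, 11, 30, 31, 30, 32, 31, 30]
print(adaptive_flags(series))  # [5]
```

In this sketch the jump is flagged once, and later points at the new level are not, because the trailing window absorbs the shift. Whether that behavior is desirable depends on the monitoring goal; a fixed threshold would keep flagging every point after the shift.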

Procedures:

  • calculate_expected_value: Use historical average, level parent, or predictive model
  • calculate_standard_deviation: Measure of normal variation
  • calculate_z_score: (observed - expected) / std_dev
  • flag_by_threshold: Mark observations where |z_score| > threshold (typically 2 or 3)
  • calculate_iqr: Q3 - Q1 from distribution
  • flag_by_iqr: Mark observations outside [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
  • calculate_mad: median(|x - median(x)|)
  • flag_by_mad: Mark observations where |x - median| > threshold × MAD
  • rank_by_severity: Order anomalies by degree of deviation
  • classify_anomaly_type: High vs low, point vs collective, temporal vs cross-sectional
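
The three flagging procedures above can be sketched in Python with the standard library. This is a minimal illustration rather than DAZL's implementation: the function signatures and sample values are hypothetical, the expected value is taken to be the sample mean, and quartiles use the default method of statistics.quantiles (other quartile conventions give slightly different fences).

```python
from statistics import mean, median, quantiles, stdev

def flag_by_z_score(values, threshold=3.0):
    """Indices where |(observed - expected) / std_dev| > threshold."""
    mu, sigma = mean(values), stdev(values)  # sample standard deviation
    if sigma == 0:
        return []
    return [i for i, x in enumerate(values) if abs((x - mu) / sigma) > threshold]

def flag_by_iqr(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < low or x > high]

def flag_by_mad(values, threshold=3.0):
    """Indices where |x - median| > threshold * MAD (unscaled MAD)."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    return [i for i, x in enumerate(values) if abs(x - med) > threshold * mad]

# Hypothetical observations with one obvious spike at index 8.
values = [100, 102, 98, 101, 99, 103, 97, 100, 250]
print(flag_by_z_score(values, threshold=3.0))  # []
print(flag_by_z_score(values, threshold=2.0))  # [8]
print(flag_by_iqr(values))                     # [8]
print(flag_by_mad(values))                     # [8]
```

In this small sample the spike inflates the mean and standard deviation enough that its own z-score stays below 3, while the IQR and MAD rules still catch it. This is the contamination effect that robust_estimation is meant to avoid.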

Procedures_detailed:

  • expected_from_parent: Use level-1 value as expected for level-2 segments
  • expected_from_history: Use average of past N periods
  • expected_from_regression: Predict based on relationship with other variables
  • calculate_control_limits: μ ± 3σ for upper and lower control limits
  • calculate_confidence_interval: μ ± t_critical × (σ/√n), with μ and σ estimated from the n observations
  • residual_analysis: observed - fitted, then analyze residuals for patterns
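
A short Python sketch of expected_from_history, calculate_control_limits, and calculate_confidence_interval follows. The function signatures and sample values are hypothetical. Because the Python standard library has no Student's t quantile function, t_critical is assumed to be supplied by the caller, for example from a t-table.

```python
from math import sqrt
from statistics import mean, stdev

def expected_from_history(history, n_periods):
    """Average of the most recent n_periods values."""
    return mean(history[-n_periods:])

def calculate_control_limits(history, k=3.0):
    """Lower and upper control limits: mean -/+ k standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma

def calculate_confidence_interval(sample, t_critical):
    """Interval for the mean: mean -/+ t_critical * (std_dev / sqrt(n))."""
    mu = mean(sample)
    half_width = t_critical * stdev(sample) / sqrt(len(sample))
    return mu - half_width, mu + half_width

# Hypothetical values for the last four periods.
history = [5, 8, 10, 12]
print(expected_from_history(history, 3))       # mean of [8, 10, 12]
print(calculate_control_limits(history[-3:]))  # limits from the same three periods

# 4.303 is the two-sided 95% t value for 2 degrees of freedom.
print(calculate_confidence_interval(history[-3:], t_critical=4.303))
```

The control limits describe where individual observations are expected to fall, while the confidence interval describes uncertainty about the mean itself, so the interval narrows as n grows and the limits do not.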

Topics:

  • fraud_detection
  • quality_control_monitoring
  • sudden_change_detection
  • performance_degradation_alerts
  • unusual_pattern_identification
  • data_quality_validation
  • early_warning_systems
  • exception_reporting
  • root_cause_investigation
  • predictive_maintenance

Categories:

  • statistical_monitoring
  • deviation_analysis
  • exception_detection
  • pattern_recognition
  • quality_assurance

Themes:

  • signal_vs_noise: Distinguishing meaningful deviations from random variation
  • early_detection: Catching problems before they escalate
  • actionable_alerts: Focusing attention on what truly matters
  • continuous_monitoring: Ongoing surveillance for changes

Trends:

  • ml_based_anomaly_detection: Using models such as isolation forests and autoencoders for detection
  • real_time_streaming_detection: Identifying anomalies as data arrives
  • contextual_learning: Systems that learn what's normal for each segment
  • automated_root_cause: AI suggesting why an anomaly occurred
  • federated_anomaly_detection: Detecting anomalies across distributed data sources

Use_cases:

  • retail: "Store #47's sales 3.2σ below expected - investigate operational issue"
  • saas: "Enterprise customer's API usage spiked 10x - potential integration problem or abuse"
  • manufacturing: "Production line 3 defect rate jumped to 8% (control limit: 3%) - stop and inspect"
  • finance: "Transaction pattern for account XYZ anomalous - flag for fraud review"
  • healthcare: "Patient readmission rate 2.5σ above expected - quality of care concern"
  • marketing: "Campaign CTR 5x higher than baseline - investigate creative for scalability"
  • logistics: "Delivery times to region Z spiked - check for weather/route issues"
  • ecommerce: "Product return rate jumped from 2% to 12% - possible defective batch"