DAZL Documentation | Data Analytics A-to-Z Processing Language


Contents

pareto

business analytics

slug: step-pareto

Pareto Step

Purpose

Performs Pareto (80/20) analysis on data to identify the vital few segments that contribute disproportionately to the total. Calculates concentration metrics, inequality indicators, and ABC categorization for strategic prioritization.

When to Use

  • Identify high-impact categories requiring focus
  • Prioritize resources for maximum return
  • Segment customers by value contribution
  • Analyze product portfolio performance
  • Identify sales concentration risks
  • Optimize inventory management
  • Evaluate performance distribution across regions, channels, or categories

How It Works

  1. Takes input data (often from a cube step) with a measure column
  2. Sorts data by the specified measure in descending order
  3. Calculates cumulative values and percentages for each segment
  4. Assigns ABC categories based on cumulative contribution
  5. Computes concentration metrics, including the Gini coefficient
  6. Identifies Pareto thresholds and "vital few" segments
  7. Provides comprehensive statistics for distribution analysis

Parameters

Required

  • measure (string) - Base column name to analyze (e.g., "revenue")
  • metric (string) - Metric to use from cube data (e.g., "sum") - combined as "measure.metric"

Optional

  • level (integer) - Analysis level for hierarchical data (if null, uses all data)
  • paretoThresholds (array) - Percentage thresholds for Pareto analysis (default: [80, 90, 95])
  • abcBreakpoints (array) - Percentage thresholds for ABC categorization (default: [80, 95])
  • calculateGini (boolean) - Whether to calculate Gini coefficient (default: true)
  • includeRankPercentile (boolean) - Whether to include rank percentile column (default: true)

Pareto Metrics Explained

  • Pareto Principle: The observation that roughly 80% of effects come from 20% of causes
  • ABC Analysis: Categorization method where:
    • A items: Top contributors (typically ~80% of value from ~20% of items)
    • B items: Middle contributors (typically ~15% of value from ~30% of items)
    • C items: Low contributors (typically ~5% of value from ~50% of items)
  • Gini Coefficient: Measure of inequality (0 = perfect equality, 1 = maximum inequality)
  • Concentration Ratio: Percentage of total value from the top X% of segments

Input Requirements

  • Data must contain the measure column specified in the format measure.metric
  • For hierarchical data (e.g., from cube step), a 'level' column is expected
  • Data should ideally be pre-aggregated (e.g., from a cube step)

Output Columns

All original columns plus:

  • rank - Ordinal position after sorting (1 = highest value)
  • rankPercentile - Percentile rank (0-100%)
  • segmentValue - Value of this segment (from measure.metric)
  • segmentPct - Percentage of total contributed by this segment
  • cumulativeValue - Running total value up to this segment
  • cumulativePct - Running percentage of total up to this segment
  • abcCategory - A, B, or C category based on cumulative contribution
  • paretoFlag - Boolean indicating if segment is part of the "vital few"

Extras (Statistical Summary)

Detailed statistics stored in extras.stats.pareto:

  • totalValue - Sum of all segment values
  • totalSegments - Number of segments analyzed
  • giniCoefficient - Measure of inequality distribution
  • paretoPoints - Segments needed to reach each threshold:
    • count - Number of segments
    • pctOfTotal - Percentage of total segments
  • categoryCounts - Count of segments in each ABC category
  • categoryValues - Sum of values in each ABC category
  • concentrationRatios - Percentage of value from top 10%, 25%, and 50% of segments

Example Usage

Basic Pareto Analysis

pareto:
  measure: revenue
  metric: sum

Custom ABC Categorization

pareto:
  measure: profit
  metric: sum
  level: 1
  abcBreakpoints: [70, 90]
  paretoThresholds: [70, 80, 90, 95]

Example Output

Input Data (From Cube Step)

level region product_category revenue.sum revenue.pctSum
1 North All 452500 50.00%
1 South All 453000 50.00%
2 North Electronics 270000 29.82%
2 North Furniture 182500 20.15%
2 South Electronics 216000 23.85%
2 South Furniture 237000 26.17%

Output Data (Pareto Analysis at Level 2)

level region product_category revenue.sum rank segmentPct cumulativePct abcCategory paretoFlag
2 North Electronics 270000 1 29.82 29.82 A true
2 South Furniture 237000 2 26.17 55.99 A true
2 South Electronics 216000 3 23.85 79.84 A true
2 North Furniture 182500 4 20.15 100.00 B false

Extras (Statistical Summary)

extras.stats.pareto = {
  "level": 2,
  "measure": "revenue",
  "metric": "sum",
  "totalValue": 905500,
  "totalSegments": 4,
  "giniCoefficient": 0.0498,
  "paretoPoints": {
    "80": {
      "count": 3,
      "pctOfTotal": 75.00
    },
    "90": {
      "count": 4,
      "pctOfTotal": 100.00
    },
    "95": {
      "count": 4,
      "pctOfTotal": 100.00
    }
  },
  "categoryCounts": {
    "A": 3,
    "B": 1,
    "C": 0
  },
  "categoryValues": {
    "A": 723000,
    "B": 182500,
    "C": 0
  },
  "concentrationRatios": {
    "top10PctShare": 29.82,
    "top25PctShare": 55.99,
    "top50PctShare": 79.84
  }
}

Insights from Pareto Analysis

  • Focus Areas: Top segments (A category) that drive the majority of value
  • Inequality Assessment: Gini coefficient showing distribution fairness
  • Resource Allocation: Guide for where to invest time and resources
  • Risk Exposure: Concentration ratios revealing dependency on top segments
  • Efficiency Opportunities: Identifying underperforming segments

Related Documentation

  • cube step - Create hierarchical data for Pareto analysis
  • index step - Scores segments by performance indices
  • contribution step - how much did each value of a dimension contribute to delta between two periods