DAZL Documentation | Data Analytics A-to-Z Processing Language

Tuning Market Basket Parameters

Tuning parameters

slug: tutorial-tuning-market-basket-parameters

The Four Key Parameters

1. minSupport - "How common must a pattern be?"

What it controls: Minimum frequency threshold for itemsets

Range: 0.0 to 1.0 (often 0.001 to 0.1 in practice)

Effect on results:

Too low (e.g., 0.001):
- Finds MANY patterns, including rare/spurious ones
- Slow performance (more itemsets to evaluate)
- More noise, harder to interpret
Too high (e.g., 0.5):
- Only finds very common patterns
- Misses interesting niche associations
- Fast performance
- Only "obvious" insights

Your data examples:

minSupport: 0.01  # 1% of 15 transactions = 0.15, so a single occurrence passes
# Result: Found everything, including single-occurrence patterns

minSupport: 0.05  # 5% of 15 transactions = 0.75, so a single occurrence still passes
# Result: Same as above, because the dataset is so small

minSupport: 0.20  # 20% = at least 3/15 transactions
# Result: Would only find Milk, Bread, Butter patterns

Business interpretation:

Retail grocery: 0.001-0.01 (millions of transactions, find niche patterns)
Small boutique: 0.05-0.10 (fewer transactions, need more evidence)
Your test data: 0.05-0.10 is reasonable
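
To see why 0.01 and 0.05 behave identically on 15 transactions, it can help to convert the fraction into a transaction count. The following is a minimal Python sketch, assuming an itemset passes when its count divided by the number of transactions is at least minSupport. The helper name `min_count` is hypothetical and not part of DAZL.

```python
import math

def min_count(min_support: float, n_transactions: int) -> int:
    """Smallest number of transactions an itemset must appear in to pass."""
    # Subtract a tiny epsilon so floating-point noise on an exact boundary
    # does not push the result up to the next integer.
    needed = math.ceil(min_support * n_transactions - 1e-9)
    return max(1, needed)  # an itemset has to occur at least once to be counted

for s in (0.01, 0.05, 0.10, 0.20):
    print(f"minSupport={s:.2f} -> at least {min_count(s, 15)} of 15 transactions")
```

On 15 transactions the first two thresholds both resolve to a single occurrence, which is why they return the same patterns.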

2. minConfidence - "How reliable must the prediction be?"

What it controls: Minimum confidence threshold for rules

Range: 0.0 to 1.0 (often 0.5 to 0.8 in practice)

Effect on results:

Too low (e.g., 0.3):
- "If customer buys X, they buy Y 30% of the time" - weak recommendation
- Many unreliable rules
- High false positive rate
Too high (e.g., 0.95):
- Only nearly-perfect associations
- Very few rules
- Misses good-but-not-perfect patterns

Your data examples:

minConfidence: 0.5  # 50% or better
# Result: Butter→Milk (86%), Coffee→Cream (67%), PB→Jelly (100%)

minConfidence: 0.8  # 80% or better
# Result: Only the strongest rules like PB→Jelly (100%), Butter→Milk (86%)

minConfidence: 0.3  # 30% or better
# Result: Would include many weak, unreliable suggestions

Business interpretation:

Cross-sell recommendations: 0.6-0.8 (don't annoy customers with bad suggestions)
Store layout decisions: 0.4-0.6 (broader patterns acceptable)
Bundling discounts: 0.7-0.9 (need strong confidence)
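
The effect of the threshold can be sketched by filtering the three confidence values quoted in the examples above. This is an illustrative Python snippet, not DAZL's implementation; the function name `passing` is hypothetical, and it assumes a rule is kept when its confidence is greater than or equal to minConfidence.

```python
# Confidence values quoted in the examples above, as fractions.
rules = {
    "Butter -> Milk": 0.86,
    "Coffee -> Cream": 0.67,
    "PB -> Jelly": 1.00,
}

def passing(rules: dict[str, float], min_confidence: float) -> list[str]:
    """Return the rules whose confidence meets the threshold, strongest first."""
    kept = [(name, conf) for name, conf in rules.items() if conf >= min_confidence]
    return [name for name, _ in sorted(kept, key=lambda pair: -pair[1])]

print(passing(rules, 0.5))   # all three rules survive
print(passing(rules, 0.8))   # Coffee -> Cream drops out
print(passing(rules, 0.95))  # only the perfect pairing remains
```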

3. minLift - "How special must the relationship be?"

What it controls: Minimum lift threshold for rules

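Range: 0 and up, where values below 1.0 mean the items co-occur less often than chance predicts; thresholds are usually 1.0 or higher (often 1.5 to 3.0 in practice)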

Effect on results:

Low (e.g., 1.0):
- Includes all associations, even those expected by chance
- Gets "popular item + popular item" patterns
- More rules, less interesting
High (e.g., 3.0):
- Only surprising, strong associations
- Fewer rules, more actionable
- May miss moderate but useful patterns

Your data examples:

minLift: 1.0  # No special relationship required
# Result: Got Bread↔Milk↔Butter (all lift ≈ 1.0) - just popular items

minLift: 2.0  # At least 2× as likely as if the items were independent
# Result: Filtered out staples, kept Eggs→Butter (2×), Coffee→Cream (5×), PB→Jelly (15×)

minLift: 5.0  # Very strong associations only
# Result: Only Coffee→Cream and PB→Jelly (the really special patterns)

Business interpretation:

Finding staple patterns: 1.0-1.5 (what commonly goes together)
Cross-sell opportunities: 2.0-3.0 (surprising but actionable)
Niche bundles: 3.0+ (perfect pairings like PB&J)
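
Lift compares how often two items actually appear together with how often they would appear together if they were unrelated. The Python sketch below uses the standard definition; the support values are hypothetical and chosen only to illustrate the two extremes described above (a "popular + popular" pair and a rare but perfect pairing).

```python
def lift(support_xy: float, support_x: float, support_y: float) -> float:
    """Observed co-occurrence divided by the co-occurrence expected under independence."""
    expected = support_x * support_y
    return support_xy / expected

# Two popular items that co-occur exactly as often as chance predicts.
print(round(lift(0.40 * 0.50, 0.40, 0.50), 2))  # 1.0

# A rare pair that always appears together: each item is in 1 of 15 baskets,
# and it is the same basket.
print(round(lift(1 / 15, 1 / 15, 1 / 15), 2))   # 15.0
```

The second case shows why very high lift values on a small dataset can rest on a single transaction, so it is worth reading lift together with support.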

4. maxItemsetSize - "How complex can patterns be?"

What it controls: Maximum number of items in an itemset, counting both sides of a rule

Range: Usually 2-5 (rarely higher)

Effect on results:

Small (e.g., 2):
- Only pairs: X → Y
- Fast computation
- Simple, easy to interpret
- Misses complex patterns
Large (e.g., 5):
- Complex rules: {W, X, Y} → {Z}
- Slow computation (combinatorial explosion)
- Harder to interpret and act on
- May find rare but interesting patterns

Your data examples:

maxItemsetSize: 2
# Only pairs: Milk→Bread, Coffee→Cream

maxItemsetSize: 3
# Your current setting
# Gets: {Milk, Bread}→Butter, {Coffee, Cream}→Sugar, {Peanut Butter, Jelly}→Bread

maxItemsetSize: 5
# Would get: {Milk, Bread, Butter, Eggs}→X (but computationally expensive)

Business interpretation:

Product recommendations: 2-3 (simple suggestions)
Bundle creation: 3-4 (meal deals, starter kits)
Complex behavior analysis: 4-5 (research, not operational)
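
The "combinatorial explosion" can be made concrete by counting how many itemsets are possible at each size limit. The Python sketch below assumes a hypothetical catalog of 20 distinct items; the result is an upper bound, since minSupport pruning normally removes most candidates before they are evaluated.

```python
from math import comb

def max_candidates(n_items: int, max_itemset_size: int) -> int:
    """Upper bound on itemsets of size 1..max_itemset_size drawn from n_items."""
    return sum(comb(n_items, k) for k in range(1, max_itemset_size + 1))

for size in (2, 3, 5):
    print(f"maxItemsetSize={size}: up to {max_candidates(20, size)} itemsets")
# maxItemsetSize=2: up to 210 itemsets
# maxItemsetSize=3: up to 1350 itemsets
# maxItemsetSize=5: up to 21699 itemsets
```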

How the Algorithm Uses These Parameters

Phase 1: Find Frequent Itemsets (uses minSupport)

Start with single items:
  Milk: 10/15 = 0.67 ✓ (above minSupport)
  Bread: 8/15 = 0.53 ✓
  Lemon: 1/15 = 0.067 ✓ (if minSupport = 0.05)

Generate pairs from frequent singles:
  {Milk, Bread}: 6/15 = 0.40 ✓
  {Milk, Lemon}: 0/15 = 0.00 ✗ (below minSupport, prune)

Generate triplets from frequent pairs:
  {Milk, Bread, Butter}: 4/15 = 0.267 ✓
  Stop at maxItemsetSize

minSupport acts as early pruning: if {Milk, Lemon} is infrequent, every superset {Milk, Lemon, X} can be at most as frequent, so none of them needs to be checked.
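
Phase 1 can be sketched as a level-wise (Apriori-style) search. This is a minimal Python illustration under stated assumptions: the tutorial does not say which algorithm DAZL uses internally, the function name `frequent_itemsets` is hypothetical, and the 15 baskets are invented to reproduce the counts quoted above rather than taken from the actual test data.

```python
from itertools import combinations

# Hypothetical 15 baskets built to match the counts quoted above
# (Milk 10, Bread 8, Lemon 1, {Milk, Bread} 6, {Milk, Bread, Butter} 4).
BASKETS = [frozenset(b) for b in (
    [("Milk", "Bread", "Butter")] * 4
    + [("Milk", "Bread")] * 2
    + [("Milk", "Butter")] * 2
    + [("Milk",)] * 2
    + [("Bread", "Butter"), ("Bread",)]
    + [("Coffee", "Cream")] * 2
    + [("Coffee", "Lemon")]
)]

def frequent_itemsets(baskets, min_support, max_itemset_size):
    """Level-wise search: return {itemset: support} for every frequent itemset."""
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: single items that meet the threshold.
    singles = {frozenset([item]) for b in baskets for item in b}
    level = {s: support(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_support}
    found = dict(level)

    size = 1
    while level and size < max_itemset_size:
        size += 1
        previous = set(level)
        # Join frequent itemsets of the previous size into candidates one item larger.
        candidates = {a | b for a in previous for b in previous if len(a | b) == size}
        # Prune: a candidate can only be frequent if all of its smaller subsets were.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, size - 1))
        }
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        found.update(level)
    return found

found = frequent_itemsets(BASKETS, min_support=0.05, max_itemset_size=3)
print(round(found[frozenset({"Milk", "Bread"})], 2))  # 0.4
print(frozenset({"Milk", "Lemon"}) in found)          # False
```

Because {Milk, Lemon} never makes it into the frequent pairs, no triplet containing both items is ever generated.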

Phase 2: Generate Rules (uses minConfidence and minLift)

From frequent itemset {Milk, Bread, Butter}:

Generate all possible rules:
  Milk → {Bread, Butter}
    Confidence: 4/10 = 0.40 ✗ (below minConfidence = 0.5)

  {Milk, Bread} → Butter
    Confidence: 4/6 = 0.67 ✓
    Lift: (4/15) / ((6/15) × (7/15)) = 0.267 / 0.187 = 1.43
    If minLift = 2.0, this would be rejected ✗
    If minLift = 1.0, this would be accepted ✓
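
The rule-generation step can be sketched by splitting a frequent itemset into every possible antecedent and consequent and scoring each split. This Python illustration is an assumption about the general technique, not DAZL's code; `rules_from_itemset` is a hypothetical name, and the 15 baskets are invented to reproduce the counts used in the worked example above.

```python
from itertools import combinations

# Hypothetical 15 baskets matching the counts in the worked example
# (Milk 10, Butter 7, {Milk, Bread} 6, {Milk, Bread, Butter} 4).
BASKETS = [frozenset(b) for b in (
    [("Milk", "Bread", "Butter")] * 4
    + [("Milk", "Bread")] * 2
    + [("Milk", "Butter")] * 2
    + [("Milk",)] * 2
    + [("Bread", "Butter"), ("Bread",)]
    + [("Coffee", "Cream")] * 2
    + [("Coffee", "Lemon")]
)]

def rules_from_itemset(itemset, baskets, min_confidence, min_lift):
    """Return (antecedent, consequent, confidence, lift, accepted) for every split."""
    n = len(baskets)

    def count(items):
        return sum(1 for b in baskets if items <= b)

    itemset = frozenset(itemset)
    n_itemset = count(itemset)
    if n_itemset == 0:
        return []  # the itemset never occurs, so no rule can be formed

    rules = []
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), r)):
            consequent = itemset - antecedent
            confidence = n_itemset / count(antecedent)
            lift = confidence / (count(consequent) / n)
            accepted = confidence >= min_confidence and lift >= min_lift
            rules.append((antecedent, consequent, confidence, lift, accepted))
    return rules

for a, c, conf, lift, ok in rules_from_itemset({"Milk", "Bread", "Butter"}, BASKETS, 0.5, 1.0):
    verdict = "accept" if ok else "reject"
    print(sorted(a), "->", sorted(c), f"conf={conf:.2f} lift={lift:.2f}", verdict)
```

Here lift is computed as confidence divided by the consequent's support, which is algebraically the same as the formula in the worked example.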

Tuning Strategy: The Iterative Process

Step 1: Start Conservative

minSupport: 0.05      # Reasonably frequent
minConfidence: 0.5    # Right at least half the time
minLift: 1.0          # See everything
maxItemsetSize: 3     # Keep it simple

Goal: See the full landscape of patterns

Step 2: Analyze Results

Too many rules? → Increase thresholds
Too few rules? → Decrease thresholds
All obvious patterns? → Increase minLift
Missing rare but valuable items? → Decrease minSupport

Step 3: Focus Based on Business Goal

For cross-sell recommendations:

minSupport: 0.02      # At least 2% (not too rare)
minConfidence: 0.6    # Reliable suggestions
minLift: 2.0          # Meaningful associations
maxItemsetSize: 2     # Simple pairs

For understanding customer behavior:

minSupport: 0.01      # See more patterns
minConfidence: 0.4    # Lower bar
minLift: 1.5          # Moderate associations
maxItemsetSize: 4     # Complex patterns OK

For premium bundles:

minSupport: 0.05      # Must be somewhat common
minConfidence: 0.8    # Very reliable
minLift: 3.0          # Strong special relationship
maxItemsetSize: 3     # Bundle of 2-3 items
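
To compare how the three goal-specific settings behave, the thresholds can be applied side by side to the same rules. The Python sketch below is illustrative only: `apply_preset` and the preset names are hypothetical, the confidence and lift figures are the ones quoted earlier in this tutorial, and the support values for the first two rules are assumptions added for the example.

```python
# The three parameter sets from this step, as plain dictionaries.
PRESETS = {
    "cross_sell":      {"minSupport": 0.02, "minConfidence": 0.6, "minLift": 2.0, "maxItemsetSize": 2},
    "behavior":        {"minSupport": 0.01, "minConfidence": 0.4, "minLift": 1.5, "maxItemsetSize": 4},
    "premium_bundles": {"minSupport": 0.05, "minConfidence": 0.8, "minLift": 3.0, "maxItemsetSize": 3},
}

# Hypothetical rule metrics; "size" counts items on both sides of the rule.
RULES = [
    {"rule": "Coffee -> Cream",         "support": 0.13, "confidence": 0.67, "lift": 5.0,  "size": 2},
    {"rule": "PB -> Jelly",             "support": 0.07, "confidence": 1.00, "lift": 15.0, "size": 2},
    {"rule": "{Milk, Bread} -> Butter", "support": 0.27, "confidence": 0.67, "lift": 1.43, "size": 3},
]

def apply_preset(rules, preset):
    """Return the names of the rules that satisfy all four thresholds."""
    return [
        r["rule"] for r in rules
        if r["support"] >= preset["minSupport"]
        and r["confidence"] >= preset["minConfidence"]
        and r["lift"] >= preset["minLift"]
        and r["size"] <= preset["maxItemsetSize"]
    ]

for name, preset in PRESETS.items():
    print(name, apply_preset(RULES, preset))
```

With these figures the staple rule is filtered out by minLift in every preset, and the premium-bundle preset keeps only the perfect pairing.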

Parameter Interactions

The Support-Confidence Trade-off

High support + Low confidence:

Common items that sometimes go together
Example: Milk → Eggs (frequent, but not always)

Low support + High confidence:

Rare but perfect pairings
Example: Peanut Butter → Jelly (rare in your data, but 100% confidence)

Your "interesting finds" tuning captured this:

minSupport: 0.05      # Not too rare
minConfidence: 0.6    # Reliable enough
minLift: 2.0          # Must be special

This found patterns that are:

✓ Frequent enough to act on
✓ Reliable enough to recommend
✓ Special enough to be interesting

The Lift Filter is Your Friend

Lift separates:

Popular items appearing together (lift ≈ 1) → operational insights
Surprising associations (lift >> 1) → strategic insights

That's why minLift: 2.0 gave you such different, more actionable results!

Real-World Tuning Examples

Amazon-scale E-commerce

minSupport: 0.0001    # 0.01% - millions of transactions
minConfidence: 0.3    # Lower bar, personalization handles accuracy
minLift: 2.0          # Want surprising recommendations
maxItemsetSize: 5     # "Frequently bought together" bundles

Small Retail Store

minSupport: 0.10      # 10% - need strong evidence
minConfidence: 0.7    # High reliability for limited inventory
minLift: 1.5          # Moderate associations
maxItemsetSize: 3     # Simple bundles

Your Test Data Sweet Spot

minSupport: 0.05      # 5% = 1+ transactions out of 15
minConfidence: 0.6    # 60% reliability
minLift: 2.0          # 2× as likely as chance alone predicts
maxItemsetSize: 3     # Up to triplets

Practical Tuning Workflow

1. Run with defaults - see everything
2. Check rule count - too many/too few?
3. Review top rules - interesting or obvious?
4. Adjust minLift first - biggest impact on "interestingness"
5. Then adjust confidence - balance reliability vs coverage
6. Finally adjust support - if you need more/fewer patterns
7. Iterate - based on business feedback