Tuning parameters
slug: tutorial-tuning-market-basket-parameters
What it controls (minSupport): Minimum frequency threshold for itemsets
Range: 0.0 to 1.0 (often 0.001 to 0.1 in practice)
Effect on results:
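Too low (e.g., 0.001): nearly every itemset qualifies, including patterns that occur only once
Too high (e.g., 0.5): only the most common staples survive, and rarer patterns are missed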
Your data examples:
minSupport: 0.01 # 1% = at least 1/15 transactions
# Result: Found everything, including single-occurrence patterns
minSupport: 0.05 # 5% = at least 1/15 transactions (same threshold for your data)
# Result: Same as above due to small dataset
minSupport: 0.20 # 20% = at least 3/15 transactions
# Result: Would only find Milk, Bread, Butter patterns
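To see why 0.01 and 0.05 behave identically on 15 transactions, it can help to convert each minSupport fraction into the smallest whole number of transactions that satisfies it. The sketch below does that; the helper name `min_transaction_count` is illustrative and not part of any particular library.

```python
import math

def min_transaction_count(min_support: float, n_transactions: int) -> int:
    """Smallest whole number of transactions whose share is >= min_support."""
    # round() guards against float noise in the product before taking the ceiling.
    # An itemset that is counted at all appears at least once, so floor at 1.
    return max(1, math.ceil(round(min_support * n_transactions, 9)))

for threshold in (0.01, 0.05, 0.20):
    print(threshold, min_transaction_count(threshold, 15))
# 0.01 -> 1, 0.05 -> 1, 0.2 -> 3
```

On a dataset this small, any threshold below 1/15 (about 0.067) admits every itemset that occurs at all, which is why the first two settings return the same result.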
Business interpretation:
What it controls (minConfidence): Minimum confidence threshold for rules
Range: 0.0 to 1.0 (often 0.5 to 0.8 in practice)
Effect on results:
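Too low (e.g., 0.3): many weak, unreliable rules get through
Too high (e.g., 0.95): only near-certain rules survive, so useful but imperfect ones are dropped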
Your data examples:
minConfidence: 0.5 # 50% or better
# Result: Butter→Milk (86%), Coffee→Cream (67%), PB→Jelly (100%)
minConfidence: 0.8 # 80% or better
# Result: Only the strongest rules like PB→Jelly (100%), Butter→Milk (86%)
minConfidence: 0.3 # 30% or better
# Result: Would include many weak, unreliable suggestions
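As a rough illustration of how the threshold acts as a filter, the sketch below applies two minConfidence values to the three rules quoted above. The confidence figures are the ones from the examples; the function name `filter_by_confidence` and the rule labels are assumptions made for this sketch.

```python
# Confidence values as quoted in the examples above (rule -> confidence).
rule_confidence = {
    "Butter -> Milk": 0.86,
    "Coffee -> Cream": 0.67,
    "PB -> Jelly": 1.00,
}

def filter_by_confidence(rules: dict, min_confidence: float) -> list:
    """Return the rules whose confidence meets the threshold, strongest first."""
    kept = [name for name, conf in rules.items() if conf >= min_confidence]
    return sorted(kept, key=lambda name: rules[name], reverse=True)

print(filter_by_confidence(rule_confidence, 0.5))
# ['PB -> Jelly', 'Butter -> Milk', 'Coffee -> Cream']
print(filter_by_confidence(rule_confidence, 0.8))
# ['PB -> Jelly', 'Butter -> Milk']
```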
Business interpretation:
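What it controls (minLift): Minimum lift threshold for rules
Range: Lift itself starts at 0, and 1.0 means the items co-occur exactly as often as chance predicts; thresholds are usually 1.0+ (often 1.5 to 3.0 in practice)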
Effect on results:
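Low (e.g., 1.0): rules between popular items pass even when there is no special relationship
High (e.g., 3.0): only strong, non-obvious associations remain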
Your data examples:
minLift: 1.0 # No special relationship required
# Result: Got Bread↔Milk↔Butter (all lift ≈ 1.0) - just popular items
minLift: 2.0 # At least 2× more likely than random
# Result: Filtered out staples, kept Eggs→Butter (2×), Coffee→Cream (5×), PB→Jelly (15×)
minLift: 5.0 # Very strong associations only
# Result: Only Coffee→Cream and PB→Jelly (the really special patterns)
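The sketch below shows one way to compute lift from supports and then filter by minLift. The Milk and Bread counts (10, 8, and 6 out of 15) are the ones used in this tutorial's walkthrough; the other lift values are the figures quoted above. The function names are assumptions made for illustration.

```python
def lift(support_both: float, support_a: float, support_b: float) -> float:
    """Lift of A -> B: observed co-occurrence divided by what independence predicts."""
    return support_both / (support_a * support_b)

# Milk: 10/15, Bread: 8/15, {Milk, Bread}: 6/15 (counts from the walkthrough).
milk_bread_lift = lift(6 / 15, 10 / 15, 8 / 15)  # about 1.1: popular, not special

# Remaining lift values as quoted in the examples above.
rule_lift = {
    "Milk -> Bread": milk_bread_lift,
    "Eggs -> Butter": 2.0,
    "Coffee -> Cream": 5.0,
    "PB -> Jelly": 15.0,
}

def filter_by_lift(rules: dict, min_lift: float) -> list:
    """Keep only the rules whose lift meets the threshold."""
    return [name for name, value in rules.items() if value >= min_lift]

for threshold in (1.0, 2.0, 5.0):
    print(threshold, filter_by_lift(rule_lift, threshold))
```

Raising the threshold from 1.0 to 2.0 drops the staple pair while keeping the three more specific rules.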
Business interpretation:
What it controls (maxItemsetSize): Maximum number of items in a rule
Range: Usually 2-5 (rarely higher)
Effect on results:
Small (e.g., 2): only simple pair rules
Large (e.g., 5): longer, more complex rules, at a higher computational cost
Your data examples:
maxItemsetSize: 2
# Only pairs: Milk→Bread, Coffee→Cream
maxItemsetSize: 3
# Your current setting
# Gets: {Milk, Bread}→Butter, {Coffee, Cream}→Sugar, {Peanut Butter, Jelly}→Bread
maxItemsetSize: 5
# Would get: {Milk, Bread, Butter, Eggs}→X (but computationally expensive)
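To get a feel for why larger values are expensive, the sketch below counts the worst-case number of candidate itemsets for a given maxItemsetSize. The catalog size of 20 distinct items is an assumption chosen only for illustration, and minSupport pruning would normally keep the real number well below this bound.

```python
from math import comb

def max_candidate_itemsets(n_items: int, max_itemset_size: int) -> int:
    """Upper bound on itemsets of size 1..max_itemset_size over n_items distinct items."""
    return sum(comb(n_items, k) for k in range(1, max_itemset_size + 1))

# 20 distinct items is an assumed catalog size, not a figure from the dataset.
for size in (2, 3, 5):
    print(size, max_candidate_itemsets(20, size))
# 2 -> 210, 3 -> 1350, 5 -> 21699
```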
Business interpretation:
Start with single items:
Milk: 10/15 = 0.67 ✓ (above minSupport)
Bread: 8/15 = 0.53 ✓
Lemon: 1/15 = 0.067 ✓ (if minSupport = 0.05)
Generate pairs from frequent singles:
{Milk, Bread}: 6/15 = 0.40 ✓
{Milk, Lemon}: 0/15 = 0.00 ✗ (below minSupport, prune)
Generate triplets from frequent pairs:
{Milk, Bread, Butter}: 4/15 = 0.267 ✓
Stop at maxItemsetSize
minSupport acts as early pruning: if {Milk, Lemon} is infrequent, no larger itemset {Milk, Lemon, X} can be frequent either, so those candidates are never checked.
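The level-by-level search above can be sketched as follows. This is a minimal Apriori-style illustration, not the code of any specific library. The 15 baskets are hypothetical: they were constructed only so that the counts match the ones quoted in the walkthrough (Milk 10, Bread 8, Lemon 1, {Milk, Bread} 6, {Milk, Bread, Butter} 4), and the function name `frequent_itemsets` is an assumption.

```python
from itertools import combinations

# Hypothetical baskets, built only to reproduce the counts quoted in the walkthrough.
TRANSACTIONS = (
    [{"Milk", "Bread", "Butter"}] * 4
    + [{"Milk", "Bread"}] * 2
    + [{"Milk", "Butter"}] * 2
    + [{"Milk"}] * 2
    + [{"Bread", "Butter"}, {"Bread"}, {"Lemon"}, {"Coffee", "Cream"}, {"Coffee"}]
)

def frequent_itemsets(transactions, min_support, max_itemset_size):
    """Level-wise search: return {itemset: support} for all frequent itemsets."""
    n = len(transactions)
    frequent = {}
    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for basket in transactions for item in basket}
    size = 1
    while candidates and size <= max_itemset_size:
        level = {}
        for itemset in candidates:
            count = sum(1 for basket in transactions if itemset <= basket)
            if count / n >= min_support:
                level[itemset] = count / n
        frequent.update(level)
        size += 1
        # Join frequent itemsets into candidates one item larger, keeping only
        # those whose every smaller subset is itself frequent (the pruning step).
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == size
            and all(frozenset(sub) in level for sub in combinations(a | b, size - 1))
        }
    return frequent

result = frequent_itemsets(TRANSACTIONS, min_support=0.05, max_itemset_size=3)
print(round(result[frozenset({"Milk", "Bread"})], 2))            # 0.4
print(frozenset({"Milk", "Lemon"}) in result)                    # False
print(round(result[frozenset({"Milk", "Bread", "Butter"})], 3))  # 0.267
```

Because {Milk, Lemon} never enters the frequent set, no triplet containing both items is ever generated, which is the pruning behaviour described above.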
From frequent itemset {Milk, Bread, Butter}:
Generate all possible rules:
Milk → {Bread, Butter}
Confidence: 4/10 = 0.40 ✗ (below minConfidence = 0.5)
{Milk, Bread} → Butter
Confidence: 4/6 = 0.67 ✓
Lift: (4/15) / ((6/15) × (7/15)) = 0.267 / 0.187 = 1.43
If minLift = 2.0, this would be rejected ✗
If minLift = 1.0, this would be accepted ✓
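The arithmetic above can be reproduced with a few lines of code. The counts are the ones quoted in the walkthrough; the names `COUNT`, `confidence`, and `rule_lift` are assumptions made for this sketch.

```python
N = 15  # transactions in the example dataset

# Transaction counts quoted in the walkthrough above.
COUNT = {
    frozenset({"Milk"}): 10,
    frozenset({"Butter"}): 7,
    frozenset({"Milk", "Bread"}): 6,
    frozenset({"Milk", "Bread", "Butter"}): 4,
}

def confidence(antecedent: frozenset, consequent: frozenset) -> float:
    """Share of baskets containing the antecedent that also contain the consequent."""
    return COUNT[antecedent | consequent] / COUNT[antecedent]

def rule_lift(antecedent: frozenset, consequent: frozenset) -> float:
    """Confidence divided by the consequent's overall support."""
    return confidence(antecedent, consequent) / (COUNT[consequent] / N)

milk = frozenset({"Milk"})
milk_bread = frozenset({"Milk", "Bread"})
butter = frozenset({"Butter"})

print(round(confidence(milk, frozenset({"Bread", "Butter"})), 2))  # 0.4
print(round(confidence(milk_bread, butter), 2))                    # 0.67
print(round(rule_lift(milk_bread, butter), 2))                     # 1.43
```

Dividing confidence by the consequent's support is algebraically the same as the support ratio written out above.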
minSupport: 0.05 # Reasonably frequent
minConfidence: 0.5 # Right at least half the time
minLift: 1.0 # See everything
maxItemsetSize: 3 # Keep it simple
Goal: See the full landscape of patterns
Too many rules? → Increase thresholds
Too few rules? → Decrease thresholds
All obvious patterns? → Increase minLift
Missing rare but valuable items? → Decrease minSupport
For cross-sell recommendations:
minSupport: 0.02 # At least 2% (not too rare)
minConfidence: 0.6 # Reliable suggestions
minLift: 2.0 # Meaningful associations
maxItemsetSize: 2 # Simple pairs
For understanding customer behavior:
minSupport: 0.01 # See more patterns
minConfidence: 0.4 # Lower bar
minLift: 1.5 # Moderate associations
maxItemsetSize: 4 # Complex patterns OK
For premium bundles:
minSupport: 0.05 # Must be somewhat common
minConfidence: 0.8 # Very reliable
minLift: 3.0 # Strong special relationship
maxItemsetSize: 3 # Bundle of 2-3 items
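One way to compare these goal-specific settings is to keep them as named presets and test a rule against each. In the sketch below the preset values are copied from the three configurations above. The preset names, the `passes` helper, and the example rule's support of 0.13 are assumptions; its confidence and lift are the Coffee→Cream figures quoted earlier.

```python
PRESETS = {
    "cross_sell": {"minSupport": 0.02, "minConfidence": 0.6, "minLift": 2.0, "maxItemsetSize": 2},
    "customer_behavior": {"minSupport": 0.01, "minConfidence": 0.4, "minLift": 1.5, "maxItemsetSize": 4},
    "premium_bundles": {"minSupport": 0.05, "minConfidence": 0.8, "minLift": 3.0, "maxItemsetSize": 3},
}

def passes(rule: dict, preset_name: str) -> bool:
    """True if the rule clears all four thresholds of the named preset."""
    p = PRESETS[preset_name]
    n_items = len(rule["antecedent"]) + len(rule["consequent"])
    return (
        rule["support"] >= p["minSupport"]
        and rule["confidence"] >= p["minConfidence"]
        and rule["lift"] >= p["minLift"]
        and n_items <= p["maxItemsetSize"]
    )

# Confidence and lift as quoted for Coffee -> Cream; the support is an assumed value.
coffee_cream = {
    "antecedent": {"Coffee"},
    "consequent": {"Cream"},
    "support": 0.13,
    "confidence": 0.67,
    "lift": 5.0,
}

for name in PRESETS:
    print(name, passes(coffee_cream, name))
# cross_sell True, customer_behavior True, premium_bundles False
```

The rule fails the premium-bundle preset only on confidence (0.67 is below 0.8), which shows how the same pattern can be useful for one goal and filtered out for another.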
High support + Low confidence:
Low support + High confidence:
Your "interesting finds" tuning captured this:
minSupport: 0.05 # Not too rare
minConfidence: 0.6 # Reliable enough
minLift: 2.0 # Must be special
This found patterns that are:
Lift separates:
That's why minLift: 2.0 gave you such different, more actionable results!
minSupport: 0.0001 # 0.01% - millions of transactions
minConfidence: 0.3 # Lower bar, personalization handles accuracy
minLift: 2.0 # Want surprising recommendations
maxItemsetSize: 5 # "Frequently bought together" bundles
minSupport: 0.10 # 10% - need strong evidence
minConfidence: 0.7 # High reliability for limited inventory
minLift: 1.5 # Moderate associations
maxItemsetSize: 3 # Simple bundles
minSupport: 0.05 # 5% = 1+ transactions out of 15
minConfidence: 0.6 # 60% reliability
minLift: 2.0 # 2× better than random
maxItemsetSize: 3 # Up to triplets