DAZL Documentation | Data Analytics A-to-Z Processing Language


Basket Step

Purpose

Performs market basket analysis (association rule mining) on transaction data to identify which items are frequently purchased together. Discovers meaningful relationships between products to inform cross-selling strategies, product placement, and customer behavior insights.

When to Use

  • Identify product affinities and cross-selling opportunities
  • Optimize product placement or website recommendations
  • Understand customer purchasing patterns
  • Design bundle offers or promotions
  • Improve inventory management based on co-purchase patterns
  • Analyze menu item relationships in food service
  • Detect common combinations in any transaction-based dataset

How It Works

  1. Takes transaction data with transaction IDs and item identifiers
  2. Groups items by transaction to create a set of baskets/transactions
  3. Applies the Apriori algorithm to discover frequent itemsets
  4. Generates association rules showing which items are purchased together
  5. Filters and ranks rules based on support, confidence, and lift metrics
  6. Provides detailed statistics about the discovered relationships
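
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DAZL's actual implementation: the function name mine_rules, the layout of the rule dictionaries, the restriction to single-item consequents, and the reading of maxItemsetSize as the total number of items in a rule are all assumptions made for the example. The sample rows are the ones used in the Example Output section.

```python
from collections import defaultdict
from itertools import combinations


def mine_rules(rows, min_support=0.01, min_confidence=0.5, min_lift=1.0,
               max_itemset_size=3):
    """Hypothetical sketch: mine rules from (transaction_id, item) rows."""
    # Steps 1-2: group rows into one set of items per transaction.
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    transactions = list(baskets.values())
    n = len(transactions)
    if n == 0:
        return []

    def count(itemset):
        # Number of transactions that contain every item in the itemset.
        return sum(1 for basket in transactions if itemset <= basket)

    # Step 3: level-wise Apriori search. A candidate of size k is counted
    # only if all of its (k-1)-item subsets are already known to be frequent.
    counts = {}
    level = []
    for item in sorted({i for basket in transactions for i in basket}):
        single = frozenset([item])
        c = count(single)
        if c / n >= min_support:
            counts[single] = c
            level.append(single)
    size = 1
    while level and size < max_itemset_size:
        size += 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        next_level = []
        for cand in candidates:
            subsets = combinations(cand, size - 1)
            if all(frozenset(sub) in counts for sub in subsets):
                c = count(cand)
                if c / n >= min_support:
                    counts[cand] = c
                    next_level.append(cand)
        level = next_level

    # Steps 4-5: build rules with a single-item consequent (an assumption),
    # then filter by confidence and lift and rank by lift.
    rules = []
    for itemset, c in counts.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            confidence = c / counts[antecedent]
            lift = confidence / (counts[frozenset([consequent])] / n)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append({
                    "antecedent": tuple(sorted(antecedent)),
                    "consequent": consequent,
                    "support": c / n,
                    "confidence": confidence,
                    "lift": lift,
                    "count": c,
                })
    rules.sort(key=lambda r: (-r["lift"], -r["confidence"]))
    return rules


ROWS = [
    (1001, "Bread"), (1001, "Milk"), (1001, "Eggs"),
    (1002, "Milk"), (1002, "Cereal"),
    (1003, "Bread"), (1003, "Butter"), (1003, "Jam"),
    (1004, "Bread"), (1004, "Milk"), (1004, "Butter"),
]

for rule in mine_rules(ROWS, min_support=0.5, min_confidence=0.6):
    print(rule["antecedent"], "->", rule["consequent"],
          f"confidence={rule['confidence']:.2f} lift={rule['lift']:.2f}")
```

Counting each candidate with a full pass over the transactions keeps the sketch short; a production implementation would typically use a more efficient counting structure.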

Parameters

Required

  • transactionId (string) - Column name containing transaction identifiers (default: transactionId)
  • itemColumn (string) - Column name containing item identifiers (default: item)

Optional

  • minSupport (float) - Minimum support threshold (0-1): the minimum fraction of transactions that must contain the itemset (default: 0.01)
  • minConfidence (float) - Minimum confidence threshold (0-1): the minimum probability of the consequent given the antecedent (default: 0.5)
  • minLift (float) - Minimum lift threshold; lift measures how much stronger the rule is than random chance (default: 1.0)
  • maxItemsetSize (int) - Maximum number of items in a rule (default: 3)

Association Rule Metrics

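  • Support: Fraction of transactions (0-1) containing all items in the rule
  • Confidence: Probability that the consequent appears in transactions containing the antecedent
  • Lift: Ratio of the rule's observed support to the support expected if the antecedent and consequent were independent (equivalently, confidence divided by the support of the consequent)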
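
As a worked illustration of these definitions, the sketch below computes all three metrics for a single rule. The helper name rule_metrics and its signature are hypothetical; the transactions are the four baskets from the Example Output section, each represented as a Python set.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for antecedent -> consequent.

    transactions must be a list of sets of items (hypothetical helper).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n
    confidence = n_both / n_antecedent
    # Expected support under independence is P(antecedent) * P(consequent).
    lift = support / ((n_antecedent / n) * (n_consequent / n))
    return support, confidence, lift


transactions = [
    {"Bread", "Milk", "Eggs"},
    {"Milk", "Cereal"},
    {"Bread", "Butter", "Jam"},
    {"Bread", "Milk", "Butter"},
]

# Rule: {Milk, Bread} -> {Butter}
print(rule_metrics(transactions, {"Milk", "Bread"}, {"Butter"}))
# prints (0.25, 0.5, 1.0)
```

Here one of four transactions contains all three items (support 0.25), one of the two transactions with Milk and Bread also contains Butter (confidence 0.5), and Butter appears in half of all transactions, so the lift is 0.5 / 0.5 = 1.0.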

Input Requirements

  • Transaction data must have at least two columns:
    • Transaction identifier column (e.g., order ID, basket ID)
    • Item identifier column (e.g., product ID, SKU, product name)
  • Data can be in denormalized format (multiple rows per transaction)
  • Each row should represent one item in a transaction

Output Columns

  • antecedent - Item or items purchased (the "if" part of the rule)
  • consequent - Item likely to be purchased with the antecedent (the "then" part)
  • support - Fraction of transactions (0-1) containing both antecedent and consequent
  • confidence - Probability of the consequent given the antecedent
  • lift - Strength of association relative to random chance (>1 indicates positive association)
  • count - Number of transactions containing every item in the rule (antecedent and consequent)

Extras

Detailed summary statistics are stored in extras.basket:

  • totalTransactions - Number of transactions analyzed
  • uniqueItems - Number of distinct items across all transactions
  • avgItemsPerTransaction - Average basket size
  • totalRules - Number of association rules discovered
  • maxLift - Maximum lift value found
  • avgConfidence - Average confidence across all rules
  • parameters - Copy of the parameters used for analysis
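
The sketch below shows one way these summary fields could be derived from the input rows and the resulting rules. It is an assumption-laden illustration, not DAZL's code: the function name basket_summary is hypothetical, the RULES list is made-up sample data, and it assumes avgItemsPerTransaction counts distinct items per transaction.

```python
from collections import defaultdict


def basket_summary(rows, rules):
    """Hypothetical helper mirroring the fields listed for extras.basket."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)

    total = len(baskets)
    sizes = [len(items) for items in baskets.values()]
    return {
        "totalTransactions": total,
        "uniqueItems": len(set().union(*baskets.values())),
        # Assumption: basket size counts distinct items in the transaction.
        "avgItemsPerTransaction": sum(sizes) / total if total else 0.0,
        "totalRules": len(rules),
        "maxLift": max((r["lift"] for r in rules), default=None),
        "avgConfidence": (
            sum(r["confidence"] for r in rules) / len(rules) if rules else None
        ),
    }


ROWS = [
    (1001, "Bread"), (1001, "Milk"), (1001, "Eggs"),
    (1002, "Milk"), (1002, "Cereal"),
    (1003, "Bread"), (1003, "Butter"), (1003, "Jam"),
    (1004, "Bread"), (1004, "Milk"), (1004, "Butter"),
]

# Made-up rules, used only to exercise the rule-level fields.
RULES = [
    {"confidence": 1.0, "lift": 1.5},
    {"confidence": 0.5, "lift": 1.0},
]

print(basket_summary(ROWS, RULES))
```

For these rows the transaction-level fields come out as 4 transactions, 6 unique items, and an average basket size of 2.75.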

Example Usage

Basic Retail Product Analysis

basket:
  transactionId: order_id
  itemColumn: product_name
  minSupport: 0.02
  minConfidence: 0.3
  minLift: 1.2

Restaurant Menu Item Analysis

basket:
  transactionId: ticket_number
  itemColumn: menu_item
  minSupport: 0.05
  minConfidence: 0.4
  minLift: 1.5
  maxItemsetSize: 4

Example Output

Input Data (Denormalized Format)

order_id  product_name
1001      Bread
1001      Milk
1001      Eggs
1002      Milk
1002      Cereal
1003      Bread
1003      Butter
1003      Jam
1004      Bread
1004      Milk
1004      Butter

Output Data (Association Rules)

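antecedent     consequent  support  confidence  lift  count
Milk           Bread       0.50     0.67        0.89  2
Butter         Bread       0.50     1.00        1.33  2
Bread          Milk        0.50     0.67        0.89  2
Bread, Butter  Milk        0.25     0.50        0.67  1
Milk, Bread    Butter      0.25     0.50        1.00  1

These rows show the output format, with metrics computed from the input above. For example, Bread appears in 3 of 4 transactions, so the lift of Butter → Bread is 1.00 / 0.75 = 1.33. In an actual run, rules that fall below the minSupport, minConfidence, or minLift thresholds are filtered out.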

Interpretation Guide

  • High Confidence, High Lift: Strong associations - ideal for cross-selling
  • High Support, Low Lift: Common combinations but not necessarily associated
  • Low Support, High Lift: Niche but strong associations
  • Lift = 1: No association (items appear together about as often as expected by random chance)
  • Lift < 1: Negative association (items less likely to appear together)
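
Because a computed lift rarely equals exactly 1, code that applies the lift rules in this guide usually needs a tolerance band. The sketch below is one way to do that; the function name describe_lift and the default tolerance of 0.05 are illustrative assumptions, not DAZL settings.

```python
import math


def describe_lift(lift, tolerance=0.05):
    """Label a lift value; values within `tolerance` of 1 count as neutral."""
    if math.isclose(lift, 1.0, abs_tol=tolerance):
        return "no association"
    return "positive association" if lift > 1.0 else "negative association"


for value in (1.33, 1.02, 0.67):
    print(value, describe_lift(value))
```

A wider tolerance treats more rules as neutral, so the appropriate value depends on how noisy the transaction data is.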

Optimizing Results

  • Increase minSupport if too many rules are generated
  • Decrease minConfidence if too few rules are found
  • Raise minLift to focus on stronger associations
  • Increase maxItemsetSize to discover more complex patterns

Related Documentation

  • cube step - Pre-aggregate data before basket analysis
  • filter step - Filter transaction data to specific segments
  • sort step - Sort association rules by metrics
  • chart step - Visualize association rules