Association Rule Mining
Association rule mining is a fundamental technique in data mining focused on discovering relationships between items in transaction data. The rules it produces reveal correlations that inform market basket analysis, recommendation systems, and business decision-making. The technique is also applied in computational biology to identify gene associations and in population health to discover disease co-occurrence patterns.
Association Rules
Association rules represent relationships between sets of items in transaction data. A rule takes the form "if A, then B" (written as A → B), where A is the antecedent (left-hand side) and B is the consequent (right-hand side). For example, in retail analysis, a rule {bread, butter} → {milk} suggests that customers who purchase bread and butter are likely to purchase milk as well.
Formal Definition
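Let:
$I = \{i_1, i_2, \ldots, i_n\}$ be the set of all items in the dataset
$T = \{t_1, t_2, \ldots, t_m\}$ be the set of all transactions, where each $t_j \subseteq I$ and $t_j \neq \emptyset$
An association rule is an implication of the form $A \rightarrow B$, where $A, B \subseteq I$, $A \neq \emptyset$, $B \neq \emptyset$, and $A \cap B = \emptyset$:
$A$ is called the antecedent (or left-hand side)
$B$ is called the consequent (or right-hand side)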
Mining Process
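Let $\mathit{minsup}$ and $\mathit{minconf}$ be user-specified minimum thresholds for support and confidence.
Then, the set of all valid association rules (AR) can be defined as:
$$AR = \{A \rightarrow B \mid \mathrm{supp}(A \cup B) \geq \mathit{minsup} \ \text{and} \ \mathrm{conf}(A \rightarrow B) \geq \mathit{minconf}\}$$
The process of association rule mining involves:
1. Finding all frequent itemsets, that is, all itemsets $X$ with $\mathrm{supp}(X) \geq \mathit{minsup}$
2. For each frequent itemset $X$, generating all non-empty subsets
3. For each such subset $A \subsetneq X$, forming the rule $A \rightarrow (X \setminus A)$ if $\mathrm{conf}(A \rightarrow (X \setminus A)) \geq \mathit{minconf}$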
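The two-phase process above can be sketched in a few lines of Python. This is a minimal illustration that enumerates candidate itemsets exhaustively, chosen for brevity; practical systems use algorithms such as Apriori or FP-Growth to prune the search. The function names (`support`, `frequent_itemsets`, `generate_rules`), the threshold parameters, and the toy transactions are assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Phase 1: collect every itemset whose support reaches `min_support`."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            s = support(itemset, transactions)
            if s >= min_support:
                frequent[itemset] = s
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no
            # itemset of this size qualifies, no larger one can either.
            break
    return frequent


def generate_rules(transactions, min_support, min_confidence):
    """Phase 2: split each frequent itemset X into A -> X \\ A."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, supp in frequent.items():
        # Proper, non-empty subsets only, so the consequent is never empty.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                # `a` is itself frequent, so its support is already known.
                confidence = supp / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, supp, confidence))
    return rules


# Hypothetical toy data for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

for a, b, supp, conf in generate_rules(transactions, 0.4, 0.6):
    print(sorted(a), "->", sorted(b), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

Because the number of candidate itemsets grows exponentially with the number of distinct items, this exhaustive approach is only suitable for very small item sets.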
Rule Evaluation Metrics
The main challenge in association rule mining is efficiently discovering meaningful rules from large datasets while filtering out weak or uninteresting patterns using various interestingness measures.
To evaluate the strength and significance of association rules, several key metrics are used:
Support
Support measures how frequently the itemset (A ∪ B) appears in the dataset.
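As a formula, assuming $T$ denotes the set of all transactions and $\mathrm{supp}$ is used as shorthand for support (notation chosen here for illustration), the usual definition is the fraction of transactions that contain every item of both $A$ and $B$:

```latex
\[
\mathrm{supp}(A \rightarrow B) = \mathrm{supp}(A \cup B)
  = \frac{\left|\{\, t \in T : A \cup B \subseteq t \,\}\right|}{|T|}
\]
```

For example, if 2 of 5 transactions contain bread, butter, and milk together, the rule {bread, butter} → {milk} has support 2/5 = 0.4.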
Confidence
Confidence measures how often the rule is found to be true. It represents the conditional probability of finding B given that a transaction contains A.
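Written out, with $\mathrm{supp}(X)$ denoting the fraction of transactions that contain itemset $X$ (notation assumed for this sketch), confidence is the standard estimate of the conditional probability $P(B \mid A)$:

```latex
\[
\mathrm{conf}(A \rightarrow B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}
\]
```

For example, if bread and butter appear together in 3 transactions and 2 of those also contain milk, the rule {bread, butter} → {milk} has confidence 2/3. Confidence is undefined when $A$ never occurs, since $\mathrm{supp}(A) = 0$.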
Coverage
Coverage (sometimes called support of A) measures how often A appears in the dataset, regardless of B.
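In formula form, assuming $T$ denotes the set of all transactions (notation chosen here for illustration), coverage depends only on the antecedent:

```latex
\[
\mathrm{cov}(A \rightarrow B) = \mathrm{supp}(A)
  = \frac{\left|\{\, t \in T : A \subseteq t \,\}\right|}{|T|}
\]
```

It follows that the support of a rule equals its coverage multiplied by its confidence.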
Lift
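Lift measures how much more likely B is to be present when A is present than it is across the dataset as a whole. It is the ratio of the rule's confidence to the overall frequency of B, so it indicates the strength of association beyond random co-occurrence: a lift of 1 means A and B co-occur exactly as often as expected if they were independent, a lift above 1 indicates a positive association, and a lift below 1 indicates a negative one.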
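The four metrics can be computed directly from their definitions, with lift taken as the rule's confidence divided by the support of B. The following is a small sketch rather than a reference implementation; the helper names `support` and `rule_metrics` and the toy transactions are assumptions introduced for this example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return support, confidence, coverage, and lift for the rule A -> B."""
    a, b = frozenset(antecedent), frozenset(consequent)
    coverage = support(a, transactions)        # how often A appears
    supp = support(a | b, transactions)        # how often A and B appear together
    supp_b = support(b, transactions)          # how often B appears overall
    if coverage == 0 or supp_b == 0:
        raise ValueError("confidence and lift are undefined if A or B never occurs")
    confidence = supp / coverage               # estimate of P(B | A)
    lift = confidence / supp_b                 # P(B | A) relative to P(B)
    return {
        "support": supp,
        "confidence": confidence,
        "coverage": coverage,
        "lift": lift,
    }


# Hypothetical toy data for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

print(rule_metrics({"bread", "butter"}, {"milk"}, transactions))
```

On this toy data the rule {bread, butter} → {milk} has support 0.4, coverage 0.6, confidence of about 0.67, and lift of about 0.83. The lift is below 1 because milk appears in 80% of all transactions but in only two thirds of those containing bread and butter, which shows how a rule with reasonable confidence can still reflect a weaker-than-chance association.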