Machine Learning Algorithm : Associated Rule (Part 6 of 12)
Apriori :
The Apriori Algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.
Key Concepts :
Frequent Itemsets : The sets of item which has minimum support (denoted by Lfor ith-Itemset).
-
Apriori Property: Any subset of frequent itemset must be frequent.
-
Join Operation : To find L, a set of candidate k-itemsets is generated by joining L with itself.
Find the frequent itemsets: the sets of items that have minimum support– A subset of a frequent itemset must also be a frequent itemset
-
i.e., if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemset– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)
-
Use the frequent itemsets to generate association rules.
-
Join Step : C is generated by joining L with itself
-
Prune Step : Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
Pseudo-code:
C: Candidate itemset of size k
L: frequent itemset of size k
L= {frequent items};for(k 1; L!=∅; k++)
do begin
C= candidates generated from L;
for each transaction t in database do increment the count of all candidates in Cthat are contained in t
L= candidates in Cwith min_support
end
return ∪L;
Ref. : http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab8-Apriori.pdf
Eclat algorithm :
Eclat (alt. ECLAT, stands for Equivalence Class Transformation) is a depth-first search algorithm using set intersection. It is a naturally elegant algorithm suitable for both sequential as well as parallel execution with locality enhancing properties. It was first introduced by Zaki, Parthasarathy, Li and Ogihara in a series of papers written in 1997. Good tool for data mining.
Mohammed Javeed Zaki, Srinivasan Parthasarathy, M. Ogihara, Wei Li: New Algorithms for Fast Discovery of Association Rules. KDD 1997.
Ref : http://www.ijcsonline.com/IJCS/IJCS_2014_0103002.pdf
Software : http://www.borgelt.net/eclat.html
FP-Growth :
-
FP-Growth allows frequent itemset discovery without candidate itemset generation. Two step approach:FP-Tree is constructed using 2 passes over the data-set:
- Step 1: Build a compact data structure called the FP-tree
- Built using 2 passes over the data-set.
- Step 2: Extracts frequent itemsets directly from the FP-tree
- Step 1: Build a compact data structure called the FP-tree
Pass 1:
-
Scan data and find support for each item.
-
Discard infrequent items.
-
Sort frequent items in decreasing order based on their support.
Use this order when building the FP-Tree, so common prefixes can be shared.
Pass 2:
Nodes correspond to items and have a counter
-
FP-Growth reads 1 transaction at a time and maps it to a path
-
Fixed order is used, so paths can overlap when transactions share items (when they have the same prfix ).-
In this case, counters are incremented
-
-
Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines)Frequent itemsets extracted from the FP-Tree.
-
The more paths that overlap, the higher the compression. FP-tree may fit in memory.
-