Machine Learning Algorithm : Associated Rule (Part 6 of 12)

Apriori :

 

The Apriori Algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.

 

 Key Concepts :

 

Frequent Itemsets : The sets of item which has minimum support (denoted by Lfor   ith-Itemset).

 

  • Apriori Property: Any subset of frequent itemset must be frequent.

  • Join Operation : To find L, a set of candidate k-itemsets is generated by joining L with itself.

 

Find the frequent itemsets: the sets of items that have minimum support– A subset of a frequent itemset must also be a frequent itemset

 

  • i.e., if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemset– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)

 

  • Use the frequent itemsets to generate association rules.

  • Join Step : C is generated by joining L with itself

  • Prune Step : Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset

 

Pseudo-code:

 

C: Candidate itemset of size k

 

L: frequent itemset of size k

 

L= {frequent items};for(k 1; L!=∅; k++)

 

do begin

 

C= candidates generated from L;

 

for each transaction t in database do increment the count of all candidates in Cthat are contained in t

 

L= candidates in Cwith min_support

 

end

 

return ∪L;

Ref. : http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab8-Apriori.pdf

Eclat algorithm :

Eclat (alt. ECLAT, stands for Equivalence Class Transformation) is a depth-first search algorithm using set intersection. It is a naturally elegant algorithm suitable for both sequential as well as parallel execution with locality enhancing properties. It was first introduced by Zaki, Parthasarathy, Li and Ogihara in a series of papers written in 1997. Good tool for data mining.

Mohammed Javeed Zaki, Srinivasan Parthasarathy, M. Ogihara, Wei Li: New Algorithms for Fast Discovery of Association Rules. KDD 1997.

Ref : http://www.ijcsonline.com/IJCS/IJCS_2014_0103002.pdf

Software :  http://www.borgelt.net/eclat.html

 

FP-Growth :

  • FP-Growth allows frequent itemset discovery without candidate itemset generation. Two step approach:FP-Tree is constructed using 2 passes over the data-set:

    • Step 1: Build a compact data structure called the FP-tree
      • Built using 2 passes over the data-set.
    • Step 2: Extracts frequent itemsets directly from the FP-tree

Pass 1:

  • Scan data and find support for each item.

  • Discard infrequent items.

  • Sort frequent items in decreasing order based on their support.

   Use this order when building the FP-Tree, so common prefixes can be shared.

Pass 2:

Nodes correspond to items and have a counter

  1. FP-Growth reads 1 transaction at a time and maps it to a path

  2.  

     

    Fixed order is used, so paths can overlap when transactions share items (when they have the same prfix ).
    • In this case, counters are incremented

  3. Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines)Frequent itemsets extracted from the FP-Tree.

    • The more paths that overlap, the higher the compression. FP-tree may fit in memory.

To view or add a comment, sign in

More articles by Abhay Kumar

Explore content categories