Machine Learning Algorithm : Associated Rule (Part 6 of 12)

Abhay Kumar

Published Jul 8, 2016

+ Follow

Apriori :

The Apriori Algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.

Key Concepts :

Frequent Itemsets : The sets of item which has minimum support (denoted by Lfor ith-Itemset).

Apriori Property: Any subset of frequent itemset must be frequent.
Join Operation : To find L, a set of candidate k-itemsets is generated by joining L with itself.

Find the frequent itemsets: the sets of items that have minimum support– A subset of a frequent itemset must also be a frequent itemset

i.e., if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemset– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)

Use the frequent itemsets to generate association rules.
Join Step : C is generated by joining L with itself
Prune Step : Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset

Pseudo-code:

C: Candidate itemset of size k

L: frequent itemset of size k

L= {frequent items};for(k 1; L!=∅; k++)

do begin

C= candidates generated from L;

for each transaction t in database do increment the count of all candidates in Cthat are contained in t

L= candidates in Cwith min_support

end

return ∪L;

Ref. : http://software.ucv.ro/~cmihaescu/ro/teaching/AIR/docs/Lab8-Apriori.pdf

Eclat algorithm :

Eclat (alt. ECLAT, stands for Equivalence Class Transformation) is a depth-first search algorithm using set intersection. It is a naturally elegant algorithm suitable for both sequential as well as parallel execution with locality enhancing properties. It was first introduced by Zaki, Parthasarathy, Li and Ogihara in a series of papers written in 1997. Good tool for data mining.

Mohammed Javeed Zaki, Srinivasan Parthasarathy, M. Ogihara, Wei Li: New Algorithms for Fast Discovery of Association Rules. KDD 1997.

Ref : http://www.ijcsonline.com/IJCS/IJCS_2014_0103002.pdf

Software : http://www.borgelt.net/eclat.html

FP-Growth :

FP-Growth allows frequent itemset discovery without candidate itemset generation. Two step approach:FP-Tree is constructed using 2 passes over the data-set:
- Step 1: Build a compact data structure called the FP-tree
  - Built using 2 passes over the data-set.
- Step 2: Extracts frequent itemsets directly from the FP-tree

Pass 1:

Scan data and find support for each item.
Discard infrequent items.
Sort frequent items in decreasing order based on their support.

Use this order when building the FP-Tree, so common prefixes can be shared.

Pass 2:

Nodes correspond to items and have a counter

FP-Growth reads 1 transaction at a time and maps it to a path
Fixed order is used, so paths can overlap when transactions share items (when they have the same prfix ).
- In this case, counters are incremented
Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines)Frequent itemsets extracted from the FP-Tree.
- The more paths that overlap, the higher the compression. FP-tree may fit in memory.

To view or add a comment, sign in

Apriori :

The Apriori Algorithm is an influential algorithm for mining frequent itemsets for boolean association rules.

Key Concepts :

Frequent Itemsets : The sets of item which has minimum support (denoted by Lfor ith-Itemset).

Apriori Property: Any subset of frequent itemset must be frequent.

Join Operation : To find L, a set of candidate k-itemsets is generated by joining L with itself.

Find the frequent itemsets: the sets of items that have minimum support– A subset of a frequent itemset must also be a frequent itemset

i.e., if {AB} is a frequent itemset, both {A} and {B} should be a frequent itemset– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)

Use the frequent itemsets to generate association rules.

Join Step : C is generated by joining L with itself

Prune Step : Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset

Pseudo-code:

C: Candidate itemset of size k

L: frequent itemset of size k

L= {frequent items};for(k 1; L!=∅; k++)

do begin

C= candidates generated from L;

for each transaction t in database do increment the count of all candidates in Cthat are contained in t

L= candidates in Cwith min_support

end

return ∪L;

Eclat algorithm :

Ref : http://www.ijcsonline.com/IJCS/IJCS_2014_0103002.pdf

Software : http://www.borgelt.net/eclat.html

FP-Growth :

FP-Growth allows frequent itemset discovery without candidate itemset generation. Two step approach:FP-Tree is constructed using 2 passes over the data-set:

Pass 1:

Scan data and find support for each item.

Discard infrequent items.

Sort frequent items in decreasing order based on their support.

Use this order when building the FP-Tree, so common prefixes can be shared.

Pass 2:

Nodes correspond to items and have a counter

FP-Growth reads 1 transaction at a time and maps it to a path

In this case, counters are incremented

Pointers are maintained between nodes containing the same item, creating singly linked lists (dotted lines)Frequent itemsets extracted from the FP-Tree.

The more paths that overlap, the higher the compression. FP-tree may fit in memory.

More articles by Abhay Kumar

SAS Middle Tier Web Process

Machine Learning Algorithm : decision tree - Part 9 of 12

Machine Learning Algorithm : bayesian - Part 8 of 12

Machine Learning Algorithm : ensemble (part 7 of 12)

Machine Learning Algorithm - Deep Learning (Part 5 of 12)

Machine Learning Algorithm - Dimensionality Reduction (Part 4 of 12)

MACHINE LEARNING ALGORITHMS - Instance Based (CAKE BASED, MEMORY BASED) Part 3 of 12

Machine Learning Algorithm - Regularization Part 2 of 12

MACHINE LEARNING ALGORITHMS - Regression - Part 1 of 12

Big Data A Real Challenge

Explore content categories