Analysis Of Breast Cancer Data Using Apriori Algorithm

Here we use the well-known Apriori algorithm, which is widely used in data mining. The interesting part is that we apply it to a breast cancer data set, which is originally a classification problem.

From this, we analyse the data and compute association-rule metrics such as support, confidence and lift, and also carry out feature selection and plotting.

Summary of the project:

1. Creating a data frame from the breast cancer CSV file.

2. Applying a transaction encoder to the dataset. The transaction encoder finds all the distinct items present in the data frame and arranges them as column names. If an item (column name) is present in a particular transaction, the cell is filled with "True"; otherwise it is filled with "False".

3. Applying the Apriori algorithm with a minimum support threshold, which gives the frequent itemsets together with their support values.

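4. Calculating the basic association-rule metrics: support, confidence and lift. Support is the probability that an itemset appears in a transaction. Confidence is the conditional probability of the consequent given the antecedent. Lift measures how much the antecedent influences the consequent: it is the confidence divided by the support of the consequent, so a value above 1 indicates a positive association. Rules with values below the minimum threshold are discarded.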

5. Plotting a histogram of the frequent itemsets (disease factor values) against the severity of disease in percent.

6. Applying SelectKBest with the chi-square test for feature selection, selecting the top 10 features that most influence a malignant diagnosis.
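Steps 2 and 4 of the summary can be illustrated with a minimal sketch in plain Python. It is not the notebook's code: the toy transactions, the item labels such as "radius=high", and the helper names `one_hot`, `support` and `rule_metrics` are all assumptions made up for illustration.

```python
# Hypothetical toy transactions; the item labels are invented for this sketch
# and are not taken from the breast cancer data set.
transactions = [
    ["radius=high", "texture=high", "diagnosis=M"],
    ["radius=high", "diagnosis=M"],
    ["radius=low", "texture=low", "diagnosis=B"],
    ["radius=low", "texture=high", "diagnosis=B"],
    ["radius=high", "texture=high", "diagnosis=M"],
]


def one_hot(transactions):
    """Encode transactions as rows of True/False, one column per distinct item."""
    columns = sorted({item for t in transactions for item in t})
    rows = [{c: (c in t) for c in columns} for t in transactions]
    return columns, rows


def support(rows, items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(all(r[i] for i in items) for r in rows) / len(rows)


def rule_metrics(rows, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_a = support(rows, antecedent)
    s_c = support(rows, consequent)
    s_ac = support(rows, antecedent + consequent)
    confidence = s_ac / s_a      # P(consequent | antecedent)
    lift = confidence / s_c      # > 1 suggests a positive association
    return {"support": s_ac, "confidence": confidence, "lift": lift}


columns, rows = one_hot(transactions)
metrics = rule_metrics(rows, ["radius=high"], ["diagnosis=M"])
print(columns)
print(metrics)
```

In this toy data every transaction with "radius=high" also contains "diagnosis=M", so the rule has confidence 1.0 and a lift above 1. On real data the metrics would be computed for every candidate rule and filtered by the chosen minimum threshold.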

How does the Apriori algorithm work?

Step 1: Create a frequency table of every item k that occurs across the n transactions.

Step 2: For each item k in the frequency table, compute its frequency freq(k), i.e. the fraction of the n transactions that contain k.

Step 3: If freq(k) >= min_threshold (for example, min_threshold = 0.02), keep k as a frequent item.

Step 4: Make all possible pairs (j, k) from the frequent items and count their frequency across the n transactions.

Step 5: Again make possible triples (i, j, k) from the previous pairs (j, k) and count their frequency.

Step 6: At each level, discard itemsets whose frequency is below min_threshold.

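Step 7: To create sets of 3 items, another rule called self-join is required.

It says that from the frequent item pairs, for example OP, OB, PB and PM (each letter denoting an item), we look for two pairs with an identical first letter: OP and OB give OPB, and PB and PM give PBM. After filtering by min_threshold, the output is the set of 3 items that customers buy together most frequently.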
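The steps above can be sketched as a small level-wise implementation in plain Python. This is an illustrative sketch under stated assumptions, not the notebook's code: the toy transactions, the extra item X and the threshold of 0.5 are hypothetical, chosen so that the frequent pairs are exactly OP, OB, PB and PM. Itemsets are kept as alphabetically sorted tuples, so OP appears as ('O', 'P') and OB as ('B', 'O'). The sketch also adds the standard pruning check that every sub-itemset of a candidate must itself be frequent.

```python
from itertools import combinations

# Hypothetical toy transactions; each letter stands for one item.
transactions = [
    {"O", "P", "B"},
    {"P", "B", "M"},
    {"M", "X"},
    {"O", "P", "M"},
    {"O", "P", "B"},
    {"O", "P", "B", "M"},
]


def apriori(transactions, min_support):
    """Return {itemset as sorted tuple: support} for all frequent itemsets."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(1 for t in tx if set(itemset) <= t) / n

    # Level 1: frequent single items.
    items = sorted({i for t in tx for i in t})
    current = {}
    for i in items:
        s = support((i,))
        if s >= min_support:
            current[(i,)] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = sorted(current)
        candidates = set()
        # Self-join: merge two (k-1)-itemsets that agree on all but the last item.
        for a, b in combinations(prev, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune: every (k-1)-subset of the candidate must be frequent.
                if all(sub in current for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        # Keep only candidates that reach the minimum support.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


result = apriori(transactions, min_support=0.5)
triples = sorted(itemset for itemset in result if len(itemset) == 3)
print(triples)  # [('B', 'O', 'P')]
```

With this toy data the only frequent set of 3 items is O, P and B. The candidate PBM from the prose example is never counted here, because its sub-pair BM is not frequent and the pruning check rejects it.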

Code Link:

https://github.com/Krishna5996/Machine_Learning/blob/master/Breast_Cancer_Apriori.ipynb
