Association Rules in Data Mining

Hello, Connections :)

I am Priyanka Thota, and in this article I share my view of association rules in data mining as part of my peer mentoring in the course DATA WAREHOUSE & MINING. Hope it helps!

Happy Reading!!

INTRO:

Association rules are "if-then" statements that help show the probability of relationships between data items within large data sets in various types of databases. The act of using association rules is sometimes referred to as "association rule mining" or "mining associations."

Association rule mining has a number of applications and is widely used to help discover sales correlations in transactional data or in medical data sets.

The contents covered in this article are:

1) How are association rules used?

2) Some real-world use cases for association rules.

3) How do association rules work?

4) How can we measure the effectiveness of association rules?

5) Algorithms that use association rules.

6) Uses of association rules in data mining.

7) A classic example of association rules in data mining.

8) How did it start at the beginning? (History).

======================================================

1) How are association rules used?

In data science, association rules are used to find correlations and co-occurrences within data sets. They are well suited to explaining patterns in data from seemingly independent information repositories, such as relational databases and transactional databases.

Note: It is common to use the terms correlation and association interchangeably. Technically, association refers to any relationship between two variables, whereas correlation is often used to refer only to a linear relationship between two variables.

2) Some real-world use cases for association rules.

Entertainment - Services like Netflix, Amazon Prime, YouTube, Spotify, and many more can use association rules to fuel their content recommendation engines. Machine learning models analyze past user behavior data for frequent patterns, develop association rules, and use those rules to recommend content that a user is likely to engage with, or to organize content so that the items most interesting to a given user appear first.
Retail - Retailers can collect data about purchasing patterns, recording purchase data as item barcodes are scanned by point-of-sale systems. Machine learning models can look for co-occurrence in this data to determine which products are most likely to be purchased together. The retailer can then adjust marketing and sales strategy to take advantage of this information.
Medicine - Doctors can use association rules to help diagnose patients. There are many variables to consider when making a diagnosis, as many diseases share symptoms. By using association rules and machine learning-fueled data analysis, doctors can determine the conditional probability of a given illness by comparing symptom relationships in the data from past cases. As new diagnoses get made, the machine learning model can adapt the rules to reflect the updated data.
User experience (UX) design - Developers can collect data on how consumers use a website they create. They can then use associations in the data to optimize the website user interface -- by analyzing where users tend to click and what maximizes the chance that they engage with a call to action, for example.

3) How do association rules work?

Association rule mining, at a basic level, involves the use of machine learning models to analyze data for patterns, or co-occurrences, in a database. It identifies frequent if-then associations, which themselves are the association rules.

An association rule has two parts: an antecedent (if) and a consequent (then). An antecedent is an item found within the data. A consequent is an item found in combination with the antecedent.

Association rules are created by searching data for frequent if-then patterns and using the criteria support and confidence to identify the most important relationships. Support indicates how frequently the items appear together in the data, as a share of all transactions. Confidence indicates how often the if-then statement is found to be true: of the transactions that contain the antecedent, the share that also contain the consequent. A third metric, called lift, compares confidence with expected confidence, that is, how often the consequent would be expected to appear if it were independent of the antecedent.
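To make support and confidence concrete, here is a minimal sketch in Python. The transactions and the function names (`count`, `support`, `confidence`) are made up for illustration and do not come from any particular library.

```python
# Hypothetical toy data: each transaction is the set of items in one purchase.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Share of all transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the share that also have the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

# Rule: if bread, then butter.
print(support({"bread", "butter"}, transactions))       # 0.6
print(confidence({"bread"}, {"butter"}, transactions))  # 0.75
```

Here bread and butter appear together in 3 of 5 transactions (support 0.6), and 3 of the 4 transactions that contain bread also contain butter (confidence 0.75).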

Association rules are calculated from itemsets, which are made up of two or more items. If rules were built by analyzing all possible itemsets, there could be so many rules that they would hold little meaning. For that reason, association rules are typically created only from itemsets that are well represented in the data, known as frequent itemsets.
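A quick way to see why analyzing every possible itemset is impractical is to count them. The sketch below uses a hypothetical helper, `count_itemsets`, and a made-up product list; it only illustrates how fast the number of itemsets grows.

```python
from itertools import combinations
from math import comb

def count_itemsets(n_items, min_size=2):
    """How many itemsets with at least `min_size` items can be formed from `n_items` distinct items."""
    return sum(comb(n_items, k) for k in range(min_size, n_items + 1))

# Hypothetical product list, just for illustration.
items = ["bread", "butter", "milk"]
itemsets = [set(c) for k in range(2, len(items) + 1) for c in combinations(items, k)]
print(len(itemsets))  # 4 itemsets: three pairs and one triple

for n in (5, 10, 20):
    print(n, count_itemsets(n))  # 26, 1013, 1048555
```

With only 20 distinct items there are already more than a million itemsets of two or more items, which is why a minimum support threshold is used to discard most of them early.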

4) How can we measure the effectiveness of association rules?

The strength of a given association rule is measured by two main parameters: support and confidence. Support refers to how often the items in a rule appear together in the database being mined, as a share of all transactions. Confidence refers to how often the rule turns out to be true: of the transactions that contain the antecedent, the share that also contain the consequent. A rule may look strong because its items appear together very often, yet the antecedent may appear even more often without the consequent. This would be a case of high support, but low confidence.

Conversely, a rule might not particularly stand out in a data set because its items are rare, but whenever the antecedent appears, the consequent almost always appears with it. This would be a case of high confidence and low support. Using both measures helps analysts separate strong, well-represented rules from coincidental ones, and allows them to properly value a given rule. Note that neither measure shows causation; both describe only co-occurrence.

A third parameter, known as the lift value, is the ratio of a rule's confidence to its expected confidence, which is the support of the consequent. Lift is never negative. If the lift value is less than 1, there is a negative correlation between the antecedent and the consequent. If the value is greater than 1, there is a positive correlation, and if the ratio equals 1, then there is no correlation.
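Lift is commonly computed as the rule's confidence divided by the support of the consequent. The sketch below follows that definition; the function names `lift` and `interpret` and the input numbers are hypothetical and chosen only for illustration.

```python
import math

def lift(support_both, support_antecedent, support_consequent):
    """Confidence of the rule divided by the support of the consequent."""
    confidence = support_both / support_antecedent
    return confidence / support_consequent

def interpret(value):
    """Read a lift value: around 1 means independence, above 1 positive, below 1 negative."""
    if math.isclose(value, 1.0):
        return "no correlation"
    return "positive correlation" if value > 1.0 else "negative correlation"

# Assumed supports: (both items, antecedent, consequent)
for supports in [(0.25, 0.5, 0.5), (0.4, 0.5, 0.5), (0.1, 0.5, 0.5)]:
    value = lift(*supports)
    print(supports, value, interpret(value))
```

In the first case the items appear together exactly as often as chance would predict (lift 1.0); in the second they appear together more often (lift 1.6), and in the third less often (lift 0.4).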

5) Algorithms that use association rules.

Popular algorithms for mining association rules include AIS, SETM, and Apriori, along with variations of the latter.

The AIS algorithm makes multiple passes over the entire database. During each pass, it scans all transactions. In the first pass, it counts the support of individual items and determines which of them are large or frequent in the database. Large itemsets of each pass are extended to generate candidate itemsets. After scanning a transaction, the common itemsets between large itemsets of the previous pass and items of this transaction are determined. This algorithm was targeted to discover qualitative rules. This technique is limited to only one item in the consequent.

Similar to the AIS algorithm, the SETM algorithm makes multiple passes over the database. In the first pass, it counts the support of individual items and determines which of them are large or frequent in the database. Then, it generates the candidate itemsets by extending large itemsets of the previous pass.

Apriori is by far the most well-known association rule algorithm. It differs fundamentally from the AIS and SETM algorithms in the way it generates candidate itemsets and selects candidate itemsets for counting. Apriori generates the candidate itemsets by joining the large itemsets of the previous pass and deleting any candidate that has a subset found to be small in the previous pass, without considering the transactions in the database. By only considering large itemsets of the previous pass, the number of candidate large itemsets is significantly reduced.
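The join-and-prune idea behind Apriori can be sketched in a few lines of Python. This is a simplified illustration, not an optimized or reference implementation; the function name `apriori`, the `min_support` parameter, and the toy transactions are assumptions made for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support is at least `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One pass over the data: count each candidate and keep the large (frequent) ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Pass 1: large itemsets of size 1.
    current = frequent({frozenset([item]) for t in transactions for item in t})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine large (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with any small (k-1)-subset, without reading the data.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result

# Hypothetical toy data.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]
for itemset, s in apriori(baskets, min_support=0.5).items():
    print(sorted(itemset), s)
```

With a minimum support of 0.5, only bread, butter, and the pair bread and butter survive. Milk and jam are small after the first pass, so no candidate containing them is ever counted.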

6) Uses of association rules in data mining.

In data mining, association rules are useful for analyzing and predicting customer behavior. They play an important part in customer analytics, market basket analysis, product clustering, catalog design, and store layout.
Programmers use association rules to build programs that are capable of machine learning.

7) A classic example of association rules in data mining.

On Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer.

This anecdote became popular as an example of how unexpected association rules might be found from everyday data.

The anecdote concerns the relationship between diapers and beer. The example, which seems to be fictional, claims that men who go to a store to buy diapers are also likely to buy beer. Data that would point to that might look like this:

A supermarket has 200,000 customer transactions. About 4,000 transactions, or about 2% of the total number of transactions, include the purchase of diapers. About 5,500 transactions (2.75%) include the purchase of beer. About 3,500 transactions (1.75%) include both diapers and beer. If the two purchases were unrelated, only about 0.055% of transactions (2% × 2.75%, or about 110 transactions) would be expected to include both, so 3,500 is far higher than the percentages alone would predict. The fact that about 87.5% of diaper purchases include the purchase of beer indicates a link between diapers and beer.
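The figures in this example can be checked with a few lines of Python. The variable names are made up for illustration; the numbers are the ones given above.

```python
total = 200_000   # all customer transactions
diapers = 4_000   # transactions that include diapers
beer = 5_500      # transactions that include beer
both = 3_500      # transactions that include diapers and beer

support_both = both / total              # share of all transactions with both items
confidence = both / diapers              # share of diaper purchases that include beer
expected_both = diapers * beer / total   # transactions expected with both if unrelated
lift = confidence / (beer / total)       # confidence compared with beer's overall share

print(f"support    = {support_both:.2%}")   # 1.75%
print(f"confidence = {confidence:.1%}")     # 87.5%
print(f"expected   = {expected_both:.0f}")  # 110
print(f"lift       = {lift:.1f}")           # 31.8
```

A lift of roughly 31.8 means diapers and beer appear together about 32 times more often than they would if the two purchases were unrelated.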

8) How did it start at the beginning? (History).

While the concepts behind association rules can be traced back earlier, association rule mining was defined in the 1990s, when computer scientists Rakesh Agrawal, Tomasz Imieliński, and Arun Swami developed an algorithm-based way to find relationships between items using point-of-sale (POS) systems. Applying the algorithms to supermarkets, the scientists were able to discover links between different items purchased, called association rules, and ultimately use that information to predict the likelihood of different products being purchased together.

For retailers, association rule mining offered a way to better understand customer purchase behaviors. Because of its retail origins, association rule mining is often referred to as market basket analysis.

Advances in data science, AI, and machine learning since the original use case, along with the growing number of devices that generate data, mean association rules can now be used in a wider range of use cases. More data is being generated, meaning more applications for association rules. AI and machine learning allow larger and more complex data sets to be analyzed and mined for association rules.

====================================================

Here are some links I referred to, which are interesting and easy to understand as well.

An interesting blog that I found on the "medium.com" website.

And here is the best platform to learn from, "geeksforgeeks.org". It really helps in building the basic foundation!

And lastly, a short video to get a visual understanding of association rule mining with the Apriori algorithm.
