Analytics and Visualization of Big Data: Association Analysis

Association analysis is a data mining method for identifying interesting relationships hidden in large data sets. These relationships can be expressed in the form of association rules.

Association Rule

Association rule mining tries to find interesting rules and relationships that occur frequently among the item-sets in a dataset (Williams, 2010). An association rule is expressed as X → Y, where X and Y are disjoint item-sets. For example, a person who buys coffee is quite likely to buy coffee-cream as well. In this way, interesting patterns can be found in a database. The strength of an association rule can be measured in terms of its support and confidence. Support indicates how often a rule applies to a given dataset. Confidence measures the reliability of the inference made by a rule, that is, how often the items in Y appear in transactions that contain X (Ding & Sundarraj, 2006).
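The two measures can be written down compactly. The notation here is an assumption added for illustration and is not taken from the cited sources: σ(·) denotes the number of transactions that contain an item-set, and N is the total number of transactions.

```latex
\[
\mathrm{support}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N},
\qquad
\mathrm{confidence}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}
\]
```

As a made-up example, suppose a store has N = 1000 transactions, 200 of them contain coffee, and 150 contain both coffee and coffee-cream. Then the rule coffee → coffee-cream has support 150/1000 = 0.15 and confidence 150/200 = 0.75.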

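Rules that exceed the minimum support and minimum confidence thresholds chosen by the researcher are identified as interesting rules. Lift is an additional but important measure that shows the strength of the relationship between X and Y: it compares the confidence of the rule with how often Y occurs in the dataset overall, so a lift above 1 suggests that X and Y occur together more often than would be expected if they were independent.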
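To make the three measures concrete, here is a minimal Python sketch that computes support, confidence, and lift for a single rule and checks it against the thresholds. The transactions, the helper names (`support`, `rule_metrics`), and the threshold values are all invented for illustration; they do not come from the cited sources.

```python
# Toy transactions; the items are made up for illustration only.
transactions = [
    {"coffee", "coffee-cream", "sugar"},
    {"coffee", "coffee-cream"},
    {"coffee", "bread"},
    {"bread", "eggs"},
    {"coffee", "coffee-cream", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(x, y, transactions):
    """Return (support, confidence, lift) for the rule X -> Y.

    Assumes X and Y each occur in at least one transaction;
    otherwise the divisions below would fail.
    """
    x, y = set(x), set(y)
    sup_xy = support(x | y, transactions)
    confidence = sup_xy / support(x, transactions)
    lift = confidence / support(y, transactions)
    return sup_xy, confidence, lift


# Hypothetical thresholds a researcher might choose.
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.7

sup, conf, lift = rule_metrics({"coffee"}, {"coffee-cream"}, transactions)
is_interesting = sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f} "
      f"interesting={is_interesting}")
```

On this toy data the rule coffee → coffee-cream holds in 3 of 5 transactions and in 3 of the 4 transactions that contain coffee, so it passes both thresholds, and its lift is above 1.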

Market Basket Analysis

A huge amount of data is collected on the shopping behavior of customers in supermarkets and the retail sector. The most typical example of association rules is "market basket analysis", a modeling technique based on the idea that a customer who buys a certain group of items is more (or less) likely to buy another group of items. Discovering this type of association may give market managers an important opportunity to develop more effective marketing strategies. For example, X% of customers buying sugar also buy eggs. This kind of information can be found with association rule mining.
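A statement such as "X% of customers buying sugar also buy eggs" is the confidence of the rule sugar → eggs. The sketch below finds such statements by brute force over all ordered item pairs in a handful of invented baskets. It is a simple enumeration for illustration, not an efficient mining algorithm such as Apriori, and the function name `pair_rules` and its default thresholds are assumptions.

```python
from itertools import permutations

# Invented shopping baskets, one set of items per customer visit.
baskets = [
    {"sugar", "eggs", "flour"},
    {"sugar", "eggs"},
    {"sugar", "milk"},
    {"eggs", "milk"},
    {"sugar", "eggs", "milk"},
    {"bread"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets)


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """List rules a -> b between single items that pass both thresholds.

    Each rule is returned as (a, b, support, confidence).
    """
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b}, baskets)
        if both / n < min_support:
            continue  # the pair is too rare to be interesting
        confidence = both / count({a}, baskets)
        if confidence >= min_confidence:
            rules.append((a, b, both / n, confidence))
    return rules


rules = pair_rules(baskets)
for a, b, sup, conf in rules:
    print(f"{conf:.0%} of customers buying {a} also buy {b} "
          f"(support {sup:.0%})")
```

Raising `min_support` or `min_confidence` shrinks the list of reported rules, which is how a researcher would keep only the strongest associations.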

Figure: (Han & Kamber, 2001)

1- Williams, G. (2010). Data Mining Desktop Survival Guide.

2- Ding, Q. & Sundarraj, G. (2006). "Association Rule Mining from XML Data", International Conference on Data Mining, Las Vegas, Nevada, 2006, pp. 144-150.

3- Han, J. & Kamber, M. (2001). Data Mining. Morgan Kaufmann Publishers, San Francisco, CA


Tuesday, March 5, 2013
