Friday, March 15, 2013

Apriori Algorithm


Apriori algorithm is a popular algorithm used to find the association rules. Apriori algorithm consists of repeating steps based on a criterion. The algorithm uses the minimum support value to create a set of common repetitive elements as the criterion. The minimum support level depends on the context (Agrawal, 1994). If an item-set has a frequency larger than minimum support, then it is called as a frequent  item-set. In addition, any subset if a frequent item-set is also a frequent item-set (Jin, 2003). See the following example (see Figure) (Zaiane, 2001). The database of transactions consist of the sets {1,3,4},{2,3,5},{1,2,3,5},{2,5}. Each number can be thought as a product or a term in a concrete experiment. The first step of Apriori is to count up the frequency (support) of each number (item) separately. We appointed the minimum support level as “1” for this example. Therefore, {4} is not frequent. So, we remove {4} from the first candidate item-set C1. The next step is to generate C2  that is the item-set of all 2-pairs of the frequent items. In the same way, we remove {1,2}, {1,5} whose frequencies are “1”. And then, we generate the tree of all possible sets. The third candidate item-set is {{2,3,5}}. Therefore, because of {2,3,5}’s support is 2, the last frequent item-set L3 is {{2,3,5}}. As a note, in the last scan, we did not include {1,2,3}, {1,2,5} and {1,3,5} in the C3 because in the previous scan we identified {1,2}, {1,5} items as  infrequent, and  {1,2}{1,2,3}, {1,2}{1,2,5}, {1,5}{1,2,5}, and {1,5}{1,3,5}. Now we can identify a association rule between the items of the last item-set.
                     
Where D is transactional database, Ck is candidate item-set of size k, and Lk is frequent item-set of size k.



1-    Agrawal, R. and Srikant, R.. Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile, September 1994.
2-    Zaiane, O.R., University of Alberta. (2001). Web Technologies and Applications, Lecture Notes, Winter 2001.
3-   Jin, R., McCallen, S., Breitbart, Y., Fuhry, D. & Wang D., Department of Computer Science, Kent State University. Estimating the Number of Frequent Itemsets in a Large Database.

5 comments: