Analytics and Visualization of Big Data: Apriori Algorithm

Friday, March 15, 2013

Apriori Algorithm

Apriori algorithm is a popular algorithm used to find the association rules. Apriori algorithm consists of repeating steps based on a criterion. The algorithm uses the minimum support value to create a set of common repetitive elements as the criterion. The minimum support level depends on the context (Agrawal, 1994). If an item-set has a frequency larger than minimum support, then it is called as a frequent item-set. In addition, any subset if a frequent item-set is also a frequent item-set (Jin, 2003). See the following example (see Figure) (Zaiane, 2001). The database of transactions consist of the sets {1,3,4},{2,3,5},{1,2,3,5},{2,5}. Each number can be thought as a product or a term in a concrete experiment. The first step of Apriori is to count up the frequency (support) of each number (item) separately. We appointed the minimum support level as “1” for this example. Therefore, {4} is not frequent. So, we remove {4} from the first candidate item-set C1. The next step is to generate C2 that is the item-set of all 2-pairs of the frequent items. In the same way, we remove {1,2}, {1,5} whose frequencies are “1”. And then, we generate the tree of all possible sets. The third candidate item-set is {{2,3,5}}. Therefore, because of {2,3,5}’s support is 2, the last frequent item-set L3 is {{2,3,5}}. As a note, in the last scan, we did not include {1,2,3}, {1,2,5} and {1,3,5} in the C3 because in the previous scan we identified {1,2}, {1,5} items as infrequent, and {1,2}⊂{1,2,3}, {1,2}⊂{1,2,5}, {1,5}⊂{1,2,5}, and {1,5}⊂{1,3,5}. Now we can identify a association rule between the items of the last item-set.

Where D is transactional database, Ck is candidate item-set of size k, and Lk is frequent item-set of size k.

1- Agrawal, R. and Srikant, R.. Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile, September 1994.

2- Zaiane, O.R., University of Alberta. (2001). Web Technologies and Applications, Lecture Notes, Winter 2001.

http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput499/slides/Lect10

3- Jin, R., McCallen, S., Breitbart, Y., Fuhry, D. & Wang D., Department of Computer Science, Kent State University. Estimating the Number of Frequent Itemsets in a Large Database.

5 comments:

TejutejuJuly 4, 2018 at 6:18 AM
Thanks for sharing such details about big data and hadoop. Big data hadoop online Course
ReplyDelete
Replies
TejutejuJuly 4, 2018 at 6:19 AM
Nice post ! Thanks for sharing valuable information with us. Keep sharing.. Big data hadoop online Course Hyderabad
ReplyDelete
Replies
cloudbeginnersSeptember 4, 2021 at 10:39 PM
world777 india
ReplyDelete
Replies
aaryanOctober 8, 2021 at 5:17 AM
This comment has been removed by the author.
ReplyDelete
Replies
pragyanNovember 23, 2021 at 3:33 AM
This comment has been removed by the author.
ReplyDelete
Replies

Add comment