Saturday, March 30, 2013

Tutorial: Naive Bayes Classification Algorithm


Naive Bayes is a simple probabilistic classification algorithm. It is based on modeling, as a probability, how much each feature contributes to each outcome: the training documents are used to estimate how strongly each term in a newly added document points to each category (Adsiz, 2006). By Bayes' theorem (Han & Kamber, 2001),

$$P(c_j \mid d) = \frac{P(c_j)\, P(d \mid c_j)}{P(d)},$$
                                                
where $P(c_j)$ is the prior probability of category $c_j$ and $d = (t_1, \ldots, t_M)$ is the document to be classified, represented by its terms.

Because datasets contain many features, computing $P(d \mid c_j)$ directly would be very difficult. Therefore, the algorithm treats the features in a document as conditionally independent given the category (Adsiz, 2006):

$$P(d \mid c_j) = \prod_{i=1}^{M} P(t_i \mid c_j),$$
         
where $t_i$ is the $i$th term in the document.
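To make the factorization concrete, here is a minimal Python sketch; the per-term probabilities and the document are made-up toy values for illustration only, not numbers from the text above:

```python
# Hypothetical conditional probabilities P(t | c) for one category.
p_term_given_cat = {"price": 0.05, "stock": 0.08, "market": 0.10}

# A document represented as its sequence of terms t_1, ..., t_M.
document = ["price", "stock", "market"]

# Independence assumption: P(d | c) is the product of the P(t_i | c).
p_doc_given_cat = 1.0
for term in document:
    p_doc_given_cat *= p_term_given_cat[term]

print(p_doc_given_cat)  # 0.05 * 0.08 * 0.10 ≈ 0.0004
```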

Since $P(d)$ is the same for every category, it can be dropped when ranking categories; the probability of a document being in category $c_j$ then satisfies

$$P(c_j \mid d) \propto P(c_j) \prod_{i=1}^{M} P(t_i \mid c_j),$$
                  
where $P(t_i \mid c_j)$ is the conditional probability of term $t_i$ occurring in a document of category $c_j$; it indicates how much $t_i$ contributes to that category. The goal is to find the most appropriate category for the document, and the standard choice is the maximum a posteriori (MAP) category $c_{\text{map}}$ (Manning, Raghavan & Schütze, 2008):
$$c_{\text{map}} = \arg\max_{c_j} P(c_j \mid d) = \arg\max_{c_j} P(c_j) \prod_{i=1}^{M} P(t_i \mid c_j).$$
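Multiplying many probabilities between 0 and 1 quickly underflows in floating point, so implementations usually maximize the equivalent sum of log-probabilities instead, which leaves the argmax unchanged. A minimal sketch, assuming the parameters are given as plain dictionaries (the function and variable names are my own):

```python
import math

def map_category(document, priors, cond_probs):
    # Returns argmax over c_j of log P(c_j) + sum_i log P(t_i | c_j),
    # which selects the same category as maximizing the product above.
    best_cat, best_score = None, float("-inf")
    for cat, prior in priors.items():
        score = math.log(prior)
        for term in document:
            score += math.log(cond_probs[cat][term])
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat
```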
We do not know the exact values of the parameters $P(c_j)$ and $P(t_i \mid c_j)$, but we can estimate them from the training data using maximum likelihood estimation (MLE). Let $\tilde{P}(c_j)$ be the estimate of $P(c_j)$, and $\tilde{P}(t_i \mid c_j)$ be the estimate of $P(t_i \mid c_j)$.

$$\tilde{P}(c_j) = \frac{N_j}{N},$$
                                                    
where $N$ is the total number of training documents and $N_j$ is the number of documents in category $c_j$. Similarly,

$$\tilde{P}(t_i \mid c_j) = \frac{1 + N_{ij}}{M + \sum_{k=1}^{M} N_{kj}}, \qquad i = 1, \ldots, M,$$
                            
where $N_{ij}$ is the number of documents that belong to category $c_j$ and contain the $i$th term $t_i$. Adding 1 to the numerator (and $M$ to the denominator) is Laplace smoothing: it prevents a term that never occurs in a category from forcing the whole product to zero.
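Putting the pieces together, a minimal end-to-end sketch of training and classification with these smoothed estimates might look like the following; this is an illustrative implementation of the formulas above, not code from the cited sources, and all names and toy documents are my own:

```python
import math
from collections import defaultdict

def train(labeled_docs, vocabulary):
    # labeled_docs: list of (set_of_terms, category) pairs.
    N = len(labeled_docs)                 # total number of documents
    M = len(vocabulary)                   # number of terms
    n_cat = defaultdict(int)              # N_j: documents per category
    n_term_cat = defaultdict(lambda: defaultdict(int))  # N_ij
    for terms, cat in labeled_docs:
        n_cat[cat] += 1
        for term in terms & vocabulary:
            n_term_cat[cat][term] += 1
    priors = {cat: n_j / N for cat, n_j in n_cat.items()}   # P~(c_j)
    cond_probs = {}
    for cat in n_cat:
        total = sum(n_term_cat[cat].values())               # sum_k N_kj
        cond_probs[cat] = {t: (1 + n_term_cat[cat][t]) / (M + total)
                           for t in vocabulary}             # P~(t_i | c_j)
    return priors, cond_probs

def classify(document, priors, cond_probs):
    # MAP category for a document (iterable of terms), in log space.
    scores = {cat: math.log(priors[cat])
                   + sum(math.log(cond_probs[cat][t])
                         for t in document if t in cond_probs[cat])
              for cat in priors}
    return max(scores, key=scores.get)

# Toy usage with made-up documents:
docs = [({"stock", "market"}, "finance"),
        ({"goal", "match"}, "sports"),
        ({"stock", "price"}, "finance")]
vocab = {"stock", "market", "goal", "match", "price"}
priors, cond_probs = train(docs, vocab)
print(classify(["stock", "market"], priors, cond_probs))  # -> finance
```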







1- Adsiz, A. (2006). Text Mining (dissertation). Ahmet Yesevi University.
2- Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco, CA: Morgan Kaufmann.
3- Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.





