Naive Bayes is a simple classification
algorithm. Naive Bayes algorithm is based on the calculation effects of each
criterion to results as probability. All of the potential datasets and newly
added documents are used to calculate the possibility of each newly added term
to affect the categories (Adsiz, 2006). By Bayes Theorem (Han & Kamber,
2001),
P(cj│d)=(P(cj)P(d│cj
))/(P(d)) ,
where P(cj)
is the prior probability of category cj and d=(t1,…,tM ) is set of documents
that is going to be classified.
Because of there are many features in
datasets, it will be so difficult to calculate P(d│cj ). Therefore, in this
algorithm, features in a document are considered as independent (Adsiz, 2006).
P(d│cj )=∏P(ti│cj
) , i=1,…,M ,
where ti is
the ith term in the document.
P(cj│d) is the probability of a document being in category cj
P(cj│d)=P(cj
)∏P(ti│cj ) , i=1,…,M ,
where P(ti│cj
) is the conditional probability of term
ti occurring in category cj. Also, it is
an indicator of how much ti contributes to the category. In this method, we try
to find the most appropriate category for the document. For this purpose, the
best approach is the maximum a posteriori (map) category cj_map (Manning,
Raghavan & Schütze, 2008).
cj_map=[argmax_cj
] P(cj│d)=[argmax_cj )] P(cj│d)=P(cj )∏P(ti│cj )
We do not
know the exact values of parameters P(cj ) and P(ti│cj ). However, by using the
maximum likelihood estimate (MLE) theorem, we can make estimations about these
parameters. Let P ̃(cj ) be the approximate calculation of P(cj), and P ̃(ti│cj ) be the approximate
calculation of P(ti│cj ).
P ̃(cj )=Nj/N ,
where N is
the number of all documents, and N_j is the number of documents in category cj.
And,
P ̃(ti│cj )=(1+Nij)/(M+∑ Nij), i=1,…,MN ,
where Nij is
the number of documents belonging to category cj and including the ith
term ti.
1- Adsiz, A., (Ahmet Yesevi University ). (2006). Dissertation: Text Mining.
2- Han, J. & Kamber, M. (2001). Data Mining. Morgan Kaufmann
Publishers, San Francisco, CA
3- Manning, C.D. , Raghavan,
P. & Schütze, H. (2008). Introduction to Information Retrieval,
Cambridge University Press.
No comments:
Post a Comment