Friday, May 3, 2013

Naive Bayes: a simple classifier

      The purpose of this blog post is to introduce “Naive Bayes”, a probabilistic classifier that is usually implemented in software and is essentially used for pattern recognition within a data set. Most of my understanding for this post comes from this video.

   The presenter begins the video by explaining the structure of the data set needed to apply the Naïve Bayes classifier. From the video, the data should be set up in the following form.
$D = ((x^{(1)}, y_1), \dots, (x^{(n)}, y_n))$
D is an algebraic expression for the data set. To make that clearer, each $x^{(i)}$ is a vector of coordinates $(x_1^{(i)}, \dots, x_d^{(i)})$, where the superscript shows which point the coordinate belongs to and the subscript indexes the coordinate. Each $x^{(i)}$ is a point in $\mathbb{R}^d$, and each $y_i$ belongs to some finite set of classes. In the video, he takes this finite set to be a range of integers starting at 1, one integer per class; the number of classes is separate from n, the number of data points.
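To make the notation concrete for myself, here is a small sketch of how such a data set might be laid out in Python. The numbers and the names `X`, `y`, and `D` are made up for illustration and do not come from the video.

```python
# Hypothetical toy data set: n = 4 points in R^d with d = 2, and two classes.
X = [(1.0, 2.1), (0.9, 1.8), (3.2, 0.5), (3.0, 0.7)]  # the points x^(1), ..., x^(n)
y = [1, 1, 2, 2]                                       # the labels y_1, ..., y_n

# D = ((x^(1), y_1), ..., (x^(n), y_n))
D = list(zip(X, y))

n = len(D)      # number of data points
d = len(X[0])   # number of coordinates per point

x1, y1 = D[0]   # the first point and its label
print(n, d)     # 4 2
print(x1, y1)   # (1.0, 2.1) 1
print(x1[1])    # 2.1, the second coordinate of the first point (Python indexes from 0)
```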
He mentions in the video that several assumptions are made when taking a Naïve Bayes approach to classification. Those assumptions are as follows.
1)      We assume we have a family of distributions $P_\Theta$ parametrized by $\Theta$. Each of these is a joint distribution on x and y, where x is in $\mathbb{R}^d$ and y is a class.
2)      $P_\Theta(x, y) = P_\Theta(x \mid y)\, P_\Theta(y) = P_\Theta(x_1 \mid y) \cdots P_\Theta(x_d \mid y)\, P_\Theta(y)$
He mentions in the video that this second assumption is key to the Naïve Bayes approach. The first equality, $P_\Theta(x, y) = P_\Theta(x \mid y)\, P_\Theta(y)$, always holds; the assumption is that it factors further into the second expression. What I essentially draw from this is that we assume $P_\Theta(x \mid y)$ factors into a product: the probability of the first coordinate of x given y, and so on up to the probability of the last coordinate of x given y.
3)      Assume that the points $(x^{(i)}, y_i)$ are independent and identically distributed according to $P_\Theta$. He also mentions that, in this context, the factorization in 2) means the coordinates $x_1, \dots, x_d$ are independent given y when $(X, Y) \sim P_\Theta$.
He finally makes things a bit clearer at 7:35 in the video. He mentions that the main assumption is a conditional independence assumption.
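As a rough illustration of the conditional independence assumption, the sketch below computes the joint probability of x and y as the class probability times one factor per coordinate. Everything specific here is my own assumption and not from the video: the coordinates are taken to be binary, and the values in `prior` and `cond` are invented.

```python
import math

# Hypothetical parameters for d = 2 binary coordinates and classes 1 and 2.
prior = {1: 0.6, 2: 0.4}                # P(y)
cond = {1: [0.9, 0.2], 2: [0.3, 0.7]}   # cond[y][j] = P(x_j = 1 | y)

def joint(x, y):
    """P(x, y) = P(y) * P(x_1 | y) * ... * P(x_d | y) under the naive factorization."""
    p = prior[y]
    for j, xj in enumerate(x):
        pj = cond[y][j]
        p *= pj if xj == 1 else 1.0 - pj
    return p

# Summing over every (x, y) combination should give 1 if this is a valid joint distribution.
total = sum(joint((a, b), y) for a in (0, 1) for b in (0, 1) for y in prior)

print(joint((1, 0), 1))  # about 0.432 = 0.6 * 0.9 * (1 - 0.2)
print(total)             # about 1.0
```

The point of the factorization is that only one small table per coordinate and class is needed, instead of one probability for every possible combination of coordinates.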

At this point, he explains the “goal” of Naïve Bayes. Essentially, when some new x arrives, we want to “predict” its y. The algorithm starts by estimating the parameter $\Theta$ of the distribution that the (x, y) pairs are believed to follow. $\Theta$ is estimated from the data, and then we predict the y that maximizes, over all possible classes, the probability of that class given the new x. Since $P_\Theta(y \mid x)$ is proportional to $P_\Theta(x \mid y)\, P_\Theta(y)$, and we assume this factors as $P_\Theta(x_1 \mid y) \cdots P_\Theta(x_d \mid y)\, P_\Theta(y)$, we compute this product for each class y using the coordinates of the new x, and the class with the largest value is the prediction for the new x. For a better understanding, please watch the video.
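The two steps, estimating $\Theta$ and then predicting by maximizing over classes, can be sketched as follows. This is only a minimal sketch under my own assumptions: I model each coordinate given the class as a Gaussian (the post above does not fix a particular family), and the function names `fit`, `log_joint`, and `predict` as well as the toy data are hypothetical. Logarithms are used so that the product of many small probabilities becomes a sum.

```python
import math
from collections import defaultdict

def fit(X, y):
    """Estimate theta: a prior per class, plus a mean and variance per class and coordinate."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    theta = {}
    for c, pts in groups.items():
        d = len(pts[0])
        means = [sum(p[j] for p in pts) / len(pts) for j in range(d)]
        # The small constant keeps the variance away from zero (an implementation choice).
        variances = [sum((p[j] - means[j]) ** 2 for p in pts) / len(pts) + 1e-9
                     for j in range(d)]
        theta[c] = {"prior": len(pts) / len(X), "mean": means, "var": variances}
    return theta

def log_joint(x, params):
    """log P(y) + sum over j of log P(x_j | y), with Gaussian P(x_j | y) (assumed)."""
    total = math.log(params["prior"])
    for xj, m, v in zip(x, params["mean"], params["var"]):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (xj - m) ** 2 / v)
    return total

def predict(x, theta):
    """Return the class y that maximizes P(y) * P(x_1 | y) * ... * P(x_d | y)."""
    return max(theta, key=lambda c: log_joint(x, theta[c]))

# Hypothetical toy data: two well-separated classes in R^2.
X = [(1.0, 2.1), (0.9, 1.8), (1.2, 2.0), (3.2, 0.5), (3.0, 0.7), (2.9, 0.4)]
y = [1, 1, 1, 2, 2, 2]

theta = fit(X, y)
x_new = (1.1, 1.9)
print(predict(x_new, theta))  # 1, since x_new sits next to the class-1 points
```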

