To our followers and over 200,000 unique visitors: we are in the process of pushing new content related to social media visualization to this blog. Our next posting will be at the beginning of August 2015. In the meantime, if you have any questions, please email us at: fmegahed@auburn.edu.
Below are some of the projects that we have recently worked on:
1- Visualizing the Statistics Literature, see results and code here
2- Real-time visualizations of twitter data, click here
3- Visualization tools for Quality Engineering, click here
On behalf of the site editors,
Fadel Megahed
www.fadelmegahed.com
The purpose of this blog post is to introduce the Naive Bayes classifier, a probabilistic classifier, often implemented through computer software, that is essentially used for pattern recognition within a data set. I draw the majority of my understanding for this post from this video.
He begins the video by explaining the structure of the dataset necessary for applying the Naïve Bayes classifier. From the video, the data should take this form:

D = ((x^(1), y^(1)), … , (x^(n), y^(n)))

D is an algebraic expression for the data set. To make that clearer, the variable x^(1) represents a coordinate tuple (x_1^(1), … , x_d^(1)), where the superscript shows which point the coordinate belongs to and the subscript indexes the coordinate within that point. Each x^(i) is a point in the space R^d, and each y^(i) belongs to some finite set of class labels; in the video, he takes this set to be the integers from 1 up through the number of classes.
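To make the notation concrete, here is a toy instance of this format written out in Python (all values are hypothetical, assuming d = 2 coordinates per point and two class labels):

# Toy dataset in the form D = ((x^(1), y^(1)), ... , (x^(n), y^(n))),
# with d = 2 coordinates per point and class labels drawn from {1, 2}.
D = [
    ((5.1, 1.4), 1),  # (x^(1), y^(1))
    ((6.2, 4.5), 2),  # (x^(2), y^(2))
    ((4.9, 1.5), 1),  # (x^(3), y^(3))
]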
He mentions in the video that there are several assumptions made when taking a Naïve Bayes approach to classification. Those assumptions are as follows.
1) We assume we have a family of distributions parametrized by Θ, with the following property: each member of the family is a joint distribution on x and y, where x lies in R^d and y is a class label.
2) He mentions in the video that this second assumption is the key to the Naïve Bayes classification approach: the joint distribution factors as

PΘ(x, y) = PΘ(x|y) PΘ(y) = PΘ(x_1|y) ⋯ PΘ(x_d|y) PΘ(y).

What I essentially draw from this is that we assume PΘ(x|y) factors into the probability of the first coordinate of x given y, times the probability of the second coordinate given y, and so on up to the last coordinate of x given y.
3) We assume that the points are independently and identically distributed according to the distribution with parameter Θ. He mentions that in this context, the coordinates x_1, … , x_d are independent given y if (X, Y) ~ PΘ.
He finally makes
things a bit clearer at 7:35 in the video. He mentions that the main assumption
is a conditional independence assumption.
At this point, he explains the “goal” of Naïve Bayes. Essentially, he says that when some new x enters the data set, we want to “predict” its y. He mentions that the algorithm begins by estimating the parameter Θ for the distribution that the (x, y) pairs are believed to follow. Θ is estimated from the data, and then we compute the prediction for y by maximizing, over all possible classes, the probability of that class given the new x. Because we assume that PΘ(x|y) PΘ(y) factors into PΘ(x_1|y) ⋯ PΘ(x_d|y) PΘ(y), we choose as the prediction the class y that maximizes this product for the new x. For a better understanding, please watch the video.
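To ground the procedure in something concrete, below is a minimal from-scratch sketch in Python. It assumes Gaussian class-conditional densities for each coordinate, which is just one common way to parametrize PΘ(x_j|y); the video presents the general rule, and the function names here are my own.

import numpy as np

def fit(X, y):
    # Estimate theta from the data: a class prior P(y = c) plus a
    # per-coordinate mean and variance for each class c.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),     # P(y = c)
            "mean": Xc.mean(axis=0),       # mean of each coordinate given c
            "var": Xc.var(axis=0) + 1e-9,  # variance of each coordinate given c
        }
    return params

def predict(params, x):
    # Return the class maximizing log P(y) + sum_j log P(x_j | y),
    # i.e., the factored product from assumption 2, computed in log form.
    scores = {}
    for c, p in params.items():
        log_lik = -0.5 * (np.log(2 * np.pi * p["var"])
                          + (x - p["mean"]) ** 2 / p["var"])
        scores[c] = np.log(p["prior"]) + log_lik.sum()
    return max(scores, key=scores.get)

# Example usage on hypothetical toy data:
X = np.array([[5.1, 1.4], [6.2, 4.5], [4.9, 1.5], [6.0, 4.1]])
y = np.array([1, 2, 1, 2])
print(predict(fit(X, y), np.array([5.0, 1.3])))  # predicts class 1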
The purpose of this tutorial is to introduce how to create basic decision trees in RapidMiner. I will use a default dataset in RapidMiner, “Iris”, for the purposes of this tutorial.
1) In order to access this data set, click the Process tab to make sure you are in the correct window, then go to the Repository panel, click where it says data, and open the drop-down menu to see the data set “Iris”, as shown in the picture below.
2) Click and drag the data set into the main Process window. Once the object representing the data set is in the window, click the “out” port on the back of it. A line should appear. Connect that line to the port at the corner of the window, then hit Run at the top of the screen so that we can look into the Results tab to get a view of the structure of this data set.
3) Below, we can see the structure of the data we intend to build the decision tree around. You will notice that four attributes have numerical data types and one attribute is a nominal label.
4) Click the tab necessary to go back to the main Process window. In the Operators panel, open the following drop-down menus in this order: Modeling, Tree Induction, Decision Tree. Drag the Decision Tree icon into the main Process window and make the connections shown in the picture below. After you have the main Process window set up as pictured below, click Run and RapidMiner will take you to the output automatically.
5) Below is the resulting output for this decision tree from RapidMiner, using the default parameters of its Decision Tree operator. The tree’s root node (at the top of the tree) begins with attribute a3 in order to make decisions for classification. The results show, for example, that values of a3 less than or equal to 2.45 belong entirely to the group “Iris-setosa”. As you go down the tree, you acquire more and more criteria for each classification. For further instruction on the use of decision trees in RapidMiner, visit the RapidMiner website.
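For readers who prefer code to a GUI, below is a rough equivalent of steps 1 through 5 in Python using scikit-learn (an assumption on my part; the tutorial itself uses only RapidMiner). Split thresholds may differ slightly from RapidMiner’s output, since the two tools use different default parameters.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Steps 1-3 equivalent: load the built-in Iris data, which has four
# numeric attributes and one nominal label, as inspected in step 3.
iris = load_iris()

# Step 4 equivalent: build a decision tree with default parameters.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(iris.data, iris.target)

# Step 5 equivalent: print the induced tree. As in the RapidMiner output,
# a single shallow split on a petal measurement isolates the "setosa"
# class entirely, analogous to the a3 <= 2.45 split described above.
print(export_text(tree, feature_names=list(iris.feature_names)))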