Thursday, March 21, 2013

Support Vector Machines (SVM)

Support Vector Machines (SVM) were first introduced by Vladimir Vapnik (Vapnik, 1979), and the first main paper on the method was published in 1995 (Vapnik, 1995) (Boswell, 2002). The Support Vector Machine (SVM) is a popular “binary” (Boswell, 2002) classification algorithm used to analyze data and recognize patterns (Press, 2007). Because of its high accuracy and its ability to deal with high-dimensional data, the SVM is a popular classification method (Ben-Hur & Weston). Given a training data set, each point belongs to one of two classes, and the SVM algorithm provides a model that chooses the appropriate class/category for each point (Press, 2007). The first step in this method is to identify a hyperplane that separates the dataset into its two classes/categories. If the data are not linearly separable, the kernel notion (feature space) is used to map the data to a higher-dimensional space where they become separable (Boswell, 2002).
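As a concrete illustration of this idea, here is a minimal sketch using scikit-learn (not taken from the cited sources; the toy dataset, the train/test split, and the choice of linear versus RBF kernel are assumptions made only for demonstration):

# Minimal SVM demo (illustrative only): a linear kernel looks for a separating
# hyperplane in the original space, an RBF kernel implicitly maps the data to a
# higher-dimensional feature space where a separation may exist.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two-class toy data: 200 points with 2 features (assumed for the example)
X, y = datasets.make_classification(n_samples=200, n_features=2,
                                    n_informative=2, n_redundant=0,
                                    random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel='linear').fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf').fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))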
Given k training examples {(xi, yi)}, i = 1, …, k, each example has d inputs (xi ∈ R^d), and yi ∈ {-1, 1} indicates which of the two classes mentioned above the example belongs to; we label these classes -1 and 1. A hyperplane (w, b), defined by a vector w and a constant b, is expressed by the following equation
w*x + b = 0
Here the vector w is orthogonal to the hyperplane that separates the training data into the two categories. The classification is given by the following decision function
f(x) = sign(w*x + b)
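To make the decision rule concrete, here is a hedged NumPy sketch; the weight vector w and offset b below are made-up values chosen only for illustration, not taken from the text:

# Hand-rolled decision function f(x) = sign(w·x + b); w and b are illustrative.
import numpy as np

w = np.array([2.0, -1.0])   # normal vector of the hyperplane (assumed)
b = -0.5                    # offset (assumed)

def f(x):
    # Returns +1 or -1 depending on which side of the hyperplane x falls
    return np.sign(np.dot(w, x) + b)

print(f(np.array([1.0, 0.0])))   # +1 side
print(f(np.array([0.0, 2.0])))   # -1 side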
However, a given hyperplane (w, b) can equally be expressed by any pair (λw, λb) for λ ∈ R^+ (a positive real number). So we define the canonical hyperplane as the one that separates the data from the hyperplane by a distance of at least 1, as follows.
                      xi*w + b ≥ +1 when yi  = +1,                                      
                    xi*w + b ≤ -1 when yi  = -1,                                       
or more compactly:
                     yi*(xi*w + b)≥ 1 for all i .                                            
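This constraint is easy to check numerically. The tiny training set below is made up purely for illustration (it reuses the assumed w and b from the earlier sketch):

# Checking the canonical-hyperplane constraint y_i*(x_i*w + b) >= 1
import numpy as np

w, b = np.array([2.0, -1.0]), -0.5
X = np.array([[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]])   # assumed points
y = np.array([1, -1, 1])                              # assumed labels

margins = y * (X @ w + b)
print(margins)                 # [1.5 2.5 2.5]
print(np.all(margins >= 1))    # True: every point satisfies the constraint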
The geometric distance from the hyperplane to a data point is calculated with the following formula, where we first normalize by the length of the vector w.
                                    d((w,b), xi) = yi*(xi*w + b)/‖w‖ ≥ 1/‖w‖ .
Our goal is to find the hyperplane that maximizes the geometric distance to the closest data points.
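Since the closest canonical-hyperplane points lie at distance 1/‖w‖, maximizing this margin is equivalent to minimizing (1/2)‖w‖². Written in LaTeX, this is our restatement of the standard primal problem rather than a formula copied from the sources:

\[
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i\,(x_i \cdot w + b) \ge 1, \qquad i = 1, \dots, k.
\end{aligned}
\]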
                                      
The green hyperplane maximizes the geometric distance to the closest data points. (Boswell, 2002)

                                   
The green hyperplane does not separate the two classes. The blue one does, but only with a small margin, while the red hyperplane separates them with the maximum margin. (Wikipedia, SVM, 2011).
                                  
Maximum-margin hyperplane and margins for an SVM trained with samples from two classes. Samples on the margin are called the support vectors. (Wikipedia, SVM, 2011).




1-    Boswell, D. (2002). Introduction to Support Vector Machines. August 6, 2002. http://dustwell.com/PastWork/IntroToSVM.pdf
2-    Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). "Section 16.5. Support Vector Machines". Numerical Recipes: The Art of Scientific Computing (3rd ed.).
3-    Ben-Hur, A. (Department of Computer Science, Colorado State University) & Weston, J. (NEC Labs America, Princeton, NJ 08540, USA). A User's Guide to Support Vector Machines.
