Tuesday, March 26, 2013

Generalized Linear Models (GLM)


Generalized Linear Models (GLM) are a popular class of statistical models used for both regression and classification. In a GLM, the response variable Y is assumed to follow a distribution in the exponential family. The model has two components, a stochastic component and a systematic component, together with a link function that relates them. For example, the ordinary Normal linear regression model is a GLM:
    (yi│xi) ~ N(xi'β, σ^2), with E(yi│xi) = xi'β
and
    Var(yi│xi) = σ^2,
                
where xi'β (the systematic component) is the linear combination of the predictors, with xi and β denoting the vector of predictors and the vector of coefficients, respectively. The variance σ^2 belongs to the stochastic component, which describes the random variation of yi around its mean. Now let’s take a more detailed look at each of these components and the link function.
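
To make the Normal case concrete, here is a minimal sketch that simulates data from this model and recovers the parameters by least squares. It assumes a single predictor with an explicit intercept; the names simulate and fit_least_squares, the true values (α=1, β=2, σ=0.5), and the sample size are made up for illustration.

```python
import random


def simulate(n, alpha, beta, sigma, seed=0):
    """Draw (x_i, y_i) pairs with y_i | x_i ~ N(alpha + beta * x_i, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Systematic part: alpha + beta * x. Stochastic part: Normal noise with sd sigma.
    ys = [rng.gauss(alpha + beta * x, sigma) for x in xs]
    return xs, ys


def fit_least_squares(xs, ys):
    """Closed-form least-squares estimates for an intercept and one slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # The residual variance estimates sigma^2 (n - 2 degrees of freedom).
    rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - 2)
    return alpha_hat, beta_hat, sigma2_hat


# Hypothetical true values, chosen only for this illustration.
xs, ys = simulate(n=2000, alpha=1.0, beta=2.0, sigma=0.5)
alpha_hat, beta_hat, sigma2_hat = fit_least_squares(xs, ys)
print(alpha_hat, beta_hat, sigma2_hat)
```

With a couple of thousand observations the estimates should land close to the values used in the simulation, and the residual variance should be close to σ^2 = 0.25.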

Stochastic Component
The stochastic component identifies the response variable (Y=(y1,…,yn)) and assumes a probability distribution for it. When Y is a continuous variable, it is usually assumed to follow a Normal distribution. More generally, a GLM can use any distribution in the exponential family, a broad class that includes the Normal distribution along with others such as the binomial, Poisson, and gamma distributions.
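
As a worked illustration of why the Normal distribution belongs to the exponential family, the block below writes its density in one commonly used parameterization. The symbols θ, φ, a, b, and c are not part of this post; they follow a standard textbook convention and are introduced here only for the illustration.

```latex
% One common form of an exponential-family density
f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\}

% The Normal density N(\mu,\sigma^2), rewritten in that form
f(y;\mu,\sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\}
  = \exp\left\{ \frac{y\mu - \mu^2/2}{\sigma^2}
      - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right\}

% Matching terms: \theta = \mu, \quad b(\theta) = \theta^2/2, \quad a(\phi) = \sigma^2
```

Here the natural parameter θ is the mean μ itself, which is why the identity link is the natural choice for the Normal case.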

Systematic Component
The systematic component identifies the predictor variables (x'=(x1,…,xn)). It consists of a linear combination of these variables, called the linear predictor:
                                            α + x1 β1 + ⋯ + xn βn
The quantity being modeled is the expected value of the response variable, E(Y)=μ. We want to see how μ varies as a function of the levels of the predictor variables, the xi's.
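
The linear predictor is simple to compute. The following sketch assumes the coefficients and predictor values are held in plain Python lists; the function name linear_predictor and the numbers are hypothetical.

```python
def linear_predictor(alpha, betas, xs):
    """Return alpha + x1*beta1 + ... + xn*betan for one observation."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Made-up intercept, coefficients, and predictor values.
eta = linear_predictor(alpha=0.5, betas=[2.0, -1.0], xs=[3.0, 4.0])
print(eta)  # 0.5 + 2.0*3.0 + (-1.0)*4.0 = 2.5
```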

Link Function
The link function identifies the relationship (link) between the expected value of the stochastic component (E(Y)=μ) and the systematic component (α+x1 β1+⋯+xn βn). It is denoted by g(μ). It is a monotone function: as the systematic part gets larger, μ either always gets larger or always gets smaller. The link function is what allows the relationship between μ and the linear predictor to be non-linear. So the general model for a GLM is
                                           g(μ) = α + x1 β1 + ⋯ + xn βn.
Some common links are the identity link (ordinary regression, ANOVA, ANCOVA), with natural parameter μ; the log link, with natural parameter log(μ), usually used when Y is nonnegative (for example, counts); and the logit link, with natural parameter log(μ/(1-μ)), usually used when μ is a probability strictly between 0 and 1.
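
The three links, together with their inverses, can be sketched in a few lines of Python. The inverse of each link maps a value of the linear predictor back to a mean μ. The function names and the value of eta below are chosen only for illustration.

```python
import math


def identity_link(mu):
    return mu


def log_link(mu):
    # Requires mu > 0.
    return math.log(mu)


def logit_link(mu):
    # Requires 0 < mu < 1.
    return math.log(mu / (1.0 - mu))


def inverse_identity(eta):
    return eta


def inverse_log(eta):
    # Always returns a positive mean.
    return math.exp(eta)


def inverse_logit(eta):
    # Always returns a mean strictly between 0 and 1 for moderate eta.
    return 1.0 / (1.0 + math.exp(-eta))


eta = 2.5  # a made-up value of the linear predictor
links = [
    ("identity", identity_link, inverse_identity),
    ("log", log_link, inverse_log),
    ("logit", logit_link, inverse_logit),
]
for name, link, inverse in links:
    mu = inverse(eta)  # mean implied by the linear predictor
    print(name, mu, link(mu))  # link(mu) recovers eta up to rounding
```

The same value of the linear predictor gives a different mean under each link: unchanged under the identity link, always positive under the log link, and always between 0 and 1 under the logit link.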




1- Park, M. Y. (Department of Statistics) & Hastie, T. (Department of Statistics and Department of Health Research & Policy). (2006). L1 Regularization Path Algorithm for Generalized Linear Models. Stanford University, November 12, 2006.
2- Gill, J. (2000). Generalized Linear Models: A Unified Approach (Sage University Paper Series on Quantitative Applications in the Social Sciences, Series No. 07-134). Thousand Oaks, CA: Sage.
