Generalized Linear Models (GLMs) are a popular family of models for
regression and classification. In a GLM, the response variable $Y$ is assumed
to follow a distribution in the exponential family. A GLM has two components,
a stochastic component and a systematic component, and a link function that
relates them.
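For example, ordinary linear regression is the GLM
$$(y_i \mid x_i) \sim N(x_i'\beta, \sigma^2)$$
with
$$E(y_i \mid x_i) = x_i'\beta$$
and
$$\mathrm{Var}(y_i \mid x_i) = \sigma^2,$$
where $x_i'\beta$ (the systematic component) is a linear combination of the
predictors, with $x_i$ and $\beta$ denoting the vectors of predictors and
coefficients, respectively. The Normal distribution with variance $\sigma^2$
is the stochastic component.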
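As a rough illustration of the Normal model above, the following sketch simulates responses at one fixed predictor vector and checks that their average is close to x'β. The coefficient values, the predictor vector, the noise level, and the function name `simulate_response` are all made up for the example.

```python
import random

def simulate_response(x, beta, sigma, rng):
    """Draw one y from N(x'beta, sigma^2)."""
    mean = sum(xj * bj for xj, bj in zip(x, beta))  # systematic part x'beta
    return rng.gauss(mean, sigma)                   # stochastic part

# Hypothetical values chosen only for illustration.
beta = [0.5, 2.0]
x = [1.0, 3.0]   # the leading 1.0 plays the role of an intercept term
sigma = 1.0

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [simulate_response(x, beta, sigma, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
expected_mean = sum(xj * bj for xj, bj in zip(x, beta))  # 0.5*1.0 + 2.0*3.0

print(expected_mean, round(sample_mean, 2))
```

With this many draws the sample mean should sit very close to the expected mean, which is what E(y|x) = x'β says.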
Now let’s take a more detailed look at each of these components and the link
function.
Stochastic Component
The stochastic component identifies the response variable
$Y = (y_1, \dots, y_n)$ and assumes a probability distribution for it. When
$Y$ is a continuous variable, it is usually assumed to follow a Normal
distribution. In fact, a GLM can use any distribution in the exponential
family, a broad class that includes the Normal distribution as a special case.
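To make "a distribution in the exponential family" a little more concrete, the sketch below takes the Bernoulli distribution (a standard member of the family, used here only as an example) and checks numerically that its probability function can be rewritten as exp(y·θ − b(θ)), with θ = log(p/(1−p)) and b(θ) = log(1 + e^θ). The function names are illustrative.

```python
import math

def bernoulli_direct(y, p):
    """Usual form: p^y * (1 - p)^(1 - y) for y in {0, 1}."""
    return p ** y * (1 - p) ** (1 - y)

def bernoulli_exp_family(y, p):
    """Same probability written as exp(y * theta - b(theta))."""
    theta = math.log(p / (1 - p))       # natural parameter
    b = math.log(1 + math.exp(theta))   # log-normalizer b(theta)
    return math.exp(y * theta - b)

# The two forms agree for every outcome and several values of p.
for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert math.isclose(bernoulli_direct(y, p), bernoulli_exp_family(y, p))
```

The natural parameter θ that appears here is the same log(μ/(1−μ)) expression used by the logit link.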
Systematic Component
The systematic component identifies the predictor variables
$x' = (x_1, \dots, x_n)$. It consists of a linear combination of these
variables, called the linear predictor:
$$\alpha + x_1\beta_1 + \cdots + x_n\beta_n$$
What is modeled is the expected value of the response variable, $E(Y) = \mu$.
We want to see how $\mu$ varies as a function of the levels of the predictor
variables $x_i$.
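The linear predictor is simple enough to compute by hand, but a small sketch may help fix the notation. The function name `linear_predictor` and the numbers passed to it are hypothetical.

```python
def linear_predictor(alpha, betas, xs):
    """Return alpha + x1*beta1 + ... + xn*betan."""
    if len(betas) != len(xs):
        raise ValueError("need one coefficient per predictor")
    return alpha + sum(x * b for x, b in zip(xs, betas))

# Hypothetical intercept, coefficients, and predictor values.
eta = linear_predictor(1.0, [2.0, -1.0], [3.0, 4.0])
print(eta)  # 1.0 + 3.0*2.0 + 4.0*(-1.0) = 3.0
```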
Link Function
The link function identifies the relationship (link) between the expected
value of the stochastic component, $E(Y) = \mu$, and the systematic component,
$\alpha + x_1\beta_1 + \cdots + x_n\beta_n$. The link function is denoted by
$g(\mu)$. It is a monotone function: as the systematic part gets larger, $\mu$
gets larger (for an increasing link) or smaller (for a decreasing one).
Because $g$ need not be the identity, the relationship between $\mu$ and the
predictors may be non-linear. So the general model for a GLM is
$$g(\mu) = \alpha + x_1\beta_1 + \cdots + x_n\beta_n.$$
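Assuming g is strictly monotone, it can be inverted, so the same model can be read as a statement about the mean itself. With the identity link, g(μ) = μ, this reduces to the linear regression form E(Y) = α + x1 β1 + ⋯ + xn βn:

```latex
\mu = g^{-1}\!\left(\alpha + x_1\beta_1 + \cdots + x_n\beta_n\right),
\qquad
g(\mu) = \mu \;\Rightarrow\; \mu = \alpha + x_1\beta_1 + \cdots + x_n\beta_n .
```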
Some common links are the identity link (ordinary regression, ANOVA, ANCOVA),
with natural parameter $\mu$; the log link, with natural parameter
$\log(\mu)$, usually used when $Y$ is nonnegative; and the logit link, with
natural parameter $\log(\mu/(1-\mu))$, usually used when $\mu$ is a
probability, $0 < \mu < 1$.
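The three links can be written down directly, together with their inverses. The sketch below applies each inverse link to the same (made-up) value of the linear predictor, to show how the choice of link changes the implied mean. The dictionary name `LINKS` is illustrative.

```python
import math

# Each entry: (link g, inverse link g^{-1}).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
}

eta = 0.4  # hypothetical value of the linear predictor
for name, (g, g_inv) in LINKS.items():
    mu = g_inv(eta)                  # mean implied by this link
    assert math.isclose(g(mu), eta)  # g and g_inv undo each other
    print(f"{name:8s} mu = {mu:.4f}")
```

Note how the inverse log link always returns a positive mean and the inverse logit link always returns a value strictly between 0 and 1, which matches the usage described above.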