3.1 Normal Model
Regression is a collection of statistical function-fitting techniques.
These techniques are classified according to the form of the function
being fit to the data. In linear regression a linear function is used
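to describe the relation between the independent variable or vector
$\mathbf{x}$ and a dependent variable $y$. This function has the form
$$f(\mathbf{x}, \boldsymbol{\beta}) = \beta_0 + \beta_1 x_1 + \cdots + \beta_M x_M \tag{3.1}$$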
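where $\boldsymbol{\beta}$ is the vector of unknown parameters to be estimated
from the data. If we assume that $\mathbf{x}$ has a zeroth element
$x_0 = 1$, we include the constant term $\beta_0$ in the parameter vector
$\boldsymbol{\beta}$ and write this function more conveniently as
$$f(\mathbf{x}, \boldsymbol{\beta}) = \boldsymbol{\beta}^T \mathbf{x} \tag{3.2}$$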
Because $f$ is linear, the parameters $\beta_1, \ldots, \beta_M$ are
sometimes called the slope parameters while $\beta_0$ is the offset at
the origin. Due to random processes such as measurement errors, we
assume that $y$ does not correspond perfectly to
$f(\mathbf{x}, \boldsymbol{\beta})$. For this reason a statistical model is
created which accounts for randomness. For linear regression we will use
the linear model
$$y = f(\mathbf{x}, \boldsymbol{\beta}) + \epsilon \tag{3.3}$$
where the error term $\epsilon$ is a Normally-distributed random
variable with zero mean and unknown variance $\sigma^2$. Therefore
the expected value of this linear model is
$$E(y) = f(\mathbf{x}, \boldsymbol{\beta}) + E(\epsilon) = f(\mathbf{x}, \boldsymbol{\beta})$$
and for this reason $f$ is called the expectation function. We
assume that $\sigma^2$ is constant and hence does not depend on
$\mathbf{x}$. We also assume that if multiple experiments $\mathbf{x}_i$
are conducted, then the errors $\epsilon_i$ are independent.
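The relationship between Equations 3.1, 3.2 and 3.3 can be sketched numerically. The snippet below is only an illustration: the parameter values, the input vector, the error standard deviation `sigma` and the helper name `f` are all assumptions chosen for the example, not values from the thesis.

```python
import numpy as np


def f(x, beta):
    """Expectation function f(x, beta) = beta^T x, assuming x[0] == 1."""
    return beta @ x


# Illustrative (assumed) parameters beta_0, beta_1, beta_2 and one input x.
beta = np.array([1.0, 2.0, -3.0])
x = np.array([1.0, 0.5, 0.25])  # x_0 = 1 followed by two variables

# Equation 3.1 written term by term agrees with the inner product of 3.2.
explicit = beta[0] + beta[1] * x[1] + beta[2] * x[2]

# Equation 3.3: repeated outcomes for the same x differ only by the
# zero-mean Normal error, so their average approaches f(x, beta).
rng = np.random.default_rng(0)
sigma = 0.5  # assumed error standard deviation
y_samples = f(x, beta) + rng.normal(loc=0.0, scale=sigma, size=100_000)

print(explicit, f(x, beta), y_samples.mean())
```

The sample mean of the simulated outcomes should be close to the value of the expectation function, which is what the name suggests.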
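Suppose $\mathbf{X}$ is an $R \times (M+1)$ matrix whose rows $\mathbf{x}_i$
represent $R$ experiments, each described by $M$ variables together with
the constant element $x_{i0} = 1$. Let $\mathbf{y}$ be an $R \times 1$
vector representing the outcome of each experiment in $\mathbf{X}$. We wish
to estimate values for the parameters $\boldsymbol{\beta}$ such that the
linear model of Equation 3.3 is, hopefully, a useful summarization
of the data $\mathbf{X}, \mathbf{y}$. One common method of parameter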
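estimation for linear regression is least squares. This method
finds a vector $\hat{\boldsymbol{\beta}}$ which minimizes the residual sum of
squares (RSS), defined as
$$\begin{aligned}
\mathrm{RSS}(\hat{\boldsymbol{\beta}}) &= \sum_{i=1}^{R} \left( y_i - \hat{\boldsymbol{\beta}}^T \mathbf{x}_i \right)^2 \\
&= (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})
\end{aligned} \tag{3.7}$$
where $\mathbf{x}_i$ is the $i$th row of $\mathbf{X}$. Note that
$\boldsymbol{\beta}$ is the "true" parameter vector, while
$\hat{\boldsymbol{\beta}}$ is an informed guess for $\boldsymbol{\beta}$.
Throughout this thesis a variable with a circumflex, such as
$\hat{\boldsymbol{\beta}}$, is an estimate of some quantity we cannot know,
like $\boldsymbol{\beta}$. The parameter vector is sometimes called
a weight vector since the elements of $\boldsymbol{\beta}$ are
multiplicative weights for the columns of $\mathbf{X}$. For linear
regression the loss function is the RSS. Regression techniques
with different error structure may have other loss functions.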
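As a rough illustration of the residual sum of squares, the following sketch computes it both as a sum over experiments and in the matrix form of Equation 3.7. The synthetic data, the parameter values and the function names `rss_sum` and `rss_matrix` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: R experiments, M variables plus the constant x_0 = 1.
R, M = 50, 2
X = np.column_stack([np.ones(R), rng.normal(size=(R, M))])
beta_true = np.array([1.0, 2.0, -0.5])  # illustrative "true" parameters
y = X @ beta_true + rng.normal(scale=0.3, size=R)  # y = f(x, beta) + epsilon


def rss_sum(beta_hat, X, y):
    """RSS as a sum of squared residuals over the rows of X."""
    return sum((y_i - beta_hat @ x_i) ** 2 for x_i, y_i in zip(X, y))


def rss_matrix(beta_hat, X, y):
    """RSS in matrix form, (y - X b)^T (y - X b)."""
    r = y - X @ beta_hat
    return float(r @ r)


beta_guess = np.zeros(M + 1)  # a deliberately poor guess
print(rss_sum(beta_guess, X, y), rss_matrix(beta_guess, X, y))
print(rss_matrix(beta_true, X, y))
```

Both forms should agree up to floating-point error, and a guess far from the generating parameters should give a much larger RSS than the generating parameters themselves.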
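To minimize the RSS we compute the partial derivative at
$\hat{\boldsymbol{\beta}}$ with respect to $\hat{\beta}_j$, for
$j = 0, \ldots, M$, and set the partials equal to zero. The result is the
score equations
$\mathbf{X}^T (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$
for linear regression, from
which we compute the least squares estimator
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} \tag{3.8}$$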
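The expected value of $\hat{\boldsymbol{\beta}}$ is $\boldsymbol{\beta}$, and
hence $\hat{\boldsymbol{\beta}}$ is unbiased. The covariance matrix of
$\hat{\boldsymbol{\beta}}$, $\mathrm{cov}(\hat{\boldsymbol{\beta}})$, is
$\sigma^2 (\mathbf{X}^T \mathbf{X})^{-1}$. The variances along the
diagonal are the smallest possible variances for any unbiased estimate
of $\boldsymbol{\beta}$. These properties follow from our assumption that
the errors $\epsilon_i$ for predictions were independent and Normally
distributed with zero mean and constant variance $\sigma^2$ [30].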
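A minimal NumPy sketch of the least squares estimator of Equation 3.8 follows. The data are synthetic and the error standard deviation `sigma` is treated as known purely for illustration; in practice it would have to be estimated. Solving the linear system is used here instead of forming the inverse explicitly, which is a common numerical choice rather than something the text prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic data: R experiments, M variables plus the constant x_0 = 1.
R, M = 200, 3
X = np.column_stack([np.ones(R), rng.normal(size=(R, M))])
beta_true = np.array([0.5, 1.0, -2.0, 3.0])  # illustrative parameters
sigma = 0.1                                   # assumed known for this sketch
y = X @ beta_true + rng.normal(scale=sigma, size=R)

# Equation 3.8, computed by solving (X^T X) beta_hat = X^T y.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# The score equations X^T (y - X beta_hat) = 0 hold at the solution.
score = X.T @ (y - X @ beta_hat)

# Covariance of the estimator, sigma^2 (X^T X)^{-1}.
cov_beta_hat = sigma**2 * np.linalg.inv(XtX)

# Cross-check against NumPy's general least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.sqrt(np.diag(cov_beta_hat)))  # standard errors of the estimates
```

For ill-conditioned $\mathbf{X}^T \mathbf{X}$, a QR- or SVD-based solver such as `np.linalg.lstsq` is usually the safer route.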
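Another method of estimating $\boldsymbol{\beta}$ is maximum likelihood
estimation. In this method we evaluate the probability of
encountering the outcomes $\mathbf{y}$ for our data $\mathbf{X}$ under the
linear model of Equation 3.3 when
$\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}$. We will choose as our
estimate of $\boldsymbol{\beta}$ the value $\hat{\boldsymbol{\beta}}$ which
maximizes the likelihood function
$$L(\mathbf{y}, \mathbf{X}, \hat{\boldsymbol{\beta}}, \sigma^2) = \prod_{i=1}^{R} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \hat{\boldsymbol{\beta}}^T \mathbf{x}_i)^2}{2\sigma^2} \right)$$
over $\hat{\boldsymbol{\beta}}$. We are interested in maximization of $L$ and
not its actual value, which allows us to work with the more convenient
log-transformation of the likelihood. Since we are maximizing over
$\hat{\boldsymbol{\beta}}$ we can drop factors and terms which are constant
with respect to $\hat{\boldsymbol{\beta}}$. Discarding constant factors and
terms that will not affect maximization, the log-likelihood function
reduces to
$$-\sum_{i=1}^{R} \left( y_i - \hat{\boldsymbol{\beta}}^T \mathbf{x}_i \right)^2 = -(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$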
To maximize the log-likelihood function we need to minimize
$(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$.
This is the same quantity minimized in Equation 3.7, and hence it has
the same solution. In general one would differentiate the
log-likelihood and set the result equal to zero, and the result is
again the linear regression score equations. In fact these equations
are typically defined as the derivative of the log-likelihood
function. We have shown that the maximum likelihood estimate (MLE)
for $\boldsymbol{\beta}$ is identical to the least squares estimate under
our assumptions that the errors $\epsilon_i$ are independent and Normally
distributed with zero mean and constant variance $\sigma^2$.
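The equivalence of the maximum likelihood and least squares estimates can be checked numerically. The sketch below evaluates the full Normal log-likelihood (constants included) at the least squares solution and at nearby perturbed parameter vectors. The data, the variance `sigma2` and the function name `log_likelihood` are assumptions for illustration.

```python
import numpy as np


def log_likelihood(beta_hat, sigma2, X, y):
    """Normal log-likelihood of y given X, beta_hat and variance sigma2."""
    r = y - X @ beta_hat
    n = len(y)
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - (r @ r) / (2.0 * sigma2)


rng = np.random.default_rng(2)

# Assumed synthetic data with a constant column x_0 = 1.
R = 100
X = np.column_stack([np.ones(R), rng.normal(size=(R, 2))])
sigma2 = 0.25  # assumed error variance
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=np.sqrt(sigma2), size=R)

# Least squares estimate from the score equations.
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
ll_at_ls = log_likelihood(beta_ls, sigma2, X, y)

# Any perturbation of beta_ls increases the RSS, so it lowers the likelihood.
perturbed = [beta_ls + 0.1 * rng.normal(size=3) for _ in range(20)]
ll_perturbed = [log_likelihood(b, sigma2, X, y) for b in perturbed]

print(ll_at_ls, max(ll_perturbed))
```

Because the log-likelihood depends on $\hat{\boldsymbol{\beta}}$ only through the residual sum of squares, the value of `sigma2` changes the height of the curve but not the location of its maximum.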
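If the variance $\sigma_i^2$ for each outcome $y_i$ is different, but
known and independent of the other experiments, a simple variation
known as weighted least squares can be used. In this procedure
a weight matrix
$\mathbf{W} = \mathrm{diag}(1/\sigma_1^2, \ldots, 1/\sigma_R^2)$
is used to standardize the unequal variances. The score
equations become
$\mathbf{X}^T \mathbf{W} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$,
and the weighted least squares estimator is
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W} \mathbf{y} \tag{3.13}$$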
It is also possible to accommodate correlated errors when the
covariance matrix is known [30].
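Weighted least squares as in Equation 3.13 can be sketched in a few lines of NumPy. The per-outcome standard deviations `sigma_i` are generated at random here and treated as known, which is an assumption of the example. The weight matrix is never built explicitly, since only its diagonal is needed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed synthetic data with a constant column x_0 = 1.
R = 300
X = np.column_stack([np.ones(R), rng.normal(size=(R, 2))])
beta_true = np.array([2.0, -1.0, 0.5])     # illustrative parameters
sigma_i = rng.uniform(0.1, 2.0, size=R)    # assumed known, one per outcome
y = X @ beta_true + rng.normal(size=R) * sigma_i

# Diagonal of W = diag(1 / sigma_i^2).
w = 1.0 / sigma_i**2

# X^T W via broadcasting: column i of X^T is scaled by w[i].
XtW = X.T * w

# Equation 3.13, computed by solving (X^T W X) beta = X^T W y.
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Weighted score equations X^T W (y - X beta) = 0 hold at the solution.
score_wls = XtW @ (y - X @ beta_wls)

# Equivalent view: divide each row by sigma_i, then apply ordinary least squares.
X_scaled = X / sigma_i[:, None]
y_scaled = y / sigma_i
beta_scaled, *_ = np.linalg.lstsq(X_scaled, y_scaled, rcond=None)

print(beta_wls, beta_scaled)
```

The row-scaling view makes the word "standardize" concrete: after dividing by `sigma_i`, every error has unit variance and the ordinary least squares assumptions apply again.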
Copyright 2004 Paul Komarek, komarek@cmu.edu