Next: 4.4 Logistic Regression for
Up: 4. Logistic Regression
Previous: 4.2 Maximum Likelihood Estimation
Contents
4.3 Iteratively Re-weighted Least Squares
An alternative to numerically maximizing the LR maximum likelihood
equations is the iteratively re-weighted least squares (IRLS)
technique. This technique uses the Newton-Raphson algorithm to solve
the LR score equations. This process is explained in more detail
below.
To compute the LR score equations we first differentiate the
log-likelihood function with respect to the parameters
where
and
is the number of parameters. We then set
each of these partial derivatives equal to zero. If we extend the
definition of
such that
, we may write the score equations as
 |
(4.12) |
Define
as the vector of partial derivatives shown
in Equation 4.11. Finding a solution to the LR score
equations is equivalent to finding the zeros of
. Because the
LR likelihood is convex [27], there will be only one
minimum and hence exactly one zero for
. The Newton-Raphson
algorithm finds this optimum through repeated linear approximations.
After an initial guess
is chosen, each Newton-Raphson
iteration updates its guess for
via the first-order
approximation
J |
(4.13) |
where
J
is the Jacobian of
evaluated as
. The Jacobian is a matrix, and each row is
simply the transposed gradient of one component of
evaluated
at
. Hence the
-element of
J
is
. If we define
and
diag
, then the Jacobian may be written in matrix notation as
. Using this fact and
Equation 4.12 we may rewrite the Newton-Raphson update
formula as
 |
(4.14) |
Since
we may rewrite Equation 4.14 as
where
[25,30,10]. The
elements of vector
are often called the adjusted
dependent covariates, since we may view Equation 4.16
as the weighted least squares problem from Section 3.1
with dependent variables, or covariates,
.
Newton-Raphson iterations may be stopped according to any of several
criteria such as convergence of
or the LR log-likelihood
from Equation 4.8.
Solving Equation 4.16 required a matrix inversion or
Cholesky back-substitution, as well as several matrix vector
multiplications. As a result, IRLS as described has time complexity
O
. We have commented several times in this thesis that
O
is unacceptably slow for our purposes. A simple
alternative to the computations of Equation 4.16 will be
presented in Chapter 5.
Next: 4.4 Logistic Regression for
Up: 4. Logistic Regression
Previous: 4.2 Maximum Likelihood Estimation
Contents
Copyright 2004 Paul Komarek, komarek@cmu.edu