B.4.3 lr
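This is an implementation of logistic regression (LR).
LR computes $\beta$ for which the model values $\mu_i$ best approximate
the dataset outputs $y_i$ under the model

$$\mu_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_M x_{iM})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_M x_{iM})}$$

where $x_i = (x_{i1}, \ldots, x_{iM})$ is one dataset row.
Please see [13,25,20,10] for details about logistic regression.
This implementation uses iterative re-weighted least squares (IRLS) [20,10]
to maximize the LR log-likelihood

$$\sum_{i=1}^{R} \left( y_i \ln \mu_i + (1 - y_i) \ln (1 - \mu_i) \right)$$

where $R$ is the number of rows in the dataset. For logistic
regression, IRLS is equivalent to Newton-Raphson [25].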
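To make the quantities above concrete, the following is a minimal sketch of the logistic model value and the log-likelihood it is scored by. It is not taken from this software: the function names `model_value` and `log_likelihood`, the list-based data layout, and the convention that `beta[0]` holds the offset are assumptions made for illustration.

```python
import math

def model_value(beta, row):
    """Logistic model value mu_i for one dataset row.

    Assumed layout: beta[0] is the offset, beta[1:] pair with the row's attributes.
    """
    t = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    # Numerically stable evaluation of exp(t) / (1 + exp(t)).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log_likelihood(beta, rows, outputs):
    """Sum over all rows of y_i * ln(mu_i) + (1 - y_i) * ln(1 - mu_i)."""
    total = 0.0
    for row, y in zip(rows, outputs):
        mu = model_value(beta, row)
        total += y * math.log(mu) + (1 - y) * math.log(1.0 - mu)
    return total

# Toy dataset with one attribute per row and 0/1 outputs.
rows = [[0.0], [1.0], [2.0], [3.0]]
outputs = [0, 0, 1, 1]
print(log_likelihood([0.0, 0.0], rows, outputs))   # every mu_i is 0.5 here
print(log_likelihood([-3.0, 2.0], rows, outputs))  # a better fit scores higher
```

A larger (less negative) log-likelihood indicates model values closer to the dataset outputs, which is the quantity IRLS tries to maximize.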
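To improve the speed of IRLS, this implementation uses
conjugate gradient [26,27,31,7] as an approximate linear solver [20].
This solver is applied to the linear regression

$$(X^T W X)\, \beta_{\text{new}} = X^T W z \qquad \text{(B.7)}$$

where $X$ is the dataset matrix with rows $x_i$,
$W = \text{diag}(\mu_i (1 - \mu_i))$, and
$z = X \beta_{\text{old}} + W^{-1} (y - \mu)$.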
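The IRLS update with a conjugate gradient solver can be sketched as follows. This is an illustrative, dense, pure-Python version under several assumptions: the weighted least squares system is taken to be $(X^T W X) \beta_{\text{new}} = X^T W z$ with $W = \text{diag}(\mu_i(1-\mu_i))$ and $z = X\beta_{\text{old}} + W^{-1}(y-\mu)$, the offset is handled by a leading column of ones, the matrix $X^T W X$ is formed explicitly (a sparse implementation would likely avoid this), and the residual-based stopping rule with `tol` and `max_iter` is hypothetical rather than the rule this software uses.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function exp(t) / (1 + exp(t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=200):
    """Approximately solve A x = b for a symmetric positive definite A."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual b - A x
    p = list(r)                                        # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol:  # squared residual norm is small enough (hypothetical rule)
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def irls_step(X, y, beta):
    """One IRLS update: solve (X^T W X) beta_new = X^T W z with CG."""
    eta = matvec(X, beta)                      # X beta_old
    mu = [sigmoid(t) for t in eta]             # model values
    w = [m * (1.0 - m) for m in mu]            # diagonal of W
    # Adjusted response z = X beta_old + W^{-1} (y - mu); assumes every w_i > 0.
    z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
    rows, cols = len(X), len(beta)
    A = [[sum(w[i] * X[i][j] * X[i][k] for i in range(rows))
          for k in range(cols)] for j in range(cols)]
    b = [sum(w[i] * X[i][j] * z[i] for i in range(rows)) for j in range(cols)]
    return conjugate_gradient(A, b, [0.0] * cols)  # start CG at the zero vector

# Toy data: the leading column of ones carries the offset parameter.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
beta = [0.0, 0.0]
for _ in range(10):
    beta = irls_step(X, y, beta)
print(beta)
```

Because each step only needs an approximate solution of the linear system, the solver can stop early; the quality of the resulting estimate is then judged by the outer IRLS loop.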
The current estimate of $\beta$ is scored using the likelihood ratio
$-2 \ln (L_{\beta} / L_{\text{sat}})$,
where $L_{\text{sat}}$ is the likelihood of a saturated model with
$R$ parameters, one per dataset row, and $L_{\beta}$ is the likelihood
of the current model.
This ratio is called the "deviance", and the IRLS
iterations are terminated when the relative difference of the deviance
between iterations is sufficiently small. Other termination
measures can be added, such as a maximum number of iterations.
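The deviance-based termination test can be illustrated with a short sketch. For 0/1 outputs the saturated model reproduces every output exactly and so has likelihood 1, which makes the deviance equal to minus twice the log-likelihood. The exact formula this software uses for the "relative difference" is not stated here, so `relative_difference` below is an assumed definition, and all function names are hypothetical.

```python
import math

def deviance(mu, y):
    """Deviance of model values mu against 0/1 outputs y.

    With 0/1 outputs the saturated likelihood is 1, so this is -2 * log-likelihood.
    """
    log_lik = sum(yi * math.log(m) + (1 - yi) * math.log(1.0 - m)
                  for m, yi in zip(mu, y))
    return -2.0 * log_lik

def relative_difference(previous, current):
    """Assumed definition: |previous - current| scaled by the larger magnitude."""
    scale = max(abs(previous), abs(current))
    return 0.0 if scale == 0.0 else abs(previous - current) / scale

def first_converged_iteration(deviances, eps):
    """Index of the first iteration whose deviance changed by less than eps, else None."""
    for i in range(1, len(deviances)):
        if relative_difference(deviances[i - 1], deviances[i]) < eps:
            return i
    return None

print(deviance([0.5, 0.5], [0, 1]))
# Deviances from four successive (made-up) IRLS iterations.
print(first_converged_iteration([8.3, 6.1, 5.9, 5.89], eps=0.05))
```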
| Keyword | Arg Type | Arg Vals | Default |
|---|---|---|---|
| Common | | | |
| cgdeveps | float | [1e-10, ∞) | 0.0 |
| cgeps | float | [1e-10, ∞) | 0.001 |
| lrmax | int | 0, ..., ∞ | 30 |
| rrlambda | float | [0, ∞) | 10.0 |
| Rare | | | |
| binitmean | none | | |
| cgbinit | none | | |
| cgwindow | int | 0, ..., ∞ | 3 |
| cgdecay | float | [1.0, ∞) | 1000 |
| cgmax | int | 0, ..., ∞ | 200 |
| holdout_size | float | [0.0, 1.0] | 0.0 |
| lrdevdone | float | [0.0, ∞) | 0.0 |
| lreps | float | [1e-10, ∞) | 0.05 |
| margin | float | [0.0, ∞) | 0 |
| modelmax | float | (modelmin, 1.0] | 1.0 |
| modelmin | float | [0.0, modelmax) | 0.0 |
| wmargin | float | [0.0, 0.5) | 0.0 |
Common keywords and arguments:
Rare keywords and arguments:
- binitmean: If this keyword is used, the model offset
parameter $\beta_0$ is initialized to the mean of the output values
$y_i$. [26] reports that this eliminated some
numerical problems in his implementation. We have not observed
significant changes in our implementation when using this technique.
- cgbinit: If cgbinit is specified,
each IRLS iteration will start the conjugate gradient solver at
the current value of $\beta$. By default the conjugate gradient
solver is started at the zero vector.
- cgwindow int: If the previous cgwindow
conjugate gradient iterations have not produced a $\beta$ with
smaller deviance than the best deviance previously found, then
conjugate gradient iterations are terminated. As usual, the $\beta$
corresponding to the best observed deviance is returned to the
IRLS iteration.
- cgdecay float: If the deviance ever exceeds
cgdecay times the best deviance previously
found, conjugate gradient iterations are terminated.
As usual, the $\beta$
corresponding to the best observed deviance is returned to the
IRLS iteration.
- cgmax int: This is the
upper bound on the number of conjugate gradient iterations allowed.
The final value of $\beta$ is returned.
- holdout_size float: If this parameter is
positive, IRLS iterations are terminated based on predictions
made on a holdout set, rather than according to the relative
difference of the deviance. This parameter specifies the fraction,
between 0.0 and 1.0, of the training data to be held out. In
a k-fold cross-validation experiment, the training data is not the whole dataset but
the data available for training during each fold. The deviance
on the holdout set replaces the deviance on the
prediction data.
- lrdevdone float: IRLS iterations will
be terminated if the deviance becomes smaller than
lrdevdone.
- lreps float: IRLS iterations are terminated
if the relative difference
of the deviance between IRLS iterations is less than
lreps.
- margin float: The dataset outputs, which
must be zero or one (cf. Section
), are "shrunk"
by this amount. That is, 1 is changed to 1 - margin
and 0 is changed to margin. It is recommended that
this parameter be left at its default value.
- modelmax float: This sets an upper bound
on the $\mu_i$ computed by the logistic regression. Although the
computation of $\mu_i$ should assure it is strictly less than one,
round-off error can create problems. It is recommended that
this parameter be left at its default value.
- modelmin float: This sets a lower bound
on the $\mu_i$ computed by the logistic regression. Although the
computation of $\mu_i$ should assure it is positive,
round-off error can create problems. It is recommended that
this parameter be left at its default value.
- wmargin float: wmargin is short for
"weight margin". If nonzero, weights less than
wmargin are changed to wmargin and
weights greater than 1 - wmargin are changed to
1 - wmargin. This option is very helpful for
controlling numerical problems, especially when rrlambda
is zero. Values between 1e-15 and 0.001 are reasonable.
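The rare keywords above fall into two groups: stopping rules for the conjugate gradient iterations (cgwindow, cgdecay, cgmax) and numerical guards (margin, modelmin, modelmax, wmargin). The sketch below restates both groups as small Python helpers. It is one reading of the descriptions above, not this software's code: the function names, the tuple returned by `cg_stopping_point`, and the order in which the rules are checked are all assumptions.

```python
def cg_stopping_point(deviances, cgwindow=3, cgdecay=1000.0, cgmax=200):
    """Decide where CG would stop, given the deviance after each CG iteration.

    Returns (stop_iteration, best_iteration, reason). Expects a non-empty list.
    Per the descriptions above, the cgwindow and cgdecay rules return the beta
    from best_iteration, while cgmax returns the final beta.
    """
    best_dev = float("inf")
    best_iter = None
    for i, dev in enumerate(deviances[:cgmax]):
        if dev < best_dev:
            best_dev, best_iter = dev, i       # new best deviance
        elif dev > cgdecay * best_dev:
            return i, best_iter, "cgdecay"     # deviance blew up
        elif i - best_iter >= cgwindow:
            return i, best_iter, "cgwindow"    # no improvement for cgwindow iterations
    reason = "cgmax" if len(deviances) >= cgmax else "end of input"
    return min(len(deviances), cgmax) - 1, best_iter, reason

def shrink_output(y, margin=0.0):
    """margin: 1 becomes 1 - margin and 0 becomes margin."""
    return 1.0 - margin if y == 1 else margin

def clamp_model_value(mu, modelmin=0.0, modelmax=1.0):
    """modelmin / modelmax: keep a model value inside [modelmin, modelmax]."""
    return min(max(mu, modelmin), modelmax)

def clamp_weight(w, wmargin=0.0):
    """wmargin: if nonzero, keep a weight inside [wmargin, 1 - wmargin]."""
    if wmargin == 0.0:
        return w
    return min(max(w, wmargin), 1.0 - wmargin)

# Made-up deviance traces for illustration.
print(cg_stopping_point([10, 8, 7, 7.5, 7.2, 7.1, 6.0]))
print(cg_stopping_point([10, 8, 9000]))
print(shrink_output(1, margin=0.1), clamp_weight(1e-20, wmargin=1e-10))
```

In the first trace the best deviance occurs at iteration 2 and the next three iterations fail to improve on it, so the default cgwindow of 3 stops the solver before it reaches the later, better value.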
Copyright 2004 Paul Komarek, komarek@cmu.edu