10. Future Work
In this thesis, we have shown promising results for LR as a data
mining tool and high-dimensional classifier. There is, of course,
more research to be done. We list below several potential research
topics which we believe may improve performance or broaden LR's
appeal. This list includes all potential research topics mentioned
previously in this thesis.
- We stated in Chapter 2 that CG performance
depends on the condition number of the Hessian
[41]. A general technique called
preconditioning transforms a linear system into an equivalent one
whose matrix has a smaller condition number. This technique is popular in the CG
literature [41,26,7],
possibly because the preconditioning can be done using only
matrix-vector operations. Preconditioning should be explored for
our implementations.
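As a minimal sketch of what such an exploration might start from, the code below assumes a diagonal (Jacobi) preconditioner. That is only one of many possible choices, and the text does not commit to it. The function name `pcg` and the synthetic, badly scaled Hessian are hypothetical. Applying the preconditioner costs one elementwise product per iteration, so each step still uses only matrix-vector operations.

```python
import numpy as np


def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A by preconditioned CG.

    Hypothetical helper. m_inv_diag holds the inverse of a diagonal
    preconditioner M, so applying M^{-1} is an elementwise product.
    Returns (x, iterations used).
    """
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b.copy()                  # residual b - A x for the starting point x = 0
    z = m_inv_diag * r            # preconditioned residual
    p = z.copy()                  # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k + 1
        z = m_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


# Illustrative ridge-regularized IRLS Hessian X^T W X + lambda I whose
# attributes live on very different scales, inflating the condition number.
rng = np.random.default_rng(0)
R, M = 2000, 200
X = rng.standard_normal((R, M)) * np.logspace(0, 3, M)
w = rng.uniform(0.05, 0.25, R)              # stand-in IRLS weights mu(1 - mu)
A = X.T @ (w[:, None] * X) + 10.0 * np.eye(M)
b = rng.standard_normal(M)

x_plain, it_plain = pcg(A, b, np.ones(M))      # identity preconditioner: plain CG
x_jac, it_jac = pcg(A, b, 1.0 / np.diag(A))    # Jacobi preconditioner
print(it_plain, it_jac)
```

On a matrix like this, the Jacobi-preconditioned run should need far fewer iterations. On well-scaled data the gap can be small, and other preconditioners, such as incomplete Cholesky, trade extra setup cost for larger gains.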
- We discussed several CG, IRLS and CG-MLE termination
techniques. One we did not discuss, but which is appropriate when
data is plentiful, is the use of validation sets
[10]. In this approach to termination, part of
the training data is held out and used to estimate
prediction performance. When this estimate is deemed
adequate, training is terminated.
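The following is a minimal sketch of validation-set termination under several assumptions, none of which comes from the text. It holds out 20% of the data, uses the holdout deviance as the estimate of prediction performance, and stops once the relative improvement drops below `rel_tol`. The function name `fit_lr_with_holdout` is hypothetical. Each iteration is a ridge-penalized Newton (IRLS) step solved directly, and for brevity the penalty is applied to every coefficient.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def deviance(X, y, beta):
    mu = np.clip(sigmoid(X @ beta), 1e-12, 1 - 1e-12)
    return -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))


def fit_lr_with_holdout(X, y, holdout_frac=0.2, lam=1.0, rel_tol=1e-3,
                        max_iter=50, seed=0):
    """Hypothetical: ridge-penalized IRLS that stops when holdout deviance stalls."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_hold = int(holdout_frac * len(y))
    hold, train = idx[:n_hold], idx[n_hold:]
    Xt, yt, Xh, yh = X[train], y[train], X[hold], y[hold]

    beta = np.zeros(X.shape[1])
    best = deviance(Xh, yh, beta)
    for it in range(1, max_iter + 1):
        mu = sigmoid(Xt @ beta)
        w = mu * (1 - mu)
        H = Xt.T @ (w[:, None] * Xt) + lam * np.eye(X.shape[1])
        g = Xt.T @ (yt - mu) - lam * beta
        beta_new = beta + np.linalg.solve(H, g)   # one penalized Newton step
        dev = deviance(Xh, yh, beta_new)          # estimate of prediction quality
        if best - dev < rel_tol * best:
            # Holdout deviance no longer improves enough: keep the better iterate.
            return (beta_new if dev < best else beta), it
        beta, best = beta_new, dev
    return beta, max_iter


rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
true_beta = np.array([2.0, -1.0, 0.5, 0.0, 1.0])
y = (rng.uniform(size=1000) < sigmoid(X @ true_beta)).astype(float)
beta, iters = fit_lr_with_holdout(X, y)
accuracy = np.mean((sigmoid(X @ beta) > 0.5) == (y == 1))
```

Deviance is only one choice of holdout statistic. When classification quality is what matters, holdout accuracy or AUC could be tracked instead.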
- Though CG is designed for symmetric positive definite matrices, it has
performed well on a variety of ill-conditioned data matrices
arising from datasets such as ds1 with linearly dependent or
duplicate columns. A version of CG known as biconjugate
gradient is designed for general, possibly nonsymmetric, matrices and
does not require positive definiteness; this
might further accelerate IRLS [7]. Other
iterative methods for linear equations, such as MINRES, which
handles symmetric indefinite systems, might also be explored.
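A small comparison harness is one way to begin exploring these alternatives. The sketch below is an illustration, not a benchmark. It builds a singular but consistent positive semidefinite system by duplicating columns, which mimics linearly dependent attributes, and passes it to SciPy's `cg`, `bicg`, and `minres` with their default tolerances. The data and weights are synthetic assumptions. This matrix is symmetric, so BiCG behaves much like CG here; its generality only matters for nonsymmetric systems.

```python
import numpy as np
from scipy.sparse.linalg import bicg, cg, minres

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 10))
X = np.hstack([X, X[:, :2]])                 # two duplicate columns: rank 10 of 12
w = rng.uniform(0.05, 0.25, 300)             # stand-in IRLS weights mu(1 - mu)
A = X.T @ (w[:, None] * X)                   # singular, positive semidefinite
b = X.T @ (w * rng.standard_normal(300))     # lies in range(A), so A x = b is consistent

results = {}
for name, solver in [("cg", cg), ("bicg", bicg), ("minres", minres)]:
    x, info = solver(A, b, maxiter=1000)
    rel_res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
    results[name] = (info, rel_res)
    print(f"{name:7s} info={info} relative residual={rel_res:.2e}")
```

For a real study, the same loop could record iteration counts through each solver's `callback` argument and run on the actual IRLS systems.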
- Truncated-Newton methods should be investigated for finding
the zeros of the LR score equations, and the result compared to
our combination of IRLS and CG.
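For comparison purposes, here is a minimal truncated-Newton sketch. Several of its details are assumptions not taken from the text: a ridge penalty on every coefficient, full Newton steps with no line search, and the hypothetical name `truncated_newton_lr`. It also truncates the inner CG solve with the forcing term min(0.5, sqrt(||g||)), a standard choice in the inexact-Newton literature. Hessian-vector products come from two products with X, so the Hessian is never formed. The structure is close to IRLS with a truncated CG inner loop; the differences lie mainly in the truncation rule and in never building X^T W X.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def truncated_newton_lr(X, y, lam=1.0, gtol=1e-8, max_outer=50):
    """Hypothetical: maximize the ridge-penalized LR log-likelihood by truncated Newton."""
    beta = np.zeros(X.shape[1])
    inner_total = 0
    for outer in range(max_outer):
        mu = sigmoid(X @ beta)
        g = X.T @ (y - mu) - lam * beta          # penalized score
        g_norm = np.linalg.norm(g)
        if g_norm <= gtol:
            return beta, outer, inner_total
        w = mu * (1 - mu)

        def neg_hess_vec(v):
            # (X^T W X + lam I) v without forming X^T W X
            return X.T @ (w * (X @ v)) + lam * v

        eta = min(0.5, np.sqrt(g_norm))          # forcing term: loose far from the optimum
        d = np.zeros_like(beta)
        r = g.copy()                             # residual of (X^T W X + lam I) d = g
        p = r.copy()
        rr = r @ r
        for _ in range(X.shape[1]):
            Hp = neg_hess_vec(p)
            alpha = rr / (p @ Hp)
            d += alpha * p
            r -= alpha * Hp
            inner_total += 1
            rr_new = r @ r
            if np.sqrt(rr_new) <= eta * g_norm:  # truncate the inner CG solve
                break
            p = r + (rr_new / rr) * p
            rr = rr_new
        beta = beta + d
    return beta, max_outer, inner_total


rng = np.random.default_rng(2)
X = rng.standard_normal((500, 5))
y = (rng.uniform(size=500) < sigmoid(X @ np.array([1.5, -1.0, 0.5, 0.0, 2.0]))).astype(float)
beta, n_outer, n_inner = truncated_newton_lr(X, y)
```

Recording `n_inner` alongside wall-clock time would allow a like-for-like comparison with the number of CG iterations IRLS spends per outer step.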
- We concluded in the summary for CG-MLE computations,
Section 5.3.5, that the Fletcher-Reeves direction update
may be competitive with the modified Polak-Ribière direction update when our
other regularization and failsafe checks are used. This should be
an easy area to explore. Because the modified Polak-Ribière direction update
includes a Powell restart, one might wish to implement separate
Powell restarts or an alternative when using Fletcher-Reeves.
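The two direction updates differ only in the coefficient that mixes the previous search direction into the new one. The sketch below writes both for gradient ascent with gradient g. The function names are hypothetical, as is the Powell restart threshold `powell_nu` (0.2 here, though 0.1 is also common). A Powell restart resets the coefficient to zero when successive gradients are far from orthogonal, so it can be combined with Fletcher-Reeves as easily as with Polak-Ribière. The small driver maximizes a concave quadratic with an exact line search, a setting in which the two updates coincide in exact arithmetic.

```python
import numpy as np


def direction_coefficient(g_new, g_old, method="pr+", powell_nu=None):
    """Hypothetical helper: coefficient on the previous direction in nonlinear CG."""
    denom = g_old @ g_old
    if method == "fr":                               # Fletcher-Reeves
        coef = (g_new @ g_new) / denom
    elif method == "pr+":                            # Polak-Ribiere, clipped at zero
        coef = max(0.0, (g_new @ (g_new - g_old)) / denom)
    else:
        raise ValueError(f"unknown method: {method}")
    # Powell restart: fall back to steepest ascent when successive
    # gradients are far from orthogonal.
    if powell_nu is not None and abs(g_new @ g_old) >= powell_nu * (g_new @ g_new):
        coef = 0.0
    return coef


def maximize_quadratic(A, b, **kw):
    """Maximize -0.5 x^T A x + b^T x (A SPD) by nonlinear CG with exact line search."""
    x = np.zeros_like(b)
    g = b - A @ x                                    # gradient at x
    d = g.copy()
    max_iter = 5 * len(b)
    for k in range(max_iter):
        alpha = (g @ d) / (d @ A @ d)                # exact line search on a quadratic
        x = x + alpha * d
        g_new = b - A @ x
        if np.linalg.norm(g_new) < 1e-10 * np.linalg.norm(b):
            return x, k + 1
        d = g_new + direction_coefficient(g_new, g, **kw) * d
        g = g_new
    return x, max_iter


rng = np.random.default_rng(3)
B = rng.standard_normal((8, 8))
A = B.T @ B + 8.0 * np.eye(8)
b = rng.standard_normal(8)
x_fr, k_fr = maximize_quadratic(A, b, method="fr", powell_nu=0.2)
x_pr, k_pr = maximize_quadratic(A, b, method="pr+", powell_nu=0.2)
```

On the non-quadratic LR likelihood, with inexact line searches, the two updates do diverge. That is the regime where the comparison suggested above would be informative.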
- The CG residual is a cheap way to terminate CG iterations in
IRLS. Using the deviance requires significantly more computation
but keeps our focus on the LR likelihood. An easily computed
approximation to the deviance, or another related statistic, could
improve performance for IRLS with cgdeveps as well as for CG-MLE.
One possible replacement for the deviance is the Pearson $\chi^2$
statistic [30].
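For binary outcomes $y_i \in \{0, 1\}$ with fitted probabilities $\mu_i$, the deviance is $D = -2\sum_i [y_i \log \mu_i + (1 - y_i)\log(1 - \mu_i)]$ and the Pearson statistic is $X^2 = \sum_i (y_i - \mu_i)^2 / (\mu_i(1 - \mu_i))$. The sketch below computes both from the same $\mu$. Both need $\mu = \sigma(X\hat\beta)$, so the matrix-vector product that typically dominates the cost is shared, and the Pearson form mainly avoids evaluating logarithms. The clipping constant `eps` is an assumption that keeps the logarithms finite.

```python
import numpy as np


def deviance(y, mu, eps=1e-12):
    """Binomial deviance for 0/1 outcomes (the saturated log-likelihood is zero)."""
    mu = np.clip(mu, eps, 1 - eps)
    return -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))


def pearson_chi2(y, mu, eps=1e-12):
    """Pearson chi-square statistic: squared residuals scaled by the binomial variance."""
    mu = np.clip(mu, eps, 1 - eps)
    return np.sum((y - mu) ** 2 / (mu * (1 - mu)))


y = np.array([1.0, 0.0, 1.0, 0.0])
mu = np.array([0.9, 0.2, 0.6, 0.5])
print(deviance(y, mu), pearson_chi2(y, mu))
```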
- Regularization is an important part of LR performance. We
have suggested in this thesis that a single Ridge Regression
parameter $\lambda$ can be used across a wide variety of datasets.
Furthermore we apply the same value of $\lambda$ to all slope
coefficients in the LR model. In localized Ridge Regression the
penalization parameter becomes a penalization vector
$\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_M)$, with one entry
per slope coefficient, such that the penalty function is
$\sum_{i=1}^{M} \lambda_i \beta_i^2$. Other penalty functions may also prove useful.
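A minimal sketch of localized Ridge Regression follows. It assumes the penalty $\sum_i \lambda_i \beta_i^2$ is subtracted from the log-likelihood, and that column 0 of X is an intercept whose entry in the penalty vector is zero, so only slope coefficients are penalized. Under these assumptions, the Newton system simply replaces $\lambda I$ with the diagonal matrix $2\,\mathrm{diag}(\boldsymbol{\lambda})$. The name `fit_localized_ridge` is hypothetical, and the fixed iteration count is for brevity.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_localized_ridge(X, y, lam_vec, iters=25):
    """Hypothetical: maximize loglik(beta) - sum_i lam_vec[i] * beta[i]**2 by Newton's method."""
    beta = np.zeros(X.shape[1])
    P = np.diag(2.0 * lam_vec)                       # Hessian of the penalty
    for _ in range(iters):
        mu = sigmoid(X @ beta)
        w = mu * (1 - mu)
        g = X.T @ (y - mu) - 2.0 * lam_vec * beta    # penalized score
        H = X.T @ (w[:, None] * X) + P               # negative penalized Hessian
        beta = beta + np.linalg.solve(H, g)
    return beta


rng = np.random.default_rng(4)
R = 500
X = np.hstack([np.ones((R, 1)), rng.standard_normal((R, 3))])   # intercept + 3 slopes
y = (rng.uniform(size=R) < sigmoid(X @ np.array([0.5, 1.0, -1.0, 2.0]))).astype(float)

beta_uniform = fit_localized_ridge(X, y, np.array([0.0, 1.0, 1.0, 1.0]))
beta_local = fit_localized_ridge(X, y, np.array([0.0, 1.0, 1.0, 100.0]))  # shrink last slope harder
```

Choosing the entries of the penalty vector is the open question. Setting them by cross-validation is one possibility, though it multiplies the number of fits.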
- We terminate CG iterations when more than cgwindow iterations
fail to improve the residual (cgeps) or deviance (cgdeveps).
One alternative is to halve the step size when iterations fail to
make an improvement [10]. This technique is
sometimes called fractional increments
[30].
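Here is a minimal sketch of step halving applied to one ridge-penalized Newton (IRLS) step. The monitored objective is assumed to be the penalized deviance $D(\beta) + \lambda\|\beta\|^2$, which is $-2$ times the penalized log-likelihood $\ell(\beta) - (\lambda/2)\|\beta\|^2$. At most `max_halvings` halvings are tried. These limits and the function names are illustrative. If no fraction of the step improves the objective, the current estimate is kept, and the driver treats that as a stopping signal.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def penalized_deviance(X, y, beta, lam):
    mu = np.clip(sigmoid(X @ beta), 1e-12, 1 - 1e-12)
    dev = -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    return dev + lam * (beta @ beta)


def newton_step_with_halving(X, y, beta, lam=1.0, max_halvings=10):
    """Hypothetical: penalized Newton step, halved until the objective improves."""
    mu = sigmoid(X @ beta)
    w = mu * (1 - mu)
    g = X.T @ (y - mu) - lam * beta
    H = X.T @ (w[:, None] * X) + lam * np.eye(X.shape[1])
    step = np.linalg.solve(H, g)
    f_old = penalized_deviance(X, y, beta, lam)
    t = 1.0
    for _ in range(max_halvings + 1):
        candidate = beta + t * step
        if penalized_deviance(X, y, candidate, lam) < f_old:
            return candidate, t
        t *= 0.5                                  # fractional increment
    return beta, 0.0                              # no improving fraction was found


rng = np.random.default_rng(5)
X = rng.standard_normal((400, 4))
y = (rng.uniform(size=400) < sigmoid(X @ np.array([3.0, -2.0, 1.0, 0.0]))).astype(float)

lam = 1.0
beta = np.full(4, 5.0)                            # deliberately poor starting point
history = [penalized_deviance(X, y, beta, lam)]
for _ in range(50):
    grad = X.T @ (y - sigmoid(X @ beta)) - lam * beta
    if np.linalg.norm(grad) < 1e-8:
        break
    beta, t = newton_step_with_halving(X, y, beta, lam)
    if t == 0.0:
        break
    history.append(penalized_deviance(X, y, beta, lam))
```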
- We have ignored feature selection because of our focus on
autonomous high-dimensional classification. However, model
selection is very important when one intends to make a careful
interpretation of the LR model coefficients. Iterative model
selection is the process of searching through the space of
possible models to find the "best" one. Because there are
$2^M$ possible models for a dataset with $M$ attributes, it is not
generally possible to estimate $\hat{\beta}$ for all of them. Instead
the search is directed by local improvements in model quality.
One common iterative model selection method is stepwise
logistic regression [13]. Such techniques need
to fit many LR models, and may benefit from appropriate
application of our fast implementations.
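The sketch below makes the cost concrete with forward stepwise selection that scores each candidate model by AIC, defined as the deviance plus 2 times the number of coefficients. AIC and the greedy forward search are assumptions chosen for brevity; stepwise procedures such as those in [13] may use other criteria, such as significance tests. The counter `n_fits` shows how quickly LR fits accumulate: a full forward pass over M attributes can require on the order of M²/2 fits. Each of those fits could use a fast LR implementation in place of the small dense Newton solver `fit_lr` (a hypothetical name) used here.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_lr(Z, y, lam=1e-6, iters=30):
    """Hypothetical small dense Newton solver; a fast LR implementation would go here."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        mu = sigmoid(Z @ beta)
        w = mu * (1 - mu)
        H = Z.T @ (w[:, None] * Z) + lam * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(H, Z.T @ (y - mu) - lam * beta)
    return beta


def aic(Z, y):
    mu = np.clip(sigmoid(Z @ fit_lr(Z, y)), 1e-12, 1 - 1e-12)
    dev = -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    return dev + 2 * Z.shape[1]


def forward_stepwise(X, y):
    """Greedily add the attribute that lowers AIC most; stop when none does."""
    R, M = X.shape
    ones = np.ones((R, 1))                       # intercept column
    selected = []
    best = aic(ones, y)
    n_fits = 1
    while len(selected) < M:
        scores = []
        for j in range(M):
            if j in selected:
                continue
            Z = np.hstack([ones, X[:, selected + [j]]])
            scores.append((aic(Z, y), j))
            n_fits += 1
        score, j = min(scores)
        if score >= best:
            break
        selected.append(j)
        best = score
    return selected, n_fits


rng = np.random.default_rng(6)
X = rng.standard_normal((1000, 6))
y = (rng.uniform(size=1000) < sigmoid(2.0 * X[:, 1] - 2.0 * X[:, 3])).astype(float)
selected, n_fits = forward_stepwise(X, y)
```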
- We have described IRLS in the context of LR, with the result
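that we have equated IRLS and Newton-Raphson. For generalized
linear models this is not always true [25]: the two coincide for a
canonical link, such as the logit link used in LR, but in general IRLS
is Fisher scoring rather than Newton-Raphson.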
Our modifications to IRLS apply in the general case, and can be
used with other generalized linear models. This has not been
explored yet.
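Below is a minimal sketch of IRLS for a generic generalized linear model, parameterized by the inverse link, its derivative, and the variance function. The function name and the convergence rule are assumptions, and none of the regularization or failsafe checks discussed in this thesis are included. The usage fits a Poisson model with its canonical log link and an LR model with the logit link; IRLS coincides with Newton-Raphson in both cases. With a non-canonical link such as the probit, the same loop performs Fisher scoring instead.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def irls(X, y, inv_link, d_inv_link, variance, max_iter=50, tol=1e-10):
    """Hypothetical: fit a GLM by iteratively reweighted least squares (Fisher scoring)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                       # linear predictor
        mu = inv_link(eta)                   # fitted mean
        dmu = d_inv_link(eta)                # d mu / d eta
        w = dmu ** 2 / variance(mu)          # working weights
        z = eta + (y - mu) / dmu             # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


rng = np.random.default_rng(7)
X = np.hstack([np.ones((500, 1)), rng.standard_normal((500, 1))])

# Poisson regression, canonical log link: mu = exp(eta), Var(y) = mu.
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3]))).astype(float)
beta_pois = irls(X, y, np.exp, np.exp, lambda mu: mu)

# Logistic regression, canonical logit link: the weights reduce to mu(1 - mu).
y_bin = (rng.uniform(size=500) < sigmoid(X @ np.array([-0.5, 1.5]))).astype(float)
beta_lr = irls(X, y_bin, sigmoid, lambda eta: sigmoid(eta) * (1 - sigmoid(eta)),
               lambda mu: mu * (1 - mu))
```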
- Text classification is an interesting and active research
area. Many classifiers are used, and SVMs are often considered
the state of the art. LR has a troubled history in text
classification, as well as a promising future
[48,49,40]. Even when
classifiers capable of handling high-dimensional inputs are used,
many authors apply feature selection to reduce the number of
attributes. Joachims [15] has argued that feature
selection may not be necessary, and may hurt performance. We
believe our implementation of LR is suitable for text
classification, and could be competitive with the state-of-the-art
techniques.
Copyright 2004 Paul Komarek, komarek@cmu.edu