

5.3.5 Direct (CG-MLE) Summary

We have found that CG-MLE requires many of the same stability parameters as IRLS, with nearly identical default values. A notable difference was the ease with which the termination epsilon cgeps was found in Section 5.3.2. Perhaps the most interesting results are in Section 5.3.3, which examined the effects of different CG direction update formulas and the BFGS-MLE.

Our final CG-MLE method is summarized in Algorithm 8. Starting at line 8.1 we set our nonlinear CG parameters as we have described above. Line 8.2 shows how the binitmean parameter changes the value of $\hat{\mathbf{\beta}}_0$. Several lines related to cgwindow are present. As with our IRLS implementation, shown in Algorithm 5, we return the value of $\hat{\mathbf{\beta}}$ which minimized the deviance. This may be seen in line 8.3. As in Algorithm 5, we have embedded the parameters cgmax, cgeps, and cgwindow to emphasize that we have fixed their values.
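
The idea of returning the deviance-minimizing iterate rather than the last one can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `best_iterate_with_window` is hypothetical, and we assume here that a cgwindow-style parameter counts consecutive iterations that fail to improve on the best deviance seen so far.

```python
def best_iterate_with_window(deviances, window):
    """Scan a sequence of per-iteration deviances.

    Returns (i_star, n_used): the index of the smallest deviance seen,
    and the number of iterations consumed before stopping.
    Assumption: `window` is the number of consecutive non-improving
    iterations tolerated before giving up.
    """
    best_i, best_dev = 0, deviances[0]
    bad_streak = 0
    n_used = 1
    for i in range(1, len(deviances)):
        n_used = i + 1
        if deviances[i] < best_dev:
            # New best iterate: remember it and reset the streak.
            best_i, best_dev = i, deviances[i]
            bad_streak = 0
        else:
            bad_streak += 1
            if bad_streak >= window:
                break
    return best_i, n_used


# Hypothetical deviance trace, for illustration only.
trace = [10.0, 8.0, 9.0, 7.0, 7.5, 7.2, 7.1, 6.0]
i_star, n_used = best_iterate_with_window(trace, window=3)
```

With a larger window the scan in this example would continue past the temporary plateau and reach the later, smaller deviance, which is the trade-off such a window parameter controls.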

It is worth emphasizing the difference between CG-MLE and IRLS with cgdeveps. Both use the relative difference as a termination criterion. However, IRLS and nonlinear CG are different nonlinear optimization methods. The first IRLS iteration starts from the same point as the first CG-MLE iteration. In IRLS, linear CG is applied to a linear weighted least squares problem; in CG-MLE, nonlinear CG is applied to the score equations. Linear CG in the first IRLS iteration is very unlikely to terminate at the same point as nonlinear CG applied to the LR score equations, though linear CG should terminate with far fewer computations. At that point CG-MLE is finished, whereas IRLS still has more iterations to run. While both algorithms should ultimately arrive at similar parameter estimates, there is no reason to believe they will take the same path to get there, or require the same amount of computation. That both algorithms apply the same termination criterion to their version of CG is at best a superficial similarity.
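
To make the CG-MLE side of this comparison concrete, here is a rough sketch of nonlinear CG applied directly to the LR deviance, terminated on the relative difference of successive deviances. Several choices are assumptions made for illustration and are not taken from the thesis: the Fletcher-Reeves direction update, the backtracking (Armijo) line search, the restart rule, the exact form of the relative difference, and the names `cg_mle_sketch` and `dev_eps`.

```python
import numpy as np


def deviance(X, y, beta):
    """LR deviance, i.e. -2 times the log-likelihood."""
    z = X @ beta
    # logaddexp(0, z) = log(1 + exp(z)) without overflow.
    return 2.0 * np.sum(np.logaddexp(0.0, z) - y * z)


def deviance_gradient(X, y, beta):
    """Gradient of the deviance; zero where the score equations hold."""
    mu = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # stable logistic function
    return 2.0 * (X.T @ (mu - y))


def cg_mle_sketch(X, y, dev_eps=1e-6, max_iter=200):
    beta = np.zeros(X.shape[1])
    dev = deviance(X, y, beta)
    g = deviance_gradient(X, y, beta)
    d = -g
    for _ in range(max_iter):
        slope = g @ d
        if slope >= 0.0:
            # Not a descent direction: restart with steepest descent.
            d = -g
            slope = g @ d
        if slope == 0.0:
            break  # gradient is exactly zero
        # Backtracking line search with the Armijo condition.
        step = 1.0
        dev_new = deviance(X, y, beta + step * d)
        while dev_new > dev + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5
            dev_new = deviance(X, y, beta + step * d)
        if dev_new >= dev:
            break  # line search failed to reduce the deviance
        beta = beta + step * d
        g_new = deviance_gradient(X, y, beta)
        # Fletcher-Reeves direction update (an assumption of this sketch).
        d = -g_new + (g_new @ g_new) / (g @ g) * d
        g = g_new
        # Relative difference of successive deviances.
        rel_diff = abs(dev - dev_new) / max(dev_new, 1e-300)
        dev = dev_new
        if rel_diff < dev_eps:
            break
    return beta, dev


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([0.5, 1.0, -2.0])
prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ true_beta)))
y = (rng.random(200) < prob).astype(float)
beta_hat, final_dev = cg_mle_sketch(X, y)
```

Note that the loop here is the whole optimization: once the relative-difference test fires, nothing remains to be done. In IRLS the same kind of test would only end one inner linear CG solve, after which the outer iterations continue.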

[Algorithm 8: the final CG-MLE method. Only the last line of the listing survives: ... := $\hat{\mathbf{\beta}}_{i^*}$]

We do not have many new ideas to propose for optimizing the LR MLE using numerical methods. Regularization was already suggested by Zhang and Oles [49]. Minka [27] made a brief comparison of several numerical methods, including a quasi-Newton algorithm called Böhning's method, in a short technical report. Minka mentioned the need for regularization, and in two of his three datasets found that CG outperformed other algorithms. Zhang et al. [48] preferred Hestenes-Stiefel direction updates when using CG for nonlinear convex optimization, which is somewhat at odds with the conclusions of this chapter. We do not have an explanation for this contradiction. In fact, the most promising line of exploration opened by this section is the possibility that Fletcher-Reeves updates, usually ignored for nonlinear CG, might work well in our environment.
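
For reference, the direction update formulas discussed above differ only in the scalar that multiplies the previous search direction. The sketch below uses the standard textbook forms for a function being minimized. The function names are hypothetical, and Polak-Ribière is included only as a familiar point of comparison.

```python
import numpy as np


def direction_update(name, g_new, g_old, d_old):
    """Scalar multiplier for the previous direction in nonlinear CG.

    g_new, g_old: gradients at the new and old iterates.
    d_old: the previous search direction.
    """
    y = g_new - g_old  # change in gradient across the step
    if name == "fletcher-reeves":
        return (g_new @ g_new) / (g_old @ g_old)
    if name == "polak-ribiere":
        return (g_new @ y) / (g_old @ g_old)
    if name == "hestenes-stiefel":
        return (g_new @ y) / (d_old @ y)
    raise ValueError(f"unknown update formula: {name}")


def next_direction(name, g_new, g_old, d_old):
    """New search direction: steepest descent plus a multiple of d_old."""
    return -g_new + direction_update(name, g_new, g_old, d_old) * d_old


# Illustrative vectors where g_new is orthogonal to g_old and d_old = -g_old,
# as would happen after an exact first line search on a quadratic.
g_old = np.array([3.0, 4.0])
d_old = -g_old
g_new = np.array([-8.0, 6.0])
updates = {
    name: direction_update(name, g_new, g_old, d_old)
    for name in ("fletcher-reeves", "polak-ribiere", "hestenes-stiefel")
}
```

In this orthogonal case all three formulas give the same multiplier. They diverge once line searches are inexact or the objective is not quadratic, which is the regime where the choice of formula matters for LR.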


Copyright 2004 Paul Komarek, komarek@cmu.edu