next up previous contents
Next: 5.3.3 Direct (CG-MLE) Speed Up: 5.3 CG-MLE Parameter Evaluation Previous: 5.3.1.14 Stability Tests: Default   Contents


5.3.2 Direct (CG-MLE) Termination And Optimality

While it is tempting to think of the CG solver in CG-MLE as similar to the CG solver in IRLS, the two are very different. In IRLS we use CG to solve a linear system. In that context CG is an iterative exact method with numerical error. When using CG to minimize the non-quadratic convex LR log-likelihood surface, the likelihood surface is approximated locally as a quadratic surface. The CG line search in IRLS is not actually a search; the step length is computed exactly and only one step is made. The CG line search in CG-MLE is a true search, and there are even competing criteria for choosing the direction of the line search, as explained in Section 2.2.

Unlike our IRLS implementation, the cgeps parameter for our CG-MLE implementation is not multiplied by the number of columns. As noted above, the CG-MLE cgeps parameter is the analogue of the IRLS lreps parameter. From an implementation view, it is very similar to cgdeveps since it is a non-residual termination test for CG. To ensure termination of CG iterations, we again have the cgmax parameter. These parameters are summarized in Table 5.24.

Figure: AUC and time versus cgeps on several datasets with lreps=0.05 for CG-MLE. Note that the time axis is logarithmic.
\includegraphics[width=\textwidth]{figures/auc_mle_cgeps_all.ps}
tex2html_comment_mark>737 figures/time_mle_cgeps_all.ps

Figure 5.14 shows the relation between AUC and cgeps on the left, and between time and cgeps on the right. Note that the time axis is logarithmic. As cgeps decreases these graphs have the flat tails seen with cgdeveps in Figure 5.9. The AUC graph suggests that adequate convergence is obtained when cgeps=0.001. Just as with cgdeveps the ds1.10pca dataset is the most troublesome when choosing score versus time. Unlike the cgdeveps case, the tails in the time graph are already nearly flat by this point. We may choose the safer setting cgeps=0.0005 with little penalty. Table 5.37 shows the numerical results for cgeps values around 0.0005.


Table: AUC and Times For Several CG-MLE cgeps Values
cgeps ds1 imdb citeseer ds2 ds1.100pca ds1.10pca
0.001 0.946 144 0.983 470 0.946 97 0.723 3728 0.917 471 0.841 46
0.0005 0.946 152 0.983 469 0.946 98 0.724 3784 0.916 506 0.844 50
0.0001 0.946 158 0.983 471 0.946 99 0.723 3872 0.918 585 0.847 63

To prevent runaway CG-MLE iterations, we have the cgmax parameter. Table 5.38 shows the maximum number of CG iterations per fold of the experiments above using the default parameters. These maximums are not much different from the minimums for each fold. There is no evidence that any reasonable experiment will fail to terminate with these parameters, but for good measure we set cgmax to 100. The cgeps and cgmax parameters are shown in Figure 5.15 as part CG-MLE in our LR description tree.


Table 5.38: Maximum CG iteration counts for all folds of CG-MLE experiments.
ds1 imdb citeseer ds2 ds1.100pca ds1.10pca
41 5 4 42 34 34

Figure 5.15: LR tree showing single termination parameter used with CG-MLE, along with the safeguard parameter cgmax.
\includegraphics{figures/treemap-MLE-term.eps}


next up previous contents
Next: 5.3.3 Direct (CG-MLE) Speed Up: 5.3 CG-MLE Parameter Evaluation Previous: 5.3.1.14 Stability Tests: Default   Contents
Copyright 2004 Paul Komarek, komarek@cmu.edu