While it is tempting to think of the CG solver in CG-MLE as similar to the CG solver in IRLS, the two are very different. In IRLS we use CG to solve a linear system. In that context CG is an iterative exact method with numerical error. When using CG to minimize the non-quadratic convex LR log-likelihood surface, the likelihood surface is approximated locally as a quadratic surface. The CG line search in IRLS is not actually a search; the step length is computed exactly and only one step is made. The CG line search in CG-MLE is a true search, and there are even competing criteria for choosing the direction of the line search, as explained in Section 2.2.
Unlike our IRLS implementation, the cgeps parameter for our CG-MLE implementation is not multiplied by the number of columns. As noted above, the CG-MLE cgeps parameter is the analogue of the IRLS lreps parameter. From an implementation view, it is very similar to cgdeveps since it is a non-residual termination test for CG. To ensure termination of CG iterations, we again have the cgmax parameter. These parameters are summarized in Table 5.24.
tex2html_comment_mark>737
figures/time_mle_cgeps_all.ps
|
Figure 5.14 shows the relation between AUC and cgeps on the left, and between time and cgeps on the right. Note that the time axis is logarithmic. As cgeps decreases these graphs have the flat tails seen with cgdeveps in Figure 5.9. The AUC graph suggests that adequate convergence is obtained when cgeps=0.001. Just as with cgdeveps the ds1.10pca dataset is the most troublesome when choosing score versus time. Unlike the cgdeveps case, the tails in the time graph are already nearly flat by this point. We may choose the safer setting cgeps=0.0005 with little penalty. Table 5.37 shows the numerical results for cgeps values around 0.0005.
| cgeps | ds1 | imdb | citeseer | ds2 | ds1.100pca | ds1.10pca | ||||||
| 0.001 | 0.946 | 144 | 0.983 | 470 | 0.946 | 97 | 0.723 | 3728 | 0.917 | 471 | 0.841 | 46 |
| 0.0005 | 0.946 | 152 | 0.983 | 469 | 0.946 | 98 | 0.724 | 3784 | 0.916 | 506 | 0.844 | 50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0001 | 0.946 | 158 | 0.983 | 471 | 0.946 | 99 | 0.723 | 3872 | 0.918 | 585 | 0.847 | 63 |
To prevent runaway CG-MLE iterations, we have the cgmax parameter. Table 5.38 shows the maximum number of CG iterations per fold of the experiments above using the default parameters. These maximums are not much different from the minimums for each fold. There is no evidence that any reasonable experiment will fail to terminate with these parameters, but for good measure we set cgmax to 100. The cgeps and cgmax parameters are shown in Figure 5.15 as part CG-MLE in our LR description tree.
|