

6.2.2 Number of Attributes

Figure 6.3: AUC (left plot) and time (right plot) versus number of attributes. Note that the vertical time axis in the right plot is logarithmic.

The AUC and time plots in Figure 6.3 compare classifier performance as the number of attributes increases from one thousand to one hundred thousand. Again, KNN scores poorly and runs slowly, mostly because of poor ball tree performance on this dataset. SVM RBF is similarly slow but achieves scores on par with the remaining classifiers. It is now easier to distinguish the LR classifiers from SVM LINEAR.
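
For readers who want to reproduce AUC scores like those plotted in this section, the following is a minimal sketch of one standard way to compute AUC, using its pairwise (Mann-Whitney) form: the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auc` and its arguments are hypothetical and are not taken from the software used in these experiments. The double loop is quadratic in the number of examples, so it is suitable only for small illustrations.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    scores: classifier outputs, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    (Hypothetical helper for illustration only.)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

A score of 1.0 means every positive example is ranked above every negative example, while 0.5 corresponds to a ranking no better than chance.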

Perhaps the only important feature of these graphs is that the relative scores and speeds are consistent with the previous section. The nonlinearity of these graphs is likely due to the slight amount of coupling between the dataset attributes. This suggests that the effect of attribute correlation on performance increases with the number of attributes, which follows naturally from the hypothesis that extra iterations are spent overcoming cancellation between attributes. In Section 6.2.4, we will see that changing the level of correlation has less impact on speed than changing the number of correlated attributes.


Copyright 2004 Paul Komarek, komarek@cmu.edu