next up previous contents
Next: 6.2.4 Attribute Coupling Up: 6.2 Synthetic Datasets Previous: 6.2.2 Number of Attributes   Contents


6.2.3 Sparsity

Figure: AUC and time versus sparsity. Note that the vertical time axis in the right plot is logarithmic.
\includegraphics[width=\textwidth]{figures/perf_auc_sparse_all.ps}
tex2html_comment_mark>809 figures/perf_time_sparse_all.ps

The sparsity plots in Figure 6.4 are quite jagged and irregular. The datasets have no coupling and half of the rows are positive, just as in the number of rows tests of Section 6.2.1. We are unable to explain why LR was able to find a reasonable linear classification boundary for the sparsest datasets, while SVM LINEAR was not. The erratic behavior of KNN also suggests that there is some interesting geometric structure in the data. If all of the algorithms behaved this way, we would blame the data generation procedure. However, SVM RBF, BC, and the LR classifiers all performed reasonably and similarly for the amount of data available in these datasets.

The timings are also somewhat erratic, but the rank order of the classifiers is virtually identical to what we've seen already. The times for the faster algorithms, such as LR and SVM LINEAR, appear to increase linearly. The time required is not growing at the rate predicted by their worst case complexity. This suggests that the effects of sparsity, and hence the number of nonzero entries in the data, are less relevant to speed than the dimensions of the data.


next up previous contents
Next: 6.2.4 Attribute Coupling Up: 6.2 Synthetic Datasets Previous: 6.2.2 Number of Attributes   Contents
Copyright 2004 Paul Komarek, komarek@cmu.edu