

5.1.4 Scoring

All experiments in this thesis are ten-fold cross-validations. Predictive performance is measured using the Area Under Curve (AUC) metric, which is described below. All times are reported in seconds; details about timing measurements may be found in Section 5.1.5.

Before we describe AUC scores, we must first describe Receiver Operating Characteristic (ROC) curves [5]. We will use the ROC and AUC description we published in [20]. To construct an ROC curve, the dataset rows are sorted according to the probability that a row is in the positive class under the learned logistic model. Starting at the graph origin, we examine the most probable row. If that row is positive, we move up. If it is negative, we move right. In either case we move one unit. This is repeated for the remaining rows, in decreasing order of probability. Every point $(x,y)$ on an ROC curve represents the learner's ``favorite'' $x+y$ rows from the dataset. Out of these favorite rows, $y$ are actually positive, and $x$ are negative, since each positive row moves the curve up one unit and each negative row moves it right one unit.
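The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the experiments in this thesis: the function name `roc_points`, the 0/1 label encoding, and the four-row dataset are all assumptions made for the example. Ties in predicted probability are not addressed by the description above; here they simply keep their original row order.

```python
from typing import List, Sequence, Tuple


def roc_points(probabilities: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Return the (x, y) points of the unnormalized ROC staircase.

    labels uses 1 for the positive class and 0 for the negative class
    (an assumed encoding for this sketch).
    """
    # Visit rows in decreasing order of predicted probability.
    # sorted() is stable, so tied rows keep their original order.
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i],
                   reverse=True)
    x, y = 0, 0
    points = [(x, y)]  # every curve starts at the origin
    for i in order:
        if labels[i] == 1:
            y += 1  # positive row: move up one unit
        else:
            x += 1  # negative row: move right one unit
        points.append((x, y))
    return points


# Hypothetical four-row dataset, invented for illustration.
probs = [0.9, 0.8, 0.6, 0.3]
labels = [1, 0, 1, 0]
points = roc_points(probs, labels)
print(points)
```

The last point returned is always (number of negative rows, number of positive rows), because one step is taken for each row.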

Figure 5.2: Example ROC curve.

Figure 5.2 shows an example ROC curve. Six predictions are made, ranging from 0.89 down to 0.17, and are listed in the first column of the table in the lower-left of the graph. The actual outcomes are listed in the second column. The row with the highest prediction, 0.89, belongs to the positive class. Therefore we move up from the origin, as written in the third column and shown by the dotted line in the graph moving from (0,0) to (0,1). The second favorite row was positive, and the dotted line moves up again to (0,2). The third row, however, was negative, and the dotted line moves to the right one unit to (1,2). This continues until all six predictions have been examined.

Suppose a dataset has $P$ positive rows and $R-P$ negative rows, for $R$ rows in total. A perfect learner on this dataset would have an ROC curve starting at the origin, moving straight up to $(0,P)$, and then straight right to end at $(R-P, P)$. The solid line in Figure 5.2 illustrates the path of a perfect learner in our example with six predictions. Random guessing would produce, on average, an ROC curve that starts at the origin and moves directly to the termination point $(R-P, P)$. Note that all ROC curves start at the origin and end at $(R-P, P)$ because $R$ steps up or right must be taken, one for each row.

As a summary of an ROC curve, we measure the area under the curve relative to the area under a perfect learner's curve. The result is denoted AUC. A perfect learner has an AUC of 1.0, while random guessing produces an AUC of 0.5. In the example shown in Figure 5.2, the dotted line representing the real learner encloses an area of 5. The solid line for the perfect learner encloses an area of 8. Therefore the AUC for the real learner in our example is 5/8.
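The area ratio can be computed directly from the ranked outcomes, as in the following sketch. It is an illustration under stated assumptions rather than the thesis implementation: the function name `auc_from_ranked_labels` and the 0/1 label encoding are hypothetical. The example sequence matches the text only in its first three outcomes (positive, positive, negative); the order of the last three is an assumption, chosen because it reproduces the areas 5 and 8 quoted above.

```python
from typing import Sequence


def auc_from_ranked_labels(ranked_labels: Sequence[int]) -> float:
    """AUC for outcomes already sorted by decreasing predicted probability.

    ranked_labels uses 1 for positive and 0 for negative rows
    (an assumed encoding for this sketch).
    """
    positives_seen = 0
    area = 0
    for label in ranked_labels:
        if label == 1:
            positives_seen += 1        # step up: the curve gets taller
        else:
            area += positives_seen     # step right: add a column of this height
    positives = positives_seen
    negatives = len(ranked_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("AUC needs at least one positive and one negative row")
    # A perfect learner's curve encloses a positives-by-negatives rectangle.
    return area / (positives * negatives)


# First three outcomes follow the text; the last three are assumed.
example = [1, 1, 0, 1, 0, 1]
auc = auc_from_ranked_labels(example)
print(auc)  # 5/8 = 0.625
```

With four positive and two negative rows, the perfect learner's rectangle has area 4 x 2 = 8, and the two rightward steps contribute columns of height 2 and 3, giving the area of 5.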

Whereas metrics such as precision and recall measure true positives and negatives, the AUC measures the ability of the classifier to correctly rank test points. Ranking is very important for data mining, where we often want to discover the most interesting galaxies, the most promising drugs, or the products most likely to fail. When comparing results across several classification algorithms, we will compute confidence intervals on AUC scores. For this we compute one AUC score for each fold of our ten-fold cross-validation, and report the mean and a 95% confidence interval using a Student's t distribution.
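One common way to obtain such an interval is sketched below. The text does not give the exact formula used, so this assumes the standard two-sided interval, mean plus or minus t times the standard error of the mean, with nine degrees of freedom for ten folds. The function name `mean_and_ci`, the hard-coded critical value, and the per-fold scores are all hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
# Valid only for ten scores; other fold counts need a different value.
T_CRIT_95_DF9 = 2.262


def mean_and_ci(scores: Sequence[float],
                t_crit: float = T_CRIT_95_DF9) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a t-based confidence interval."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical per-fold AUC scores, invented for illustration.
fold_aucs = [0.90, 0.92, 0.88, 0.91, 0.89, 0.90, 0.93, 0.87, 0.90, 0.90]
mean, lower, upper = mean_and_ci(fold_aucs)
print(f"AUC = {mean:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

The critical value is hard-coded to keep the sketch free of third-party dependencies; a statistics library could supply it for other fold counts.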


Copyright 2004 Paul Komarek, komarek@cmu.edu