next up previous contents
Next: 6.2 Synthetic Datasets Up: 6.1 Classifiers Previous: 6.1.2 KNN   Contents


6.1.3 BC

Bayes' classifier, also known as Naive Bayes, is a simple application of Bayes' rule to allow computation of the probability that row $ i$ is a positive experiment. If we assume that the attributes are conditionally independent given the outcome, that is $ P(x_{ij}\vert x_{ik},y_i) = P(x_{ij}\vert y_i), j \ne k$, then we may apply Bayes' rule to compute

$\displaystyle P(Y=1\vert\ensuremath{\mathbf{X}},\ensuremath{\mathbf{y}},\tilde{...
...{\prod P(\tilde{x_j} \vert Y=0) P(Y=0) + \prod P(\tilde{x_j} \vert Y=1) P(Y=1)}$ (6.9)

where $ \tilde{\ensuremath{\mathbf{x}}}$ is the experiment for which we wish to predict the outcome, and $ P(Y=1)$ and $ P(\tilde{x_j} \vert Y)$ are estimated using the dataset $ \ensuremath{\mathbf{X}},\ensuremath{\mathbf{y}}$. Thus BC has no parameters. Another approach to BC allows the class prior probabilities $ P(Y=1)$ and $ P(Y=0)$ to be chosen by the experimenter. However, changing the class prior distribution does not change the rank-ordering of the predictions, and hence does not affect the AUC score.

There are two principal problems with BC. The first and less important problem is that the attributes are rarely conditionally independent, making the predictions suspect. The second problem is that BC is a generative classifier. A generative classifier is one which attempts to learn the joint probability $ P(Y,\ensuremath{\mathbf{x}})$ and makes predictions by forming the conditional distribution $ P(Y\vert\ensuremath{\mathbf{x}})$. This indirect approach contrasts with that of a discriminative classifier, such as LR, which estimates $ P(Y,\ensuremath{\mathbf{x}})$ directly from the data. The result is that generative classifiers are not directly optimizing the quantity of interest [33]. Russel and Norvig [39] contains a nice description of BC.

Computing $ P(Y=1)$ and $ P(\tilde{x_j} \vert Y)$ only needs to be done once, thus BC is always fast. By exploiting sparsity and using a few computation tricks, we have made our implementation of BC very fast. Our implementation is linear in the number of nonzero entries of $ \mathbf{X}$.


next up previous contents
Next: 6.2 Synthetic Datasets Up: 6.1 Classifiers Previous: 6.1.2 KNN   Contents
Copyright 2004 Paul Komarek, komarek@cmu.edu