B.4.1 bc
This is an implementation of a Naive Bayesian
Classifier [28,5] with binary-valued input
attributes (all input values must be zero or one). The implementation
has been optimized for speed and accuracy on very high-dimensional
sparse data.
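For a record $\mathbf{x} = (x_1, \ldots, x_M)$ of $M$ binary attributes, the classifier predicts
$$P(y = \mathrm{ACT} \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid y = \mathrm{ACT})\,P(y = \mathrm{ACT})}{P(\mathbf{x} \mid y = \mathrm{ACT})\,P(y = \mathrm{ACT}) + P(\mathbf{x} \mid y \neq \mathrm{ACT})\,P(y \neq \mathrm{ACT})}$$
where
$$P(\mathbf{x} \mid y = \mathrm{ACT}) = \prod_{k=1}^{M} P(x_k \mid y = \mathrm{ACT}) \tag{B.4}$$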
under the (dubious) assumption that the input attributes
are conditionally independent of each other given a known activity level,
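and where
$$P(x_k \mid y = \mathrm{ACT}) = \begin{cases} P(x_k = 1 \mid y = \mathrm{ACT}) & \text{if } x_k = 1 \\ 1 - P(x_k = 1 \mid y = \mathrm{ACT}) & \text{if } x_k = 0 \end{cases} \tag{B.5}$$
with the analogous expressions for $y \neq \mathrm{ACT}$.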
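As a concrete reference point, the model in (B.4) and (B.5) can be evaluated directly. The Python sketch below is a naive dense version for illustration only, not the optimized bc code; the function name `posterior_act` and the parameter names `prior_act`, `p_act` and `p_not` (holding $P(y = \mathrm{ACT})$, $P(x_k = 1 \mid y = \mathrm{ACT})$ and $P(x_k = 1 \mid y \neq \mathrm{ACT})$) are hypothetical.

```python
def posterior_act(x, prior_act, p_act, p_not):
    """Return P(y = ACT | x) for a binary record x (a list of 0/1 values).

    Hypothetical helper: a naive dense evaluation of the naive Bayes model.
    prior_act: P(y = ACT)
    p_act[k]:  P(x_k = 1 | y = ACT)
    p_not[k]:  P(x_k = 1 | y != ACT)
    """
    like_act = 1.0
    like_not = 1.0
    for xk, pa, pn in zip(x, p_act, p_not):
        # Bernoulli likelihood of one attribute under each class,
        # multiplied in because attributes are assumed conditionally independent.
        like_act *= pa if xk == 1 else 1.0 - pa
        like_not *= pn if xk == 1 else 1.0 - pn
    # Bayes' rule with the two classes ACT and not-ACT.
    num = like_act * prior_act
    return num / (num + like_not * (1.0 - prior_act))
```

For example, `posterior_act([1, 0], 0.5, [0.8, 0.3], [0.2, 0.6])` gives approximately 0.875: the class likelihoods are 0.8 · 0.7 = 0.56 and 0.2 · 0.4 = 0.08, so the posterior is 0.28 / (0.28 + 0.04). Multiplying thousands of such factors in floating point quickly underflows to zero, which evaluating in log space avoids.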
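Learning is trivial. From the dataset, $P(y = \mathrm{ACT})$ is simply estimated as
the fraction of records in which $y = \mathrm{ACT}$, and $P(x_k = 1 \mid y = \mathrm{ACT})$
is estimated as
$$\hat{P}(x_k = 1 \mid y = \mathrm{ACT}) = \frac{\#\{\, i : x_{ik} = 1,\ y_i = \mathrm{ACT} \,\}}{\#\{\, i : y_i = \mathrm{ACT} \,\}} \tag{B.6}$$
The probabilities conditioned on $y \neq \mathrm{ACT}$ are estimated in the same way.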
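Because the estimates are just counts, training needs only a single pass over the data. The sketch below assumes a hypothetical sparse representation in which each record is a set of the attribute indices whose value is one, so zero entries are never visited; the names `train_bc`, `records`, `labels` and `num_attrs` are illustrative. It computes the plain ratio of counts with no smoothing, so an attribute that is never (or always) set within a class yields an estimate of exactly 0 (or 1), and it assumes both classes occur in the data.

```python
def train_bc(records, labels, num_attrs):
    """Count-based estimates for a binary naive Bayes classifier.

    Hypothetical helper (no smoothing applied).
    records:   list of sets; records[i] holds the indices k with x_ik = 1
    labels:    list of bools; True means y_i = ACT
    num_attrs: total number of attributes M
    Returns (prior_act, p_act, p_not) with
    p_act[k] = P(x_k = 1 | ACT) and p_not[k] = P(x_k = 1 | not ACT).
    """
    n_act = sum(labels)
    n_not = len(labels) - n_act
    ones_act = [0] * num_attrs
    ones_not = [0] * num_attrs
    for nonzero, is_act in zip(records, labels):
        counts = ones_act if is_act else ones_not
        for k in nonzero:  # only attributes equal to one are visited
            counts[k] += 1
    prior_act = n_act / len(labels)
    p_act = [c / n_act for c in ones_act]
    p_not = [c / n_not for c in ones_not]
    return prior_act, p_act, p_not
```

On records `[{0}, {0, 1}, {1}, set()]` with labels `[True, True, False, False]` and `num_attrs=2`, it returns `(0.5, [1.0, 0.5], [0.0, 0.5])`.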
Because of the vast number of attributes and the need to
exploit sparseness, several computational tricks are used, including
a method that lets all probabilities be evaluated in log space, and
a subtraction trick that avoids ever explicitly iterating over the
elements of a record that have a value of zero.
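The details of the subtraction trick are not given here. One common way to realize it, sketched below as an assumption, is to precompute for each class the log-likelihood of the all-zero record, $\log P(y) + \sum_k \log(1 - p_k)$, and then, for each attribute that is actually one, subtract $\log(1 - p_k)$ and add $\log p_k$. Scoring a record then costs time proportional to its number of nonzero entries rather than to $M$. The class `SparseBC` and its members are hypothetical names, and the sketch assumes every probability lies strictly between 0 and 1.

```python
import math


class SparseBC:
    """Hypothetical log-space binary naive Bayes scorer that touches only nonzero attributes.

    Assumes prior_act and every entry of p_act / p_not lie strictly in (0, 1).
    """

    def __init__(self, prior_act, p_act, p_not):
        # Log prior plus the log-likelihood of the all-zeros record, per class.
        self.base_act = math.log(prior_act) + sum(math.log1p(-p) for p in p_act)
        self.base_not = math.log1p(-prior_act) + sum(math.log1p(-p) for p in p_not)
        # Correction applied when x_k = 1: subtract log(1 - p_k), add log(p_k).
        self.delta_act = [math.log(p) - math.log1p(-p) for p in p_act]
        self.delta_not = [math.log(p) - math.log1p(-p) for p in p_not]

    def posterior_act(self, nonzero):
        """Return P(y = ACT | x), where `nonzero` lists the indices k with x_k = 1."""
        s_act = self.base_act + sum(self.delta_act[k] for k in nonzero)
        s_not = self.base_not + sum(self.delta_not[k] for k in nonzero)
        # P(ACT | x) = 1 / (1 + exp(s_not - s_act)), arranged so exp never overflows.
        d = s_not - s_act
        if d > 0:
            e = math.exp(-d)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(d))
```

For instance, with $P(y = \mathrm{ACT}) = 0.5$, `p_act = [0.8, 0.3]` and `p_not = [0.2, 0.6]`, `SparseBC(0.5, [0.8, 0.3], [0.2, 0.6]).posterior_act([0])` scores the record $\mathbf{x} = (1, 0)$ and returns approximately 0.875, the value obtained by multiplying the probabilities out directly: 0.28 / (0.28 + 0.04).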
BC needs no keywords or arguments.
Copyright 2004 Paul Komarek, komarek@cmu.edu