Next: B.4.6 oldknn
Up: B.4 Learners
Previous: B.4.4 lr_cgmle
Contents
B.4.5 newknn
This is an implementation of a fast Ball-tree based Classifier
with binary-valued input attributes. We rely on the fact that these two
questions are not the same: (a)``Find me the
nearest neighbors.''
and (b) ``How many of the
-nearest neighbors are in the positive class?''
Answering (b) exactly does not necessarily require us to answer (a).
newknn attacks the problem by building two ball trees:
one ball tree is built
from all the positive points in the dataset, another ball tree is built from
all negative points. Then it is time to classify a new target point
t,
we compute
, the number of k nearest neighbors of t that are
in the positive class, in the following fashion:
We expect newknn can achieve a good speed up for the
dataset with much
more negative output than positive ones.
| Keyword |
Arg Type |
Arg Vals |
Default |
| Common |
| k |
int |
[0, ) |
9 |
| rmin |
int |
[1, ) |
5 |
| mbw |
float |
[1e-7, ) |
0.01 |
|
|
|
|
Common keywords and arguments:
- k int: number of nearest neighbor you are
interested.
- rmin int: the maximum number of points that
will be allowed in a tree node. Sometimes it saves memory to use
rmin
1. It even seems to speed things up, possibly because of
L1/L2 cache effects.
- mbw float:"min_ball_width" = a node will
not be split if it finds that all its points are within distance
"mbw" of the node centroid. It will not be split (and will hence
be a leaf node).
Next: B.4.6 oldknn
Up: B.4 Learners
Previous: B.4.4 lr_cgmle
Contents
Copyright 2004 Paul Komarek, komarek@cmu.edu