To describe the SVM classifier, we must first make several
definitions. Let
be the set of negative rows from
, and
be the set of positive rows. Suppose that
and
are
linearly separable. A separating hyperplane is a hyperplane
with unit normal
and distance to the origin
such
that
| (6.3) | |||
| such that |
(6.4) |
For datasets with binary outcomes, we can code each outcome
as 1
and -1 instead of 1 and 0. With this coding we can unify
Equations 6.1 and 6.2 as
SVMs are often described as a maximum margin classifier because
they choose a separating hyperplane which maximizes
under some
constraint on the number and size of separation violations. A precise
definition of an SVM can be written as the mathematical program
![]() |
(6.7) |
The SVM described above is a linear SVM. It may be desirable to apply a nonlinear transform to data which is not linearly separable before applying SVM. The separating hyperplane found for the transformed space becomes a nonlinear boundary in the original space. A common transformation uses Radial Basis Functions, which are simply a basis created from a properly scaled probability density function (pdf). Note that this pdf is not interpreted probabilistically. [34,10]
We did not implement SVM, and instead chose to use the popular
SVM
package, version 5 [18,16]. This
software uses a variety of tricks to find solutions to QP, defined in
Equation 6.8, more quickly than traditional quadratic
programming software.
SVM
can function as a linear or RBF SVM,
where the RBF SVM uses the Normal pdf and scale parameter gamma.
The default parameters for the linear SVM work well in our
experiments. The default value of gamma made RBF SVM experiments
too slow, and we adjusted it to gamma=0.001. Smaller values of
gamma did not improve speed further. Joachims [16]
describes the time complexity of
SVM
in a manner that we cannot
easily describe here. While this complexity is bounded above by
O
, much better performance is generally observed.
We have written a wrapper around
SVM
so that it may be timed in
the same manner as the other algorithms, as well as report scores in
the same manner as the other algorithms. Before calling
SVM
, we
must first write data to a file in a special format. When the
training phase is done, we read data from a file created by
SVM
.
A similar process happens during classification. This is repeated for
every fold. We assume that
SVM
spends the same amount of time
reading the files as our SVM wrapper spends writing them, and
vice-versa. Let the filetime be the amount of time we spend
reading and writing files. We do not want to include the filetime in
the published algorithm time. Therefore we report the algorithm time
as we normally measure it, less twice the filetime.