A goal of this thesis is to provide a reasonable and repeatable baseline measurement of classification with LR. To meet this goal, we use a single computing platform for all of our experiments. This platform consists of dual AMD Opteron 242 processors with 4GB or 8GB of memory running GNU/Linux. On machines of either memory size, the memory modules are split across the two processors because of the Opteron's NUMA configuration. The BIOS is set to interleave accesses within each local memory bank and to interleave memory accesses between the two processors' local memories. This setting prevents dramatic discrepancies in process performance when the operating system kernel runs a process on one CPU but allocates its memory from the other processor's local bank.
Linux kernel version 2.4.23 or later is used in 64-bit mode. Executables are compiled with the GNU C compiler version 3.2.2 or later, using the -O2 flag alone for optimization. All of our experiments fit within 4GB of RAM, so machines of either memory size behave identically.
There is some uncertainty in the timing of any computer process. In general we are unconcerned with timing variations below ten percent. Although it is common to use ``user'' time or ``cpu'' time to measure experiment speed, we use the ``real'' time required for computations within each fold, neglecting fold preparation and scoring overhead. When making performance comparisons, we run only one experiment per machine, so the elapsed algorithm execution time should not differ significantly from the ``cpu'' time. This strategy should provide a good assessment of how long a user will wait for computations to finish.
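The timing policy above can be illustrated with a minimal sketch. It is not the harness used for our experiments, which consists of compiled C executables. The function names `time_folds` and `within_tolerance`, the callables `prepare`, `compute` and `score`, and the choice to measure the ten percent tolerance relative to the larger of two times are all assumptions made for illustration.

```python
import time


def time_folds(folds, prepare, compute, score):
    """Return (real, cpu) seconds per fold, timing only the compute step.

    prepare, compute and score are hypothetical callables standing in for
    fold preparation, the learning computation, and scoring.
    """
    timings = []
    for fold in folds:
        data = prepare(fold)                 # fold preparation: not timed
        real_start = time.perf_counter()     # wall-clock ("real") time
        cpu_start = time.process_time()      # CPU time of this process
        result = compute(data)               # the only timed region
        cpu_elapsed = time.process_time() - cpu_start
        real_elapsed = time.perf_counter() - real_start
        score(result)                        # scoring overhead: not timed
        timings.append((real_elapsed, cpu_elapsed))
    return timings


def within_tolerance(a, b, tol=0.10):
    """True if two times differ by at most tol, relative to the larger one.

    Measuring against the larger value is an assumed convention.
    """
    larger = max(a, b)
    return larger == 0 or abs(a - b) / larger <= tol
```

With one experiment per machine, the two values in each pair should be close for a single-threaded computation. A large gap would suggest contention from other processes or time spent waiting on I/O.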
Our test machines were purchased at two different times. A discrepancy between BIOS versions, discovered while writing the final chapter of this thesis, caused inconsistent timings for some experiments. We report over two thousand results in this thesis, and are unable to repeat all of the affected experiments. Instead, we have repeated a subset of the affected experiments such that all results in Chapter 5 are for the old, slower BIOS, and all results in Chapter 6 use the new, faster BIOS.
We use a PostgreSQL 7.3 database to store the programs, datasets, input, and output for each experiment. This database is available for inspection of all experiments performed for this document, as well as many experiments not included here.