
1.1 Science In This Century

Modern researchers are facing an explosion in data. This data comes from immense investment in automated terrestrial and space-based telescopes, roboticized chemistry as used in the life sciences, network-based content delivery, and the computerization of business processes, to name just a few sources. Several such datasets are described and used later in this thesis. Only through the creation of highly scalable analytic methods can we make the discoveries that justify this investment in data collection.

We believe these methods will rely fundamentally on algorithms and data structures for high-performance computational statistics. These modular building blocks would be used to form larger "discovery systems" functioning as industrious, intelligent assistants that autonomously identify and summarize interesting data for a researcher; adapt to the researcher's definition of interesting; and test the researcher's hypotheses about the data.

For a discovery system to be genuinely useful to the research community, we believe it should perform analyses as quickly as a scientist can formulate queries and describe hypotheses. This requires scalable data structures and algorithms capable of analyzing millions of data points with tens or tens of thousands of dimensions in seconds on modern computational hardware. The ultimate purpose of a discovery system is to allow researchers to concentrate on their research rather than on computer science.


Copyright 2004 Paul Komarek, komarek@cmu.edu