[7263] in Kerberos
Information Exploration Shootout
daemon@ATHENA.MIT.EDU (Patrick Hoffman)
Mon May 13 15:13:26 1996
To: kerberos@MIT.EDU
Date: 13 May 1996 18:00:11 GMT
From: phoffman@jupiter.cs.uml.edu (Patrick Hoffman)
Information Exploration Shootout (aka - Information Exploration Benchmarks)
Over the past year many users have requested more serious comparative
evaluations of the various data exploration techniques: analysis, knowledge
discovery and data mining, statistics and grand tours, database tools,
visualization, or combinations thereof.
We all recognize that mining large databases and documents for information
and knowledge will be the next fundamental advance in database systems,
knowledge discovery, and visualization. This is considered an important area
for major cost savings and potential revenue, and it has immediate
applications in decision systems, intelligence, information management,
business, and communication, in the form of both on-line services and the
World Wide Web. Data mining now draws from fields including
databases, statistics, information technology, data visualization, and
artificial intelligence, especially machine learning and knowledge-based
systems. There is a clear sense that, to achieve the next increase
in knowledge exploitation, individual data exploration approaches must work
together.
There have been promising developments. In 1995 a "shootout" was held for
the statistical community. The knowledge discovery in databases (KDD)
community has meanwhile made numerous data sets publicly available for timing
"benchmarks". There has not, however, been any comparative evaluation of
techniques across domains, and definitely none permitting hybrid approaches.
How does one discover information and knowledge in datasets (e.g., databases,
archives, document collections, television news reports, the Web)? What
process do analysts and other data explorers use in discovering non-trivial
patterns? How do, or should, knowledge discovery, statistics, and
visualization work together to support the human exploration process? What
are the procedures for using visualization and analytic agents, in context
with the human operator, to achieve timely, computationally responsive
discoveries in data?
There is now a plethora of techniques to explore data. They range from
purely statistical approaches to neural networks, machine learning, and
knowledge discovery as batch processes. Integrated approaches combine applied
perception (e.g., glyphs) with interactive grand tours, while purely geometric
systems such as parallel coordinates integrate little mathematics and rely
more on human participation. Which techniques are better? Which work
on what kind of data sets? Are certain combinations better? The questions
abound.
Several datasets have been identified and selected to be made publicly
available for exploration and discovery. The first dataset to be released
consists of human-generated network intrusion attempts and a baseline
dataset with no intrusions. There were four intrusions over a period of time,
and these have been tracked in separate datasets. Information explorers are
to discover these intrusions.
The second dataset, to be released shortly thereafter, will consist of
newspaper data set up as a collection of web pages.
Go to http://iris.cs.uml.edu:8080 for further information and more details.
The results will be reviewed by a group that includes Georges Grinstein
(UMass Lowell and the MITRE Corporation), Gregory Piatetsky-Shapiro (GTE) and
Graham Wills (AT&T). The panelists and domain experts will discuss the data
sets and the reporting and selection mechanism, and will present their
preliminary analysis of the results at various conferences, including
KDD'96, the 9th Annual DAMA-NCR Data Management Symposium, and the
IEEE Visualization'96 conference. Participants will receive credit, as well
as copies of the results and various documents.
Questions or comments can be emailed to:
grinstein@cs.uml.edu or ggg@mitre.org
or to the address below.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Patrick Hoffman Institute for Visualization and
phoffman@cs.uml.edu Perception Research
University of Massachusetts-Lowell
(508)657-8878 Computer Science Department
(508)934-3384 One University Ave., Lowell, MA 01854
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~