[7263] in Kerberos

Information Exploration Shootout

daemon@ATHENA.MIT.EDU (Patrick Hoffman)
Mon May 13 15:13:26 1996

To: kerberos@MIT.EDU
Date: 13 May 1996 18:00:11 GMT
From: phoffman@jupiter.cs.uml.edu (Patrick Hoffman)



Information Exploration Shootout (aka Information Exploration Benchmarks)

Over the past year many users have requested more serious comparative 
evaluations of the various data exploration techniques:  analysis, knowledge 
discovery and data mining, statistics and grand tours, database tools, 
visualization, or combinations thereof.  

We all recognize that mining for information and knowledge from large 
databases and documents will be the next fundamental impact in database 
systems, knowledge discovery, and visualization.  This is considered an 
important area for major cost savings and potential revenue, and it has 
immediate applications in decision systems, intelligence, information 
management, business, and communication, in the form of both on-line services
and the World Wide Web.  Data mining now draws from fields including 
databases, statistics, information technology, data visualization, and 
artificial intelligence, especially machine learning and knowledge-based 
systems.  There is a clear sense that, to achieve the next increase 
in knowledge exploitation, individual data exploration approaches must work 
together.  

There have been promising developments.  In 1995 a "shootout" was held for 
the statistical community.  The knowledge discovery in databases (KDD) 
community has meanwhile made numerous data sets publicly available for timing 
"benchmarks".  There has not, however, been any comparative evaluation of 
techniques across domains, and definitely none permitting hybrid approaches.  

How does one discover information and knowledge in datasets (e.g., databases, 
archives, document collections, television news reports, the Web)?  What 
process do analysts and other data explorers use in discovering non-trivial 
patterns?  How do, or should, knowledge discovery, statistics, and 
visualization work together to support the human exploration process?  What 
are the procedures for using visualization and analytic agents, in context 
with the human operator, to achieve timely, computationally responsive 
discoveries in data?

There is now a plethora of techniques to explore data.  They range from 
purely statistical approaches to neural networks, machine learning, and 
knowledge discovery as batch processes.  Integrated approaches combine applied
perception (e.g., glyphs) with interactive grand tours, while purely geometric 
systems such as parallel coordinates integrate little mathematics and rely 
more on human participation.  Which techniques are better?  Which work 
on what kind of data sets?  Are certain combinations better?  The questions 
abound.  

Several datasets have been identified and selected to be made publicly 
available for exploration and discovery. The first dataset to be released 
consists of human-generated network intrusion attempts and a baseline 
dataset with no intrusions.  There were four intrusions over a period of time 
and these have been tracked in separate datasets.  Information explorers are 
to discover these intrusions.

The second dataset, to be released shortly thereafter, will consist of 
newspaper data set up as a collection of web pages.

Go to http://iris.cs.uml.edu:8080 for further information and more details.

The results will be reviewed by a group that includes Georges Grinstein 
(UMass Lowell and the MITRE Corporation), Gregory Piatetsky-Shapiro (GTE) and 
Graham Wills (AT&T).  The panelists and domain experts will discuss the data 
sets and the reporting and selection mechanism, and will present their 
preliminary analysis of the results at various conferences, including 
KDD'96, the 9th Annual DAMA-NCR Data Management Symposium, and the 
IEEE Visualization'96 Conference.  Credit to the participants will be 
provided, as well as copies of the results and various documents.

       Any questions or comments can be emailed to:
                  grinstein@cs.uml.edu or ggg@mitre.org
                  or below
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Patrick Hoffman      Institute for Visualization and 
phoffman@cs.uml.edu  Perception Research
                     University of Massachusetts-Lowell
(508)657-8878        Computer Science Department
(508)934-3384        One University Ave., Lowell, MA 01854
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


