Free Parallel Data Mining
Bin Li, Dennis Shasha

Data mining is computationally expensive. Since the benefits of data mining results are unpredictable, organizations may not be willing to buy new hardware for that purpose. We will present a system that enables data mining applications to run in parallel on networks of workstations in a fault-tolerant manner. We will describe our parallelization of a combinatorial pattern discovery algorithm and a classification tree algorithm. We will demonstrate the effectiveness of our system with two real applications: discovering active motifs in protein sequences and predicting foreign exchange rate movement.


References

Nicholas Carriero, David Gelernter: Linda in Context. CACM 32(4): 444-458 (1989)
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang: Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results. SIGMOD Conference 1994: 115-125

author = {Bin Li and
Dennis Shasha},
editor = {Laura M. Haas and
Ashutosh Tiwary},
title = {Free Parallel Data Mining},
booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-955-5},
pages = {541-543},
crossref = {DBLP:conf/sigmod/98},
bibsource = {DBLP,}
