ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Efficient Parallel and Data Mining for Association Rules.

Jong Soo Park, Ming-Syan Chen, Philip S. Yu: Efficient Parallel and Data Mining for Association Rules. CIKM 1995: 31-36
@inproceedings{DBLP:conf/cikm/ParkCY95,
  author    = {Jong Soo Park and
               Ming-Syan Chen and
               Philip S. Yu},
  title     = {Efficient Parallel and Data Mining for Association Rules},
  booktitle = {CIKM '95, Proceedings of the 1995 International Conference on
               Information and Knowledge Management, November 28 - December
               2, 1995, Baltimore, Maryland, USA},
  publisher = {ACM},
  year      = {1995},
  pages     = {31-36},
  ee        = {db/conf/cikm/ParkCY95.html, http://doi.acm.org/10.1145/221270.221320},
  crossref  = {DBLP:conf/cikm/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

In this paper, we develop an algorithm, called PDM, to conduct parallel data mining for association rules. Consider a transaction as a collection of items, and a large itemset is a set of items such that the number of transactions containing it exceeds a pre-specilied threshold. PDM is so designed that the global set of large itemsets can be identified efficiently and the amount of inter-node data exchange required is minimized. SpecificaUy, with a given database partition, each processing node will collect (count) information on each itemset from its local database efficiently via a hashing method. The information discovered by each node is next shared with other nodes via some communication schemes. Then, PDM employs a technique, called clue-and-poll, to address the uncertainty due to the partial knowledge collected at each node by judiciously selecting a small fraction of the itemsets for the exchange of count information among nodes, thus reducing the communication cost. The global set of large iternsets can hence be determined based on the aggregate count of itemsets. It is experimentally shown that PDM not only attains very good parallelization efficiencies, but also provides robust performance for various input patterns.

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 2 Issue 4, CIKM, DOLAP, GIS, SIGFIDET, ..." and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

CIKM '95, Proceedings of the 1995 International Conference on Information and Knowledge Management, November 28 - December 2, 1995, Baltimore, Maryland, USA. ACM 1995
Contents BibTeX

Online Edition

Citation Page BibTeX

References

[1]
Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84 BibTeX
[2]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573 BibTeX
[3]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 BibTeX
[4]
Rakesh Agrawal, Ramakrishnan Srikant: Mining Sequential Patterns. ICDE 1995: 3-14 BibTeX
[5]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 BibTeX
[6]
...
[7]
Jiawei Han, Yandong Cai, Nick Cercone: Knowledge Discovery in Databases: An Attribute-Oriented Approach. VLDB 1992: 547-559 BibTeX
[8]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155 BibTeX
[9]
...
[10]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186 BibTeX

Referenced by

  1. Masahisa Tamura, Masaru Kitsuregawa: Dynamic Load Balancing for Parallel Association Rule Mining on Heterogenous PC Cluster Systems. VLDB 1999: 162-173
  2. Charu C. Aggarwal, Philip S. Yu: Mining Large Itemsets for Association Rules. IEEE Data Eng. Bull. 21(1): 23-31(1998)
  3. Takahiko Shintani, Masaru Kitsuregawa: Parallel Mining Algorithms for Generalized Association Rules with Classification Hierarchy. SIGMOD Conference 1998: 25-36
  4. David Wai-Lok Cheung, Sau Dan Lee, Ben Kao: A General Incremental Technique for Maintaining Discovered Association Rules. DASFAA 1997: 185-194
  5. David Wai-Lok Cheung, Vincent T. Y. Ng, Ada Wai-Chee Fu, Yongjian Fu: Efficient Mining of Association Rules in Distributed Databases. IEEE Trans. Knowl. Data Eng. 8(6): 911-922(1996)
  6. Ming-Syan Chen, Jiawei Han, Philip S. Yu: Data Mining: An Overview from a Database Perspective. IEEE Trans. Knowl. Data Eng. 8(6): 866-883(1996)
  7. Rakesh Agrawal, John C. Shafer: Parallel Mining of Association Rules. IEEE Trans. Knowl. Data Eng. 8(6): 962-969(1996)
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
CIKM 1995 Proceedings, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:01:47 2009