Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining.

Flip Korn, Alexandros Labrinidis, Yannis Kotidis, Christos Faloutsos: Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining. VLDB 1998: 582-593
@inproceedings{DBLP:conf/vldb/KornLKF98,
  author    = {Flip Korn and
               Alexandros Labrinidis and
               Yannis Kotidis and
               Christos Faloutsos},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining},
  booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {582-593},
  ee        = {db/conf/vldb/KornLKF98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [2, 23]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the "goodness" of a set of discovered rules. We propose to use the "guessing error" as a measure of the "goodness", that is, the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can "guess" the amount spent on, say, butter. Thus, we can perform a variety of important tasks such as forecasting, answering "what-if" scenarios, detecting outliers, and visualizing the data. Moreover, we show how to compute Ratio Rules in a single pass over the dataset with small memory requirements (a few small matrices), in contrast to traditional association rule mining methods that require multiple passes and/or large memory. Experiments on several real datasets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method consistently achieves a "guessing error" up to 5 times lower than that of the straightforward competitor.
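The abstract names three computational pieces: a single pass that keeps only a few small matrices, a way to guess hidden cells from the rules, and the "guessing error" (RMSE over cells pretended unknown). The Python/NumPy sketch below wires them together under assumptions the abstract does not state: that the Ratio Rules are taken to be the k leading eigenvectors of the column covariance matrix, and that hidden cells are filled by a least-squares fit of the known cells onto those rules. The function names (`single_pass_stats`, `derive_rules`, `guess_missing`, `guessing_error`) and the toy milk/bread/butter rows are hypothetical, for illustration only.

```python
import numpy as np


def single_pass_stats(rows):
    """Scan the rows once, keeping only the row count, the column sums,
    and the M x M matrix of summed outer products (memory is O(M^2),
    independent of the number of rows). Assumes at least one row."""
    n, col_sums, cross = 0, None, None
    for row in rows:  # any iterable works, e.g. a cursor over a table
        x = np.asarray(row, dtype=float)
        if col_sums is None:
            col_sums = np.zeros_like(x)
            cross = np.zeros((x.size, x.size))
        n += 1
        col_sums += x
        cross += np.outer(x, x)
    return n, col_sums, cross


def derive_rules(n, col_sums, cross, k):
    """Assumption: use the k leading eigenvectors of the covariance
    matrix (built from the one-pass statistics) as the rules."""
    means = col_sums / n
    cov = cross / n - np.outer(means, means)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return means, eigvecs[:, order]          # M x k, one rule per column


def guess_missing(row, known_mask, means, rules):
    """Fill the cells where known_mask is False: fit rule coefficients to
    the known (mean-centered) cells by least squares, then reconstruct."""
    row = np.asarray(row, dtype=float)
    known = np.asarray(known_mask, dtype=bool)
    centered = row[known] - means[known]
    coeffs, *_ = np.linalg.lstsq(rules[known], centered, rcond=None)
    guess = row.copy()
    guess[~known] = means[~known] + rules[~known] @ coeffs
    return guess


def guessing_error(data, means, rules):
    """RMSE over all cells, hiding one cell at a time and guessing it back."""
    data = np.asarray(data, dtype=float)
    sq_err = 0.0
    for row in data:
        for j in range(row.size):
            mask = np.ones(row.size, dtype=bool)
            mask[j] = False  # pretend cell j is unknown
            sq_err += (guess_missing(row, mask, means, rules)[j] - row[j]) ** 2
    return float(np.sqrt(sq_err / data.size))
```

On the made-up rows `[[10, 3, 2], [20, 6, 4], [5, 1.5, 1], [8, 2.4, 1.6]]`, where every customer spends on milk, bread, and butter in the ratio 1 : 0.3 : 0.2, `derive_rules(..., k=1)` recovers that single direction. Guessing butter for someone who bought $10 of milk and $3 of bread then returns about 2.0, and the guessing error is essentially zero. A column-average baseline (always guessing `means[j]`) has an RMSE of roughly 3.45 on the same rows.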

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5

References

[1]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925(1993)
[2]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[3]
Rakesh Agrawal, John C. Shafer: Parallel Mining of Association Rules. IEEE Trans. Knowl. Data Eng. 8(6): 962-969(1996)
[4]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[5]
Andreas Arning, Rakesh Agrawal, Prabhakar Raghavan: A Linear Method for Deviation Detection in Large Databases. KDD 1996: 164-169
[6]
...
[7]
Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
[8]
...
[9]
Ming-Syan Chen, Jiawei Han, Philip S. Yu: Data Mining: An Overview from a Database Perspective. IEEE Trans. Knowl. Data Eng. 8(6): 866-883(1996)
[10]
...
[11]
Usama M. Fayyad, Ramasamy Uthurusamy: Data Mining and Knowledge Discovery in Databases (Introduction to the Special Section). Commun. ACM 39(11): 24-26(1996)
[12]
Peter W. Foltz, Susan T. Dumais: Personalized Information Delivery: An Analysis of Information Filtering Methods. Commun. ACM 35(12): 51-60(1992)
[13]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431
[14]
Wen-Chi Hou: Extraction and Applications of Statistical Relationships in Relational Databases. IEEE Trans. Knowl. Data Eng. 8(6): 939-945(1996)
[15]
...
[16]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
[17]
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes in C, 2nd Edition. Cambridge University Press 1992
[18]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
[19]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
[20]
Ashok Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm for Mining Association Rules in Large Databases. VLDB 1995: 432-444
[21]
Abraham Silberschatz, Alexander Tuzhilin: What Makes Patterns Interesting in Knowledge Discovery Systems. IEEE Trans. Knowl. Data Eng. 8(6): 970-974(1996)
[22]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419
[23]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conference 1996: 1-12

Referenced by

  1. Erik Riedel, Christos Faloutsos, Gregory R. Ganger, David Nagle: Data Mining on an OLTP System (Nearly) for Free. SIGMOD Conference 2000: 13-21
  2. Laks V. S. Lakshmanan, Raymond T. Ng, Jiawei Han, Alex Pang: Optimization of Constrained Frequent Set Queries with 2-variable Constraints. SIGMOD Conference 1999: 157-168
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:23 2009