Clustering Categorical Data: An Approach Based on Dynamical Systems.

David Gibson, Jon M. Kleinberg, Prabhakar Raghavan: Clustering Categorical Data: An Approach Based on Dynamical Systems. VLDB J. 8(3-4): 222-236 (2000)
  author    = {David Gibson and
               Jon M. Kleinberg and
               Prabhakar Raghavan},
  title     = {Clustering Categorical Data: An Approach Based on Dynamical Systems},
  journal   = {VLDB J.},
  volume    = {8},
  number    = {3-4},
  year      = {2000},
  pages     = {222-236},
  ee        = {db/journals/vldb/GibsonKR00.html},
  bibsource = {DBLP,}


We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By "categorical data," we mean tables with fields that cannot be naturally ordered by a metric - e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems.
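The abstract does not spell out the update rule, so the following is only a minimal sketch of one way an iterative weight-propagation scheme over categorical values could look. Everything specific here is an assumption made for illustration and not taken from the text above: the function names `normalize` and `propagate_weights`, the additive combining rule (each value collects the weights of the values it co-occurs with in a row), the uniform starting weights, and the per-column normalization to unit sum of squares.

```python
from collections import defaultdict
import math


def normalize(weights):
    """Rescale weights so that, within each column, the squares sum to 1.

    Assumption: per-column normalization; the abstract does not specify one.
    """
    norms = defaultdict(float)
    for (col, _value), w in weights.items():
        norms[col] += w * w
    return {
        node: (w / math.sqrt(norms[node[0]]) if norms[node[0]] > 0 else 0.0)
        for node, w in weights.items()
    }


def propagate_weights(rows, iterations=20):
    """Iteratively propagate weights between values that co-occur in a row.

    rows: list of equal-length tuples of categorical values (one per column).
    Returns a dict mapping (column index, value) to a weight.
    """
    # A node is a (column, value) pair, so equal strings in different
    # columns are treated as distinct values.
    nodes = {(c, v) for row in rows for c, v in enumerate(row)}
    weights = normalize({node: 1.0 for node in nodes})  # assumed uniform start

    for _ in range(iterations):
        updated = defaultdict(float)
        for row in rows:
            row_nodes = list(enumerate(row))
            total = sum(weights[node] for node in row_nodes)
            for node in row_nodes:
                # Assumed additive combiner: add the weights of the *other*
                # values in this row, using the previous round's weights.
                updated[node] += total - weights[node]
        weights = normalize(updated)
    return weights


if __name__ == "__main__":
    # Hypothetical toy table: (producer, model).
    table = [
        ("Honda", "Civic"),
        ("Honda", "Accord"),
        ("Honda", "Civic"),
        ("Toyota", "Camry"),
    ]
    for node, w in sorted(propagate_weights(table, iterations=5).items()):
        print(node, round(w, 4))
```

In this sketch, values that co-occur often reinforce one another, so after a few rounds the weights within a column separate groups of values that tend to appear together. Other combining rules or normalizations would change the dynamics, which is where the non-linear analysis mentioned in the abstract would come in.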

Key Words

Clustering - Data mining - Categorical data - Dynamical systems - Hypergraphs

Copyright © 2000 by Springer, Berlin, Heidelberg. Permission to make digital or hard copies of the abstract is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice along with the full citation.



Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, A. Inkeri Verkamo: Fast Discovery of Association Rules. Advances in Knowledge Discovery and Data Mining 1996: 307-328
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
Noga Alon, Joel Spencer: The Probabilistic Method. John Wiley 1992, ISBN 0-471-53588-5
Avrim Blum, Joel Spencer: Coloring Random and Semi-Random k-Colorable Graphs. J. Algorithms 19(2): 204-234 (1995)
Ravi B. Boppana: Eigenvalues and Graph Bisection: An Average-Case Analysis (Extended Abstract). FOCS 1987: 280-285
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur: Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD Conference 1997: 255-264
Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
Sergey Brin, Lawrence Page: The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks 30(1-7): 107-117 (1998)
Tzi-cker Chiueh: Content-Based Image Indexing. VLDB 1994: 582-593
Gautam Das, Heikki Mannila, Pirjo Ronkainen: Similarity of Attributes by External Probes. KDD 1998: 23-29
Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman: Indexing by Latent Semantic Analysis. JASIS 41(6): 391-407 (1990)
Myron Flickner, Harpreet S. Sawhney, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, Peter Yanker: Query by Image and Video Content: The QBIC System. IEEE Computer 28(9): 23-32 (1995)
M. R. Garey, David S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman 1979, ISBN 0-7167-1044-7
Eui-Hong Han, George Karypis, Vipin Kumar, Bamshad Mobasher: Clustering Based On Association Rule Hypergraphs. DMKD 1997: 0-
Zhexue Huang: A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining. DMKD 1997: 0-
Jon M. Kleinberg: Authoritative Sources in a Hyperlinked Environment. J. ACM 46(5): 604-632 (1999)
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo: Discovering Frequent Episodes in Sequences. KDD 1995: 210-215
Daniel A. Spielman, Shang-Hua Teng: Spectral Partitioning Works: Planar Graphs and Finite Element Meshes. FOCS 1996: 96-105
Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419
Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145
Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conference 1996: 103-114