Fast Computation of Sparse Datacubes.

Kenneth A. Ross, Divesh Srivastava: Fast Computation of Sparse Datacubes. VLDB 1997: 116-125
  author    = {Kenneth A. Ross and
               Divesh Srivastava},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Fast Computation of Sparse Datacubes},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {116-125},
  ee        = {db/conf/vldb/RossS97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP,}


Datacube queries compute aggregates over database relations at a variety of granularities, and they constitute an important class of decision support queries. Real-world data is frequently sparse, and hence efficiently computing datacubes over large sparse relations is important. We show that current techniques for computing datacubes over sparse relations do not scale well with the number of CUBE BY attributes, especially when the relation is much larger than main memory.

We propose a novel algorithm for the fast computation of datacubes over sparse relations, and demonstrate the efficiency of our algorithm using synthetic, benchmark and real-world data sets. When the relation fits in memory, our technique performs multiple in-memory sorts, and does not incur any I/O beyond the input of the relation and the output of the datacube itself. When the relation does not fit in memory, a divide-and-conquer strategy divides the problem of computing the datacube into several simpler computations of sub-datacubes. Often, all but one of the sub-datacubes can be computed in memory and our in-memory solution applies. In that case, the total I/O overhead is linear in the number of CUBE BY attributes. We demonstrate with an implementation that the CPU cost of our algorithm is dominated by the I/O cost for sparse relations.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents BibTeX

Electronic Edition

From CS Dept., University Trier (Germany)


Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi: On the Computation of Multidimensional Aggregates. VLDB 1996: 506-521 BibTeX
Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996: 152-159 BibTeX
C. A. R. Hoare: Quicksort. Comput. J. 5(1): 10-15(1962) BibTeX
Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman: Implementing Data Cubes Efficiently. SIGMOD Conference 1996: 205-216 BibTeX
Yihong Zhao, Prasad Deshpande, Jeffrey F. Naughton: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. SIGMOD Conference 1997: 159-170 BibTeX

Referenced by

  1. Frank K. H. A. Dehne, Todd Eavis, Susanne E. Hambrusch, Andrew Rau-Chaplin: Parallelizing the Data Cube. ICDT 2001: 129-143
  2. Weifa Liang, Maria E. Orlowska, Jeffrey Xu Yu: Optimizing Multiple Dimensional Queries Simultaneously in Multidimensional Databases. VLDB J. 8(3-4): 319-338(2000)
  3. Themistoklis Palpanas: Knowledge Discovery in Data Warehouses. SIGMOD Record 29(3): 88-100(2000)
  4. Jiawei Han: Review - An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. ACM SIGMOD Digital Review 1: (1999)
  5. H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava: What can Hierarchies do for Data Warehouses? VLDB 1999: 530-541
  6. Damianos Chatziantoniou: Evaluation of Ad Hoc OLAP: In-Place Computation. SSDBM 1999: 34-43
  7. Kevin S. Beyer, Raghu Ramakrishnan: Bottom-Up Computation of Sparse and Iceberg CUBEs. SIGMOD Conference 1999: 359-370
  8. Seigo Muto, Masaru Kitsuregawa: A Dynamic Load Balancing Strategy for Parallel Datacube Computation. DOLAP 1999: 67-72
  9. Shih-Fu Chang, Luis Gravano, Gail E. Kaiser, Kenneth A. Ross, Salvatore J. Stolfo: Database Research at Columbia University. SIGMOD Record 27(3): 75-80(1998)
  10. Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton: Materialized View Selection for Multidimensional Datasets. VLDB 1998: 488-499
  11. John R. Smith, Chung-Sheng Li, Vittorio Castelli, Anant Jhingran: Dynamic Assembly of Views in Data Cubes. PODS 1998: 274-283
  12. Kenneth A. Ross, Divesh Srivastava, Damianos Chatziantoniou: Complex Aggregation at Multiple Granularities. EDBT 1998: 263-277
  13. Seigo Muto, Masaru Kitsuregawa: Improving Main Memory Utilization for Array-Based DataCube Computation. DOLAP 1998: 28-33
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Sat May 16 23:46:15 2009