Recovering Information from Summary Data.

Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos: Recovering Information from Summary Data. VLDB 1997: 36-45
  author    = {Christos Faloutsos and
               H. V. Jagadish and
               Nikolaos Sidiropoulos},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Recovering Information from Summary Data},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {36-45},
  ee        = {db/conf/vldb/FaloutsosJS97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP,}


Data is often stored in summarized form, as a histogram of aggregates (COUNTs, SUMs, or AVeraGes) over specified ranges. We study how to estimate the original detail data from the stored summary. We formulate this task as an inverse problem, specifying a well-defined cost function that has to be optimized under constraints. We show that our formulation includes the uniformity and independence assumptions as a special case, and that it can achieve better reconstruction results if we maximize the smoothness as opposed to the uniformity.
In our experiments on real and synthetic datasets, the proposed method almost consistently outperforms its competitor, improving the root-mean-square error by up to 20 per cent for stock price data, and up to 90 per cent for smoother data sets. Finally, we show how to apply this theory to a variety of database problems that involve partial information, such as OLAP, data warehousing and histograms in query optimization.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents BibTeX

Electronic Edition

From CS Dept., University Trier (Germany)


Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172 BibTeX
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) BibTeX
Divesh Srivastava, Shaul Dar, H. V. Jagadish, Alon Y. Levy: Answering Queries with Aggregation Using Views. VLDB 1996: 318-329 BibTeX
Georg Gottlob, Roberto Zicari: Closed World Databases Opened Through Null Values. VLDB 1988: 50-61 BibTeX
Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996: 152-159 BibTeX
Ashish Gupta, Venky Harinarayan, Dallan Quass: Aggregate-Query Processing in Data Warehousing Environments. VLDB 1995: 358-369 BibTeX
Ashish Gupta, Inderpal Singh Mumick, V. S. Subrahmanian: Maintaining Views Incrementally. SIGMOD Conference 1993: 157-166 BibTeX
Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman: Implementing Data Cubes Efficiently. SIGMOD Conference 1996: 205-216 BibTeX
Tomasz Imielinski, Witold Lipski Jr.: Incomplete Information in Relational Databases. J. ACM 31(4): 761-791(1984) BibTeX
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 BibTeX
H. V. Jagadish: The INCINERATE Data Model. ACM Trans. Database Syst. 20(1): 71-110(1995) BibTeX
H. V. Jagadish, Inderpal Singh Mumick, Abraham Silberschatz: View Maintenance Issues for the Chronicle Data Model. PODS 1995: 113-124 BibTeX
David G. Luenberger: Introduction to Linear and Nonlinear Programming. Addison-Wesley 1973
Francesco M. Malvestuto: A Universal-Scheme Approach to Statistical Databases Containing Homogeneous Summary Tables. ACM Trans. Database Syst. 18(4): 678-708(1993) BibTeX
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
Wee Keong Ng, Chinya V. Ravishankar: Information Synthesis in Statistical Databases. CIKM 1995: 355-361 BibTeX
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes in C, 2nd Edition. Cambridge University Press 1992
Contents BibTeX
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
Chung-Dak Shum, Richard R. Muntz: An Information-Theoretic Study on Aggregate Responses. VLDB 1988: 479-490 BibTeX
Yannis Theodoridis, Timos K. Sellis: A Model for the Prediction of R-tree Performance. PODS 1996: 161-171 BibTeX
Andreas S. Weigend, Neil A. Gerschenfeld: Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley 1994, ISBN 0-201-62601-2
Jennifer Widom: Research Problems in Data Warehousing. CIKM 1995: 25-30 BibTeX

Referenced by

  1. Francesco Buccafurri, Filippo Furfaro, Domenico Saccà: Estimating Range Queries Using Aggregate Data with Integrity Constraints: A Probabilistic Approach. ICDT 2001: 390-404
  2. Themistoklis Palpanas: Knowledge Discovery in Data Warehouses. SIGMOD Record 29(3): 88-100(2000)
  3. Rakesh Agrawal, Ramakrishnan Srikant: Privacy-Preserving Data Mining. SIGMOD Conference 2000: 439-450
  4. Mehmet M. Dalkilic, Edward L. Robertson: Information Dependencies. PODS 2000: 245-253
  5. Jeffrey Scott Vitter, Min Wang: Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999: 193-204
  6. Carlos A. Hurtado, Alberto O. Mendelzon, Alejandro A. Vaisman: Maintaining Data Cubes under Dimension Updates. ICDE 1999: 346-355
  7. Zuotao Li, Xiaoyang Sean Wang, Menas Kafatos, Ruixin Yang: A Pyramid Data Model for Supporting Content-Based Browsing and Knowledge Discovery. SSDBM 1998: 170-179
  8. Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Sat May 16 23:46:15 2009