Efficient Snapshot Differential Algorithms for Data Warehousing.

Wilburt Labio, Hector Garcia-Molina: Efficient Snapshot Differential Algorithms for Data Warehousing. VLDB 1996: 63-74
  author    = {Wilburt Labio and
               Hector Garcia-Molina},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Efficient Snapshot Differential Algorithms for Data Warehousing},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {63-74},
  ee        = {db/conf/vldb/LabioG96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP,}


Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a window algorithm that works very well if the snapshots are not "very different." The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms.
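
As an illustration of the snapshot differential problem described in the abstract, the following is a minimal sketch and not the authors' algorithm. It assumes each snapshot fits in memory as a mapping from a unique key to a record body, and it compresses each old record to a short hash signature before comparing, which is lossy in the sense that a hash collision could hide an update. The names Snapshot, digest, and snapshot_diff are hypothetical and chosen only for this example.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical model: a snapshot maps a unique record key to the record body.
Snapshot = Dict[str, str]


def digest(body: str, nbytes: int = 8) -> bytes:
    """Compress a record body to a short fixed-size signature (lossy)."""
    return hashlib.sha256(body.encode("utf-8")).digest()[:nbytes]


def snapshot_diff(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Return sorted (operation, key) pairs that turn `old` into `new`."""
    # Keep only signatures of the old snapshot, not the full records.
    old_sigs = {key: digest(body) for key, body in old.items()}
    changes: List[Tuple[str, str]] = []
    for key, body in new.items():
        sig = old_sigs.pop(key, None)
        if sig is None:
            changes.append(("insert", key))   # key absent from old snapshot
        elif sig != digest(body):
            changes.append(("update", key))   # same key, different content
    # Keys never matched by the new snapshot were deleted.
    changes.extend(("delete", key) for key in old_sigs)
    return sorted(changes)


old = {"k1": "alice,10", "k2": "bob,20", "k3": "carol,30"}
new = {"k1": "alice,10", "k2": "bob,25", "k4": "dave,40"}
print(snapshot_diff(old, new))
```

Shortening the signature (a smaller nbytes) saves memory but raises the chance that a changed record goes undetected. That is the general trade-off behind lossy compression of records, under the assumptions stated above.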

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4

References

Michel E. Adiba, Bruce G. Lindsay: Database Snapshots. VLDB 1980: 86-91
Sergey Brin, James Davis, Hector Garcia-Molina: Copy Detection Mechanisms for Digital Documents. SIGMOD Conference 1995: 398-409
Daniel Barbará, Hector Garcia-Molina, Bernardo Feijoo: Exploiting Symmetries for Low-Cost Comparison of File Copies. ICDCS 1988: 471-479
Sudarshan S. Chawathe, Anand Rajaraman, Hector Garcia-Molina, Jennifer Widom: Change Detection in Hierarchically Structured Information. SIGMOD Conference 1996: 493-504
Laura M. Haas, Michael J. Carey, Miron Livny, Amit Shukla: Seeking the Truth About ad hoc Join Costs. VLDB J. 6(3): 241-256 (1997)
Joachim Hammer, Hector Garcia-Molina, Jennifer Widom, Wilburt Labio, Yue Zhuge: The Stanford Data Warehousing Project. IEEE Data Eng. Bull. 18(2): 41-48 (1995)
James W. Hunt, Thomas G. Szymanski: A Fast Algorithm for Computing Longest Common Subsequences. Commun. ACM 20(5): 350-353 (1977)
Bo Kähler, Oddvar Risnes: Extending Logging for Database Snapshot Refresh. VLDB 1987: 389-398
Bruce G. Lindsay, Laura M. Haas, C. Mohan, Hamid Pirahesh, Paul F. Wilms: A Snapshot Differential Refresh Algorithm. SIGMOD Conference 1986: 53-60
Guy M. Lohman, C. Mohan, Laura M. Haas, Dean Daniels, Bruce G. Lindsay, Patricia G. Selinger, Paul F. Wilms: Query Processing in R*. Query Processing in Database Systems 1985: 31-47
Priti Mishra, Margaret H. Eich: Join Processing in Relational Databases. ACM Comput. Surv. 24(1): 63-113 (1992)
Udi Manber, Sun Wu: GLIMPSE: A Tool to Search Through Entire File Systems. USENIX Winter 1994: 23-32
Narayanan Shivakumar, Hector Garcia-Molina: SCAM: A Copy Detection Mechanism for Digital Documents. DL 1995: 0-
Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264 (1986)
Cass Squire: Data Extraction and Transformation for the Data Warehouse. SIGMOD Conference 1995: 446-447
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press 1989, ISBN 0-7167-8162-X
Yue Zhuge, Hector Garcia-Molina, Joachim Hammer, Jennifer Widom: View Maintenance in a Warehousing Environment. SIGMOD Conference 1995: 316-327

Referenced by

  1. Sudarshan S. Chawathe: Comparing Hierarchical Data in External Memory. VLDB 1999: 90-101
  2. Wilburt Labio, Yue Zhuge, Janet L. Wiener, Himanshu Gupta, Hector Garcia-Molina, Jennifer Widom: The WHIPS Prototype for Data Warehouse Creation and Maintenance. SIGMOD Conference 1997: 557-559
  3. Sudarshan S. Chawathe, Hector Garcia-Molina: Meaningful Change Detection in Structured Data. SIGMOD Conference 1997: 26-37
  4. Janet L. Wiener, Himanshu Gupta, Wilburt Labio, Yue Zhuge, Hector Garcia-Molina: The WHIPS Prototype for Data Warehouse Creation and Maintenance. ICDE 1997: 589
  5. Jennifer Widom: Research Problems in Data Warehousing. CIKM 1995: 25-30