DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases.

Roy Goldman, Jennifer Widom: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB 1997: 436-445
  author    = {Roy Goldman and
               Jennifer Widom},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {436-445},
  ee        = {db/conf/vldb/GoldmanW97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP,}


In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.
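The abstract mentions an algorithm for creating DataGuides but does not spell it out here. As a rough illustration only, the sketch below treats the database as a rooted graph with labeled edges and builds a summary in which every label path of the source appears exactly once, using the same subset construction that turns an NFA into a DFA. The function name `build_dataguide`, the `(source, label, target)` edge format, and the sample restaurant data are assumptions made for this example; they are not taken from the paper or from Lore.

```python
from collections import defaultdict, deque


def build_dataguide(edges, root):
    """Summarize a rooted, edge-labeled graph (hypothetical input format).

    edges: iterable of (source, label, target) triples.
    Returns (guide_edges, targets):
      guide_edges maps a guide node id to {label: guide node id};
      targets maps a guide node id to the set of source objects it stands for.
    """
    # Index outgoing edges: object -> label -> set of child objects.
    out = defaultdict(lambda: defaultdict(set))
    for src, label, dst in edges:
        out[src][label].add(dst)

    start = frozenset([root])
    ids = {start: 0}           # target set -> guide node id
    targets = {0: start}
    guide_edges = {0: {}}
    queue = deque([start])

    while queue:
        current = queue.popleft()
        # Group the children of every object in this target set by label.
        by_label = defaultdict(set)
        for obj in current:
            for label, children in out.get(obj, {}).items():
                by_label[label] |= children
        for label in sorted(by_label):
            nxt = frozenset(by_label[label])
            if nxt not in ids:
                # First time this target set is seen: create a guide node.
                ids[nxt] = len(ids)
                targets[ids[nxt]] = nxt
                guide_edges[ids[nxt]] = {}
                queue.append(nxt)
            guide_edges[ids[current]][label] = ids[nxt]
    return guide_edges, targets


# Made-up sample data: two restaurants, only one of which has a phone.
sample_edges = [
    ("root", "Restaurant", "r1"),
    ("root", "Restaurant", "r2"),
    ("r1", "Name", "n1"),
    ("r2", "Name", "n2"),
    ("r2", "Phone", "p2"),
]
guide, target_sets = build_dataguide(sample_edges, "root")
```

In this sketch the two `Restaurant` edges collapse into a single guide edge, and the guide node it reaches records both restaurant objects as its target set. That stored set is the kind of place where statistics or sample values could be attached. Like NFA determinization, the construction can blow up on heavily cyclic or interlinked graphs.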

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7



References

Rakesh Agrawal, Narain H. Gehani, J. Srinivasan: OdeView: The Graphical Interface to Ode. SIGMOD Conference 1990: 34-43
Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, Janet L. Wiener: The Lorel Query Language for Semistructured Data. Int. J. on Digital Libraries 1(1): 68-88(1997)
Peter Buneman, Susan B. Davidson, Mary F. Fernandez, Dan Suciu: Adding Structure to Unstructured Data. ICDT 1997: 336-350
Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, Dan Suciu: A Query Language and Optimization Techniques for Unstructured Data. SIGMOD Conference 1996: 505-516
Peter Buneman, Susan B. Davidson, Dan Suciu: Programming Constructs for Unstructured Data. DBPL 1995: 12
Elisa Bertino, Won Kim: Indexing Techniques for Queries on Nested Objects. IEEE Trans. Knowl. Data Eng. 1(2): 196-214(1989)
R. G. G. Cattell: The Object Database Standard: ODMG-93. Morgan Kaufmann 1993, ISBN 1-55860-302-6
Sudarshan S. Chawathe, Ming-Syan Chen, Philip S. Yu: On Index Selection Schemes for Nested Object Hierarchies. VLDB 1994: 331-341
Michael J. Carey, Laura M. Haas, Vivekananda Maganty, John H. Williams: PESTO: An Integrated Query/Browser for Object Databases. VLDB 1996: 203-214
John E. Hopcroft, Jeffrey D. Ullman: Introduction to Automata Theory, Languages and Computation. Addison-Wesley 1979, ISBN 0-201-02988-X
Alfons Kemper, Guido Moerkotte: Access Support Relations: An Indexing Method for Object Bases. Inf. Syst. 17(2): 117-145(1992)
David Konopnicki, Oded Shmueli: W3QS: A Query System for the World-Wide Web. VLDB 1995: 54-65
Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom: Lore: A Database Management System for Semistructured Data. SIGMOD Record 26(3): 54-66(1997)
Amihai Motro, Alessandro D'Atri, Laura Tarantino: The Design of KIVIEW: An Object-Oriented Browser. Expert Database Conf. 1988: 107-131
Svetlozar Nestorov, Jeffrey D. Ullman, Janet L. Wiener, Sudarshan S. Chawathe: Representative Objects: Concise Representations of Semistructured, Hierarchical Data. ICDE 1997: 79-90
Yannis Papakonstantinou, Hector Garcia-Molina, Jennifer Widom: Object Exchange Across Heterogeneous Information Sources. ICDE 1995: 251-260
Michael Stonebraker, Joseph Kalash: TIMBER: A Sophisticated Relation Browser (Invited Paper). VLDB 1982: 1-10
Moshé M. Zloof: Query-by-Example: A Data Base Language. IBM Systems Journal 16(4): 324-343(1977)

Referenced by

  1. Holger Meuss, Klaus U. Schulz, François Bry: Towards Aggregated Answers for Semistructured Data. ICDT 2001: 346-360
  2. Gabriel M. Kuper, Jérôme Siméon: Subsumption for XML types. ICDT 2001: 331-345
  3. Hartmut Liefke, Dan Suciu: XMILL: An Efficient Compressor for XML Data. SIGMOD Conference 2000: 153-164
  4. Minos N. Garofalakis, Aristides Gionis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim: XTRACT: A System for Extracting Document Type Descriptors from XML Documents. SIGMOD Conference 2000: 165-176
  5. Yannis Papakonstantinou, Victor Vianu: DTD Inference for Views of XML Data. PODS 2000: 35-46
  6. Qiu Yue Wang, Jeffrey Xu Yu, Kam-Fai Wong: Approximate Graph Schema Extraction for Semi-Structured Data. EDBT 2000: 302-316
  7. Birgitta König-Ries: An Approach to the Semi-Automatic Generation of Mediator Specifications. EDBT 2000: 101-117
  8. Stefano Ceri, Piero Fraternali, Stefano Paraboschi: XML: Current Developments and Future Challenges for the Database Community. EDBT 2000: 3-17
  9. Sihem Amer-Yahia, H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava: On Bounding-Schemas for LDAP Directories. EDBT 2000: 287-301
  10. Alin Deutsch, Mary F. Fernández, Daniela Florescu, Alon Y. Levy, David Maier, Dan Suciu: Querying XML Data. IEEE Data Eng. Bull. 22(3): 10-18(1999)
  11. Jason McHugh, Jennifer Widom: Query Optimization for XML. VLDB 1999: 315-326
  12. Curtis E. Dyreson, Michael H. Böhlen, Christian S. Jensen: Capturing and Querying Multiple Aspects of Semistructured Data. VLDB 1999: 290-301
  13. Weidong Chen, Jyh-Herng Chow, You-Chin Fuh, Jean Grandbois, Michelle Jou, Nelson Mendonça Mattos, Brian T. Tran, Yun Wang: High Level Indexing of User-Defined Types. VLDB 1999: 554-564
  14. Luc Bouganim, Tatiana Chan-Sine-Ying, Tuyet-Tram Dang-Ngoc, Jean-Luc Darroux, Georges Gardarin, Fei Sha: Miro Web: Integrating Multiple Data Sources through Semistructured Data Types. VLDB 1999: 750-753
  15. Yannis Papakonstantinou, Vasilis Vassalos: Query Rewriting for Semistructured Data. SIGMOD Conference 1999: 455-466
  16. H. V. Jagadish, Laks V. S. Lakshmanan, Tova Milo, Divesh Srivastava, Dimitra Vista: Querying Network Directories. SIGMOD Conference 1999: 133-144
  17. Tova Milo, Dan Suciu: Type Inference for Queries on Semistructured Data. PODS 1999: 215-226
  18. Yaron Kanza, Werner Nutt, Yehoshua Sagiv: Queries with Incomplete Answers over Semistructured Data. PODS 1999: 227-236
  19. Tova Milo, Dan Suciu: Index Structures for Path Expressions. ICDT 1999: 277-295
  20. Catriel Beeri, Tova Milo: Schemas for Integration and Translation of Structured and Semi-structured Data. ICDT 1999: 296-313
  21. Yannis Papakonstantinou, Pavel Velikhov: Enhancing Semistructured Data Mediators with Document Type Definitions. ICDE 1999: 136-145
  22. Sin Yeung Lee, Mong-Li Lee, Tok Wang Ling, Leonid A. Kalinichenko: Designing Good Semi-Structured Databases and Conceptual Modeling. ER 1999: 131-145
  23. Georges Gardarin, Fei Sha, Tuyet-Tram Dang-Ngoc: XML-based Components for Federating Multiple Heterogeneous Data Sources. ER 1999: 506-519
  24. Jyh-Herng Chow, Josephine M. Cheng, Daniel T. Chang, Jane Xu: Index Design for Structured Documents Based on Abstraction. DASFAA 1999: 89-96
  25. Daniela Florescu, Alon Y. Levy, Alberto O. Mendelzon: Database Techniques for the World-Wide Web: A Survey. SIGMOD Record 27(3): 59-74(1998)
  26. Serge Abiteboul, Jason McHugh, Michael Rys, Vasilis Vassalos, Janet L. Wiener: Incremental Maintenance for Materialized Views over Semistructured Data. VLDB 1998: 38-49
  27. Svetlozar Nestorov, Serge Abiteboul, Rajeev Motwani: Extracting Schema from Semistructured Data. SIGMOD Conference 1998: 295-306
  28. Zoé Lacroix, Arnaud Sahuguet, Raman Chandrasekar: User-oriented smart-cache for the Web: What You Seek is What You Get! SIGMOD Conference 1998: 572-574
  29. Sophie Cluet, Claude Delobel, Jérôme Siméon, Katarzyna Smaga: Your Mediators Need Data Conversion! SIGMOD Conference 1998: 177-188
  30. Peter Buneman: Semistructured Data. PODS 1997: 117-121
  31. Joachim Hammer, Jason McHugh, Hector Garcia-Molina: Semistructured Data: The Tsimmis Experience. ADBIS 1997: 1-8
VLDB Proceedings: Copyright © by VLDB Endowment
ACM SIGMOD Anthology: Copyright © by ACM
DBLP: Copyright © by Michael Ley, last change: Sat May 16 23:46:17 2009