Query Decomposition and View Maintenance for Query Languages for Unstructured Data.

Dan Suciu: Query Decomposition and View Maintenance for Query Languages for Unstructured Data. VLDB 1996: 227-238
  author    = {Dan Suciu},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Query Decomposition and View Maintenance for Query Languages
               for Unstructured Data},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {227-238},
  ee        = {db/conf/vldb/Suciu96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP,}


Recently, several query languages have been proposed for querying information sources whose data is not constrained by a schema, or whose schema is unknown. Examples include: LOREL (for querying data combined from several heterogeneous sources), W3QS (for querying the World Wide Web); and UNQL (for querying unstructured data).

The natural data model for such languages is that of a rooted, labeled graph. Their main novelty is the ability to express queries which traverse arbitrarily long paths in the graph, typically described by a regular expression. Such queries however may prove difficult to evaluate in the case when the data is distributed on several sites, with many edges going between sites. A typical case is that of a collection of WWW sites, with links pointing freely from one site to another (even forming cycles). A naive query shipping strategy may force the query to migrate back and forth between the various sites, leading to poor performance (or even non-termination). We present a technique for query decomposition, under which the query is shipped exactly once to every site, computed locally, then the local results are shipped to the client, and assembled here into the final result. This technique is efficient, in that (a) only data which is part of the final result is shipped from the data sites to the client site, and (b) the total work done locally at all sites does not exceed that needed for computing the (unoptimized) query on a centralized version of the same database.

We also show that the query decomposition technique can be adapted to derive a simple view maintenance method, for two forms of updates which we introduce for the graph data model.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents BibTeX

Electronic Edition


Thomas Ball, Fred Douglis: An Internet Difference Engine and its Applications. COMPCON 1996: 71-76 BibTeX
Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, Dan Suciu: A Query Language and Optimization Techniques for Unstructured Data. SIGMOD Conference 1996: 505-516 BibTeX
Peter Buneman, Susan B. Davidson, Dan Suciu: Programming Constructs for Unstructured Data. DBPL 1995: 12 BibTeX
Catriel Beeri, Yoram Kornatzky: A Logical Query Language for Hypertext Systems. ECHT 1990: 67-80 BibTeX
Bruno Courcelle: Fundamental Properties of Infinite Trees. Theor. Comput. Sci. 25: 95-169(1983) BibTeX
Timothy Griffin, Leonid Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD Conference 1995: 328-339 BibTeX
David Konopnicki, Oded Shmueli: W3QS: A Query System for the World-Wide Web. VLDB 1995: 54-65 BibTeX
Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, Divesh Srivastava: Answering Queries Using Views. PODS 1995: 95-104 BibTeX
Alon Y. Levy, Anand Rajaraman, Jeffrey D. Ullman: Answering Queries Using Limited External Processors. PODS 1996: 227-237 BibTeX
Yannis Papakonstantinou, Hector Garcia-Molina, Jennifer Widom: Object Exchange Across Heterogeneous Information Sources. ICDE 1995: 251-260 BibTeX
Dallan Quass, Anand Rajaraman, Yehoshua Sagiv, Jeffrey D. Ullman, Jennifer Widom: Querying Semistructured Heterogeneous Information. DOOD 1995: 319-344 BibTeX

Referenced by

  1. Serge Abiteboul, Jason McHugh, Michael Rys, Vasilis Vassalos, Janet L. Wiener: Incremental Maintenance for Materialized Views over Semistructured Data. VLDB 1998: 38-49
  2. Svetlozar Nestorov, Serge Abiteboul, Rajeev Motwani: Extracting Schema from Semistructured Data. SIGMOD Conference 1998: 295-306
  3. William W. Cohen: Providing Database-like Access to the Web Using Queries Based on Textual Similarity. SIGMOD Conference 1998: 558-560
  4. William W. Cohen: Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity. SIGMOD Conference 1998: 201-212
  5. Yue Zhuge, Hector Garcia-Molina: Graph Structured Views and Their Incremental Maintenance. ICDE 1998: 116-125
  6. Richard Hull: Managing Semantic Heterogeneity in Databases: A Theoretical Perspective. PODS 1997: 51-61
  7. Peter Buneman: Semistructured Data. PODS 1997: 117-121
  8. Serge Abiteboul, Victor Vianu: Regular Path Queries with Constraints. PODS 1997: 122-133
  9. Peter Buneman, Susan B. Davidson, Mary F. Fernandez, Dan Suciu: Adding Structure to Unstructured Data. ICDT 1997: 336-350
  10. Serge Abiteboul: Querying Semi-Structured Data. ICDT 1997: 1-18
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Sat May 16 23:46:11 2009