Extracting Schema from Semistructured Data.

Svetlozar Nestorov, Serge Abiteboul, Rajeev Motwani: Extracting Schema from Semistructured Data. SIGMOD Conference 1998: 295-306
@inproceedings{DBLP:conf/sigmod/NestorovAM98,
  author    = {Svetlozar Nestorov and
               Serge Abiteboul and
               Rajeev Motwani},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Extracting Schema from Semistructured Data},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {295-306},
  ee        = {http://doi.acm.org/10.1145/276304.276331, db/conf/sigmod/NestorovAM98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Semistructured data is characterized by the lack of any fixed and rigid schema, although typically the data has some implicit structure. While the lack of fixed schema makes extracting semistructured data fairly easy and an attractive goal, presenting and querying such data is greatly impaired. Thus, a critical problem is the discovery of the structure implicit in semistructured data and, subsequently, the recasting of the raw data in terms of this structure. In this paper, we consider a very general form of semistructured data based on labeled, directed graphs. We show that such data can be typed using the greatest fixpoint semantics of monadic datalog programs. We present an algorithm for approximate typing of semistructured data. We establish that the general problem of finding an optimal such typing is NP-hard, but present some heuristics and techniques based on clustering that allow efficient and near-optimal treatment of the problem. We also present some preliminary experimental results.
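The greatest fixpoint idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the rule format (each type lists the outgoing links it requires), the reserved type name ATOM, the function name, and the sample data are all assumptions made for illustration. The computation starts by assuming every object has every type and repeatedly withdraws assignments that the rules do not support, so mutually recursive types such as person and company keep their members instead of collapsing to empty sets.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]                    # (source object, label, target object)
Rules = Dict[str, List[Tuple[str, str]]]       # type -> required (label, target type) links

ATOM = "ATOM"  # hypothetical reserved type name for atomic values


def greatest_fixpoint_typing(
    objects: Set[str], atoms: Set[str], edges: List[Edge], rules: Rules
) -> Dict[str, Set[str]]:
    """Return, for each type, the largest object set consistent with the rules."""
    # Index outgoing edges by source object.
    out: Dict[str, List[Tuple[str, str]]] = {o: [] for o in objects}
    for src, label, dst in edges:
        out[src].append((label, dst))

    # Start from the top: every object is assumed to have every declared type.
    extent: Dict[str, Set[str]] = {t: set(objects) for t in rules}
    extent[ATOM] = set(atoms)  # atomic values are given, not derived

    changed = True
    while changed:
        changed = False
        for t, body in rules.items():
            for obj in list(extent[t]):
                # Every required link must exist and lead to an object
                # that still belongs to the required target type.
                supported = all(
                    any(lbl == label and dst in extent[target] for lbl, dst in out[obj])
                    for label, target in body
                )
                if not supported:
                    extent[t].discard(obj)
                    changed = True
    return extent


# Illustrative data (assumed, not from the paper): p2 lacks a worksFor link.
objects = {"p1", "p2", "c1", "n1", "n2", "n3"}
atoms = {"n1", "n2", "n3"}
edges = [
    ("p1", "name", "n1"), ("p1", "worksFor", "c1"),
    ("c1", "name", "n2"), ("c1", "employs", "p1"),
    ("p2", "name", "n3"),
]
rules = {
    "person": [("name", ATOM), ("worksFor", "company")],
    "company": [("name", ATOM), ("employs", "person")],
}
extents = greatest_fixpoint_typing(objects, atoms, edges, rules)
```

In this sketch p1 and c1 support each other's types, while p2 is withdrawn from person because it has no worksFor link. Such an exact typing rejects p2 outright; the approximate typing and clustering described in the abstract are not modeled here.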

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...
Online Version (ACM WWW Account required): Full Text in PDF Format

ACM SIGMOD Anthology

DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5, SIGMOD Record 27(2), June 1998

Long Version

http://www-db.stanford.edu/pub/papers/extract-schema.ps

References

[1]
Serge Abiteboul: Querying Semi-Structured Data. ICDT 1997: 1-18
[2]
Serge Abiteboul, Richard Hull, Victor Vianu: Foundations of Databases. Addison-Wesley 1995, ISBN 0-201-53771-0
[3]
Antonio Albano, Roberto Bergamini, Giorgio Ghelli, Renzo Orsini: An Object Data Model with Roles. VLDB 1993: 39-51
[4]
...
[5]
...
[6]
Peter Buneman: Semistructured Data. PODS 1997: 117-121
[7]
Peter Buneman, Susan B. Davidson, Mary F. Fernandez, Dan Suciu: Adding Structure to Unstructured Data. ICDT 1997: 336-350
[8]
Peter Buneman, Susan B. Davidson, Gerd G. Hillebrand, Dan Suciu: A Query Language and Optimization Techniques for Unstructured Data. SIGMOD Conference 1996: 505-516
[9]
R. G. G. Cattell: The Object Database Standard: ODMG-93 (Release 1.1). Morgan Kaufmann 1994
[10]
Roy Goldman, Jennifer Widom: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB 1997: 436-445
[11]
...
[12]
...
[13]
...
[14]
...
[15]
Svetlozar Nestorov, Jeffrey D. Ullman, Janet L. Wiener, Sudarshan S. Chawathe: Representative Objects: Concise Representations of Semistructured, Hierarchical Data. ICDE 1997: 79-90
[16]
Dallan Quass, Anand Rajaraman, Yehoshua Sagiv, Jeffrey D. Ullman, Jennifer Widom: Querying Semistructured Heterogeneous Information. DOOD 1995: 319-344
[17]
Dan Suciu: Query Decomposition and View Maintenance for Query Languages for Unstructured Data. VLDB 1996: 227-238
[18]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
[18-2]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press 1989, ISBN 0-7167-8162-X
[19]
Moshé M. Zloof: Query-by-Example: A Data Base Language. IBM Systems Journal 16(4): 324-343 (1977)

Referenced by

  1. Minos N. Garofalakis, Aristides Gionis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim: XTRACT: A System for Extracting Document Type Descriptors from XML Documents. SIGMOD Conference 2000: 165-176
  2. Yannis Papakonstantinou, Victor Vianu: DTD Inference for Views of XML Data. PODS 2000: 35-46
  3. Qiu Yue Wang, Jeffrey Xu Yu, Kam-Fai Wong: Approximate Graph Schema Extraction for Semi-Structured Data. EDBT 2000: 302-316
  4. Sihem Amer-Yahia, H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava: On Bounding-Schemas for LDAP Directories. EDBT 2000: 287-301
  5. Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim: Data Mining and the Web: Past, Present and Future. Workshop on Web Information and Data Management 1999: 43-47
  6. Alin Deutsch, Mary F. Fernández, Dan Suciu: Storing Semistructured Data with STORED. SIGMOD Conference 1999: 431-442
  7. Yaron Kanza, Werner Nutt, Yehoshua Sagiv: Queries with Incomplete Answers over Semistructured Data. PODS 1999: 227-236
  8. Stéphane Grumbach, Giansalvatore Mecca: In Search of the Lost Schema. ICDT 1999: 314-331
  9. Silvana Castano, Valeria De Antonellis: Building Views over Semistructured Data Sources. ER 1999: 146-160
  10. Daniela Florescu, Alon Y. Levy, Alberto O. Mendelzon: Database Techniques for the World-Wide Web: A Survey. SIGMOD Record 27(3): 59-74 (1998)
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Wed Nov 19 18:54:10 2008