ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

An Extensible Classifier for Semi-Structured Documents.

Markus Tresch, Allen Luniewski: An Extensible Classifier for Semi-Structured Documents. CIKM 1995: 226-233
@inproceedings{DBLP:conf/cikm/TreschL95,
  author    = {Markus Tresch and
               Allen Luniewski},
  title     = {An Extensible Classifier for Semi-Structured Documents},
  booktitle = {CIKM '95, Proceedings of the 1995 International Conference on
               Information and Knowledge Management, November 28 - December
               2, 1995, Baltimore, Maryland, USA},
  publisher = {ACM},
  year      = {1995},
  pages     = {226-233},
  ee        = {db/conf/cikm/TreschL95.html, http://doi.acm.org/10.1145/221270.221575},
  crossref  = {DBLP:conf/cikm/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

In this paper, we present a vector space classifier for determining the type of semi-structured documents. Our goal was to design a high-performance classifier in terms of accuracy (recall and precision), speed, and flexibility.
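
A vector space classifier represents each document as a vector of term weights and assigns it to the class whose representative vector is most similar. The Python sketch below shows one common form of this idea, a centroid classifier with cosine similarity. Several details are assumptions for illustration and are not taken from the paper: whitespace tokenization, raw term frequencies, length-normalized centroids, and the hypothetical document types `email` and `bibtex`.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a: dict, b: dict) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: list[Counter]) -> dict[str, float]:
    """Mean of the length-normalized training vectors of one class."""
    total: dict[str, float] = {}
    for v in vectors:
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w / norm
    return {t: w / len(vectors) for t, w in total.items()}


class CentroidClassifier:
    """Assigns a document to the class whose centroid is closest by cosine."""

    def __init__(self) -> None:
        self.centroids: dict[str, dict[str, float]] = {}

    def train(self, label: str, docs: list[str]) -> None:
        # One centroid vector per class, built from that class's documents.
        self.centroids[label] = centroid([vectorize(d) for d in docs])

    def classify(self, doc: str) -> str:
        v = vectorize(doc)
        return max(self.centroids, key=lambda c: cosine(v, self.centroids[c]))


# Hypothetical document types and training samples.
clf = CentroidClassifier()
clf.train("email", ["from: alice to: bob subject: lunch", "from: carol subject: meeting"])
clf.train("bibtex", ["@inproceedings author title booktitle year",
                     "@article author title journal year"])
print(clf.classify("subject: report from: dave"))
print(clf.classify("@article author title year"))
```

Keeping a single vector per class makes classification one similarity computation per class. The sketch uses raw term frequencies for simplicity. A practical system would typically add a term-weighting scheme such as tf-idf.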

The ability to dynamically extend a classifier with user-specific classes is crucial for many applications. Unfortunately, the training data for the existing classes is often unavailable, so the extended classifier becomes imprecise.

We focus on this issue. First, we evaluate how to create class abstracts that can serve as a replacement for the missing training data. Second, we introduce relevance feedback learning strategies to overcome the classifier's remaining imprecision.
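
The two ideas above can be illustrated with a short Python sketch. The abstract does not define class abstracts or the feedback strategies, so both are stand-ins chosen for illustration, not the paper's definitions. Here a class abstract is assumed to be the `k` highest-weighted terms of the class centroid. Relevance feedback is modeled with the classic Rocchio update. The functions `make_abstract` and `rocchio_update`, the parameter values `alpha`, `beta`, and `gamma`, and the class names are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(docs: list[str]) -> dict[str, float]:
    """Average term-frequency vector of a list of documents ({} if empty)."""
    if not docs:
        return {}
    total: Counter = Counter()
    for d in docs:
        total.update(vectorize(d))
    return {t: w / len(docs) for t, w in total.items()}


def make_abstract(docs: list[str], k: int) -> dict[str, float]:
    """Hypothetical class abstract: the k highest-weighted centroid terms.

    Only this small vector is kept; the training documents can be discarded.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(mean_vector(docs).items(), key=lambda tw: (-tw[1], tw[0]))
    return dict(ranked[:k])


def classify(doc: str, abstracts: dict[str, dict[str, float]]) -> str:
    """Pick the class whose abstract is most similar to the document."""
    v = vectorize(doc)
    return max(abstracts, key=lambda name: cosine(v, abstracts[name]))


def rocchio_update(abstract, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style feedback (assumed, not necessarily the paper's strategy):
    move the abstract toward documents the user confirmed as members of the
    class and away from documents wrongly assigned to it."""
    pos, neg = mean_vector(relevant), mean_vector(nonrelevant)
    updated = {}
    for t in set(abstract) | set(pos) | set(neg):
        w = alpha * abstract.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        if w > 0:  # clip negative weights to zero (common practice)
            updated[t] = w
    return updated


# Existing classes are represented only by their abstracts (hypothetical data).
abstracts = {
    "email": make_abstract(["from: alice subject: lunch", "from: bob subject: budget"], k=2),
    "bibtex": make_abstract(["@article author title year", "@inproceedings author title booktitle"], k=2),
}
# A user-specific class is added from its own documents alone.
abstracts["memo"] = make_abstract(["memo to: staff re: parking", "memo to: all re: holiday"], k=2)
print(classify("memo re: printer to: everyone", abstracts))
```

In this sketch the user-specific class `memo` is built from its own documents alone, while `email` and `bibtex` survive only as abstracts. When a user corrects a misclassification, the corrected document could be passed to `rocchio_update` in two ways: as a relevant example for the correct class, and as a non-relevant example for the class that wrongly claimed it.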

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 2 Issue 4, CIKM, DOLAP, GIS, SIGFIDET, ..." and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

CIKM '95, Proceedings of the 1995 International Conference on Information and Knowledge Management, November 28 - December 2, 1995, Baltimore, Maryland, USA. ACM 1995

CIKM 1995 Proceedings, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:01:49 2009