A Compression Technique for Large Statistical Data-Bases.

Susan J. Eggers, Frank Olken, Arie Shoshani: A Compression Technique for Large Statistical Data-Bases. VLDB 1981: 424-434
  author    = {Susan J. Eggers and
               Frank Olken and
               Arie Shoshani},
  title     = {A Compression Technique for Large Statistical Data-Bases},
  booktitle = {Very Large Data Bases, 7th International Conference, September
               9-11, 1981, Cannes, France, Proceedings},
  publisher = {IEEE Computer Society},
  year      = {1981},
  pages     = {424-434},
  ee        = {db/conf/vldb/EggersOS81.html},
  crossref  = {DBLP:conf/vldb/81},
  bibsource = {DBLP,}


In this paper we explore the compression of large statistical databases and propose techniques for organizing the compressed data, such that the time required to access the data is logarithmic. Our techniques are variations of run-length encoding, in which modified run-lengths for the series are extracted from the data stream and stored in a header, which is used to form the base level of a B-tree index into the database. The run-lengths are cumulative, and therefore the access time of the data is logarithmic in the size of the header. We discuss the details of the compression scheme and its implementation, present several special cases and give an analysis of the relative performance of the various versions.
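The core idea in the abstract, cumulative run-lengths kept in a header so that a logical position can be found in logarithmic time, can be illustrated with a minimal sketch. This is not the authors' scheme: it assumes plain run-length encoding of every repeated value, keeps the header as a flat sorted list, and uses binary search in place of a B-tree. The names `compress` and `lookup` are hypothetical.

```python
from bisect import bisect_right
from itertools import groupby


def compress(values):
    """Run-length encode `values`.

    Returns (cumulative_ends, run_values), where cumulative_ends[k] is the
    logical index one past the last element of run k. Storing cumulative
    rather than per-run lengths keeps the header sorted, so it can be searched.
    """
    cumulative_ends, run_values = [], []
    total = 0
    for value, group in groupby(values):
        total += sum(1 for _ in group)  # length of this run
        cumulative_ends.append(total)
        run_values.append(value)
    return cumulative_ends, run_values


def lookup(cumulative_ends, run_values, i):
    """Return the value at logical position i using binary search on the header."""
    size = cumulative_ends[-1] if cumulative_ends else 0
    if not 0 <= i < size:
        raise IndexError(i)
    # The first run whose cumulative end exceeds i is the run containing i.
    return run_values[bisect_right(cumulative_ends, i)]


data = [0, 0, 0, 7, 7, 0, 0, 0, 0, 3]
ends, vals = compress(data)
print(ends, vals)
print(lookup(ends, vals, 4))
```

Each lookup costs O(log r) comparisons for r runs, which is the sense in which access is logarithmic in the size of the header; a B-tree built over the same cumulative values would give the same bound with fewer page reads on disk.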

Copyright © 1981 by The Institute of Electrical and Electronics Engineers, Inc. (IEEE). Abstract used with permission.


Printed Edition

Very Large Data Bases, 7th International Conference, September 9-11, 1981, Cannes, France, Proceedings. IEEE Computer Society 1981


References

Don S. Batory: On Searching Transposed Files. ACM Trans. Database Syst. 4(4): 531-544 (1979)
Susan J. Eggers, Arie Shoshani: Efficient Access of Compressed Data. VLDB 1980: 205-211
Bruce Hahn: A New Technique for Compression and Storage of Data. Commun. ACM 17(8): 434-436 (1974)
Michael Hammer, Bahram Niamir: A Heuristic Approach to Attribute Partitioning. SIGMOD Conference 1979: 93-101
Donald E. Knuth: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley 1973, ISBN 0-201-03803-X
Per Svensson: On Search Performance for Conjunctive Queries in Compressed, Fully Transposed Ordered Files. VLDB 1979: 155-163
Robert Endre Tarjan, Andrew Chi-Chih Yao: Storing a Sparse Table. Commun. ACM 22(11): 606-611 (1979)
Jacob Ziv, Abraham Lempel: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23(3): 337-343 (1977)
Jacob Ziv, Abraham Lempel: Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory 24(5): 530-536 (1978)

Referenced by

  1. Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft: Compressing Relations and Indexes. ICDE 1998: 370-379
  2. Wee Keong Ng, Chinya V. Ravishankar: Block-Oriented Compression Techniques for Large Statistical Databases. IEEE Trans. Knowl. Data Eng. 9(2): 314-328 (1997)
  3. Arie Shoshani: OLAP and Statistical Databases: Similarities and Differences. PODS 1997: 185-196
  4. Wee Keong Ng, Chinya V. Ravishankar: A Physical Storage for Efficient Statistical Query Processing. SSDBM 1994: 97-106
  5. Mostafa A. Bassiouni, Amar Mukherjee, N. Ranganathan: On Software and Hardware Techniques of Data Engineering. ICDE 1989: 208-215
  6. Mostafa A. Bassiouni, N. Ranganathan, Amar Mukherjee: Software and Hardware Enhancement of Arithmetic Coding. SSDBM 1988: 120-132
  7. Don S. Batory: Concepts for a Database System Compiler. PODS 1988: 184-192
  8. Jianzhong Li, Doron Rotem, Harry K. T. Wong: A New Compression Method with Fast Searching on Large Databases. VLDB 1987: 311-318
  9. Jianzhong Li, Harry K. T. Wong: Batched Interpolation Searching on Databases. ICDE 1987: 18-24
  10. Frank Olken, Doron Rotem: Rearranging Data to Maximize the Efficiency of Compression. PODS 1986: 78-90
  11. Harry K. T. Wong, Hsiu-Fen Liu, Frank Olken, Doron Rotem, Linda Wong: Bit Transposed Files. VLDB 1985: 448-457
  12. George P. Copeland, Setrag Khoshafian: A Decomposition Storage Model. SIGMOD Conference 1985: 268-279
  13. Chaitanya K. Baru, Stanley Y. W. Su: Performance Evaluation of the Statistical Aggregation by Categorization in the SM3 System. SIGMOD Conference 1984: 77-89
  14. Harry K. T. Wong: Micro and Macro Statistical/Scientific Database Management. ICDE 1984: 104-106
  15. Stanley Y. W. Su, Shamkant B. Navathe, Don S. Batory: Logical and Physical Modeling of Statistical Scientific Databases. SSDBM 1983: 251-263
  16. Sandra Heiler, Rita F. Bergman: SIBYL: An Economist's Workbench. SSDBM 1983: 73-79
  17. Fredric C. Gey, John McCarthy, Deane Merrill, Harvard Holmes: Computer-Independent Data Compression for Large Statistical Databases. SSDBM 1983: 296-305
  18. Paul Chan, Susan J. Eggers, Fredric C. Gey, Harvard Holmes, Peter Kreps, John McCarthy, Deane Merrill, Frank Olken, Arie Shoshani, Harry K. T. Wong: Statistical Data Management Research at Lawrence Berkeley Laboratory. SSDBM 1983: 273-279
  19. Don S. Batory: Index Coding: A Compression Technique for Large Statistical Databases. SSDBM 1983: 306-314
  20. Arie Shoshani: Statistical Databases: Characteristics, Problems, and some Solutions. VLDB 1982: 208-222
  21. Paula B. Hawthorn: Microprocessor Assisted Tuple Access, Decompression and Assembly for Statistical Database Systems. VLDB 1982: 223-233
  22. Douglas M. Bates, Haran Boral, David J. DeWitt: A Framework for Research in Database Management for Statistical Analysis. SIGMOD Conference 1982: 69-78