Random Sampling from B+ Trees.

Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277
@inproceedings{DBLP:conf/vldb/OlkenR89,
  author    = {Frank Olken and
               Doron Rotem},
  editor    = {Peter M. G. Apers and
               Gio Wiederhold},
  title     = {Random Sampling from B+ Trees},
  booktitle = {Proceedings of the Fifteenth International Conference on Very
               Large Data Bases, August 22-25, 1989, Amsterdam, The Netherlands},
  publisher = {Morgan Kaufmann},
  year      = {1989},
  isbn      = {1-55860-101-5},
  pages     = {269-277},
  ee        = {db/conf/vldb/OlkenR89.html},
  crossref  = {DBLP:conf/vldb/89},
  bibsource = {DBLP,}
}


We consider the design and analysis of algorithms to retrieve simple random samples from databases. Specifically, we examine simple random sampling from B+ tree files. Existing methods of sampling from B+ trees require the use of auxiliary rank information in the nodes of the tree. Such modified B+ tree files are called "ranked B+ trees". We compare sampling from ranked B+ tree files with new acceptance/rejection (A/R) sampling methods which sample directly from standard B+ trees. Our new A/R sampling algorithm can easily be retrofitted to existing DBMSs and does not require the overhead of maintaining rank information. We consider both iterative and batch sampling methods.
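
The abstract does not spell out the A/R procedure, so the following is only a minimal sketch of one common form of acceptance/rejection sampling on a tree with uneven fan-out, not necessarily the exact algorithm of the paper. The `Node` class, the `ar_sample` function, and the single `max_fanout` bound shared by internal nodes and leaves are all assumptions made for illustration. The idea is that each node on the root-to-leaf path is accepted with probability fan-out / `max_fanout`, so every record ends up with the same probability per trial regardless of how full its ancestors are.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical B+ tree node: internal nodes hold child nodes, leaves hold records."""
    children: list = field(default_factory=list)
    is_leaf: bool = False


def ar_sample(root, max_fanout, rng):
    """Draw one record uniformly at random from a non-empty tree.

    Assumes all leaves are at the same depth (as in a B+ tree) and that
    max_fanout is an upper bound on len(node.children) for every node.
    No rank (subtree count) information is used.
    """
    if not root.children:
        raise ValueError("cannot sample from an empty tree")
    while True:  # each iteration is one trial; a rejected trial restarts at the root
        node = root
        while True:
            fanout = len(node.children)
            # Accept this node with probability fanout / max_fanout.
            # Combined with the uniform child choice below, each child is
            # reached with probability exactly 1 / max_fanout per level.
            if rng.random() >= fanout / max_fanout:
                break  # rejected: abandon this path and start over
            pick = rng.choice(node.children)
            if node.is_leaf:
                return pick  # pick is a record
            node = pick  # pick is a child node; keep descending
```

With a tree of height h, each record is returned by a given trial with probability (1 / `max_fanout`)^h, so the accepted records are uniformly distributed. The price is wasted trials when nodes are far from full, which is the trade-off against maintaining rank information that the abstract refers to.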

Copyright © 1989 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

Peter M. G. Apers, Gio Wiederhold (Eds.): Proceedings of the Fifteenth International Conference on Very Large Data Bases, August 22-25, 1989, Amsterdam, The Netherlands. Morgan Kaufmann 1989, ISBN 1-55860-101-5


References

William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
Jarmo Ernvall, Olli Nevalainen: An Algorithm for Unbiased Random Sampling. Comput. J. 25(1): 45-47(1982)
Sakti P. Ghosh: SIAM: statistics information access method. Inf. Syst. 13(4): 359-368(1988)
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Statistical Estimators for Relational Algebra Expressions. PODS 1988: 276-287
Donald E. Knuth: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley 1973, ISBN 0-201-03803-X
Prashant Palvia: Expressions for Batched Searching of Sequential and Hierarchical Files. ACM Trans. Database Syst. 10(1): 97-106(1985)
Jaideep Srivastava, Vincent Y. Lum: A Tree Based Access Method (TBSAM) for Fast Processing of Aggregate Queries. ICDE 1988: 504-510
Jeffrey Scott Vitter: Faster Methods for Random Sampling. Commun. ACM 27(7): 703-718(1984)
Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985)
C. K. Wong, Malcolm C. Easton: An Efficient Method for Weighted Sampling Without Replacement. SIAM J. Comput. 9(1): 111-113(1980)
S. Bing Yao: Approximating the Number of Accesses in Database Organizations. Commun. ACM 20(4): 260-261(1977)

Referenced by

  1. Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342
  2. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  3. Gennady Antoshenkov, Mohamed Ziauddin: Query Processing and Optimization in Oracle Rdb. VLDB J. 5(4): 229-237(1996)
  4. Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145
  5. Nabil I. Hachem, Chenye Bao, Steve Taylor: Approximate Query Answering In Numerical Databases. SSDBM 1996: 63-73
  6. Augustine C. Ikeji, Farshad Fotouhi: Computation of Partial Query Results Using An Adaptive Stratified Sampling Technique. CIKM 1995: 145-149
  7. Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang: Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results. SIGMOD Conference 1994: 115-125
  8. Peter J. Haas, Jeffrey F. Naughton, Arun N. Swami: On the Relative Cost of Sampling for Join Selectivity Estimation. PODS 1994: 14-24
  9. Wen-Chi Hou, Gultekin Özsoyoglu: Processing Time-Constrained Aggregate Queries in CASE-DB. ACM Trans. Database Syst. 18(2): 224-261(1993)
  10. Gennady Antoshenkov: Query Processing in DEC Rdb: Major Issues and Future Challenges. IEEE Data Eng. Bull. 16(4): 42-52(1993)
  11. Richard H. Wolniewicz, Goetz Graefe: Algebraic Optimization of Computations over Scientific Databases. VLDB 1993: 13-24
  12. Frank Olken, Doron Rotem: Sampling from Spatial Databases. ICDE 1993: 199-208
  13. Gennady Antoshenkov: Dynamic Query Optimization in Rdb/VMS. ICDE 1993: 538-547
  14. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40
  15. Gennady Antoshenkov: Random Sampling from Pseudo-Ranked B+ Trees. VLDB 1992: 375-382
  16. Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350
  17. Frank Olken, Doron Rotem: Maintenance of Materialized Views of Sampling Queries. ICDE 1992: 632-641
  18. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: An Evaluation of Non-Equijoin Algorithms. VLDB 1991: 443-452
  19. Frank Olken, Doron Rotem: Random Sampling from Database Files: A Survey. SSDBM 1990: 92-111
  20. Frank Olken, Doron Rotem, Ping Xu: Random Sampling from Hash Files. SIGMOD Conference 1990: 375-386
  21. Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11
  22. Jeffrey F. Naughton, S. Seshadri: On Estimating the Size of Projections. ICDT 1990: 499-513
VLDB Proceedings: Copyright © by VLDB Endowment
ACM SIGMOD Anthology: Copyright © by ACM
DBLP: Copyright © by Michael Ley, last change: Sat May 16 23:45:41 2009