A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data.

Khaled Alsabti, Sanjay Ranka, Vineet Singh: A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data. VLDB 1997: 346-355
  author    = {Khaled Alsabti and
               Sanjay Ranka and
               Vineet Singh},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {A One-Pass Algorithm for Accurately Estimating Quantiles for
               Disk-Resident Data},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {346-355},
  ee        = {db/conf/vldb/AlsabtiRS97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP,}


The phi-quantile of an ordered sequence of data values is the element with rank phi*n, where n is the total number of values. Accurate estimates of quantiles are required for the solution of many practical problems. In this paper, we present a new algorithm for estimating the quantile values for disk-resident data. Our algorithm has the following characteristics: (1) It requires only one pass over the data; (2) It is deterministic; (3) It produces good lower and upper bounds of the true values of the quantiles; (4) It requires no a priori knowledge of the distribution of the data set; (5) It has a scalable parallel formulation; (6) Extra time and memory for computing additional quantiles (beyond the first one) are constant per quantile.

We present experimental results on the IBM SP-2. The experimental results show that the algorithm is indeed robust and does not depend on the distribution of the data sets.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents BibTeX

Electronic Edition

From CS Dept., University Trier (Germany)


Additional information is available from


Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 BibTeX
Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, Kyuseok Shim: Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. VLDB 1995: 490-501 BibTeX
Rakesh Agrawal, Arun N. Swami: A One-Pass Space-Efficient Algorithm for Finding Quantiles. COMAD 1995: 0- BibTeX
Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conference 1996: 1-12 BibTeX
Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, Robert Endre Tarjan: Time Bounds for Selection. J. Comput. Syst. Sci. 7(4): 448-461(1973) BibTeX
Kenneth E. Batcher: Sorting Networks and Their Applications. AFIPS Spring Joint Computing Conference 1968: 307-314 BibTeX
William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291 BibTeX
Robert W. Floyd, Ronald L. Rivest: Expected Time Bounds for Selection. Commun. ACM 18(3): 165-172(1975) BibTeX
Raj Jain, Imrich Chlamtac: The P² Algorithm for Dynamic Calculation of Quantiiles and Histograms Without Storing Observations. Commun. ACM 28(10): 1076-1085(1985) BibTeX
Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis: Introduction to Parallel Computing. Benjamin/Cummings 1994, ISBN 0-8053-3170-0
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
Xiaobo Li, Paul Lu, Jonathan Schaeffer, John Shillington, Pok Sze Wong, Hanmao Shi: On the Versatility of Parallel Sorting by Regular Sampling. Parallel Computing 19(10): 1079-1103(1993) BibTeX
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
J. Ian Munro, Mike Paterson: Selection and Sorting with Limited Storage. Theor. Comput. Sci. 12: 315-323(1980) BibTeX
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 BibTeX
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949

Referenced by

  1. Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets. SIGMOD Conference 1999: 251-262
  2. Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Approximate Medians and other Quantiles in One Pass and with Limited Memory. SIGMOD Conference 1998: 426-435
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Sat May 16 23:46:17 2009