Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions.

Achim Kraiss, Gerhard Weikum: Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions. VLDB 1997: 246-255
  author    = {Achim Kraiss and
               Gerhard Weikum},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Vertical Data Migration in Large Near-Line Document Archives
               Based on Markov-Chain Predictions},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {246-255},
  ee        = {db/conf/vldb/KraissW97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP,}


Large multimedia document archives hold most of their data in near-line tertiary storage libraries for cost reasons. This paper develops an integrated approach to the vertical data migration between the tertiary and secondary storage in that it reconciles speculative preloading, to mask the high latency of the tertiary storage, with the replacement policy of the secondary storage. In addition, it considers the interaction of these policies with the tertiary storage scheduling and controls preloading aggressiveness by taking contention for tertiary storage drives into account. The integrated migration policy is based on a continuous-time Markov-chain (CTMC) model for predicting the expected number of accesses to a document within a specified time horizon. The parameters of the CTMC model, the probabilities of co-accessing certain documents and the interaction times between successive accesses, are dynamically estimated and adjusted to evolving workload patterns by keeping online statistics. The integrated policy for vertical data migration has been implemented in a prototype system. Detailed simulation studies with Web-server-like synthetic workloads indicate significant gains in terms of client response time. The studies also show that the overhead of the statistical bookkeeping and the computations for the access predictions is affordable.
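
As an illustration of the kind of prediction the abstract describes, the following minimal sketch computes the expected number of accesses to each document within a time horizon for a small CTMC. It is not taken from the paper. It assumes that states correspond to documents, that `P` holds the co-access (transition) probabilities, and that `rates[i]` is the reciprocal of the mean time between an access to document `i` and the next access. The function name, the parameter names, and the choice of uniformization as the numerical technique are all assumptions made for this example.

```python
import math

def expected_accesses(P, rates, start, horizon, tol=1e-12):
    """Expected number of transitions into each state of a CTMC in [0, horizon].

    P[i][j]  : probability that the access after document i goes to document j
    rates[i] : 1 / (mean time spent in state i before the next access), > 0
    start    : index of the document accessed at time 0
    All names are hypothetical; this is a sketch, not the authors' code.
    """
    n = len(P)
    v = max(rates)                      # uniformization rate
    lam = v * horizon
    if lam > 700:                       # exp(-lam) would underflow
        raise ValueError("horizon too long for this simple sketch")

    # Uniformized discrete-time chain: real jumps with probability rates[i]/v,
    # otherwise a pseudo-transition that stays in state i.
    U = [[(rates[i] / v) * P[i][j] + ((1.0 - rates[i] / v) if i == j else 0.0)
          for j in range(n)] for i in range(n)]

    pi = [0.0] * n
    pi[start] = 1.0
    occupancy = [0.0] * n               # expected time spent in each state
    pmf = math.exp(-lam)                # Poisson P(N = 0)
    tail = 1.0 - pmf                    # Poisson P(N > 0)
    k = 0
    while tail > tol:
        # Integral of pi(s) over [0, horizon] = (1/v) * sum_k P(N > k) * pi0 U^k
        for s in range(n):
            occupancy[s] += tail * pi[s] / v
        pi = [sum(pi[i] * U[i][j] for i in range(n)) for j in range(n)]
        k += 1
        pmf *= lam / k                  # Poisson P(N = k)
        tail -= pmf                     # Poisson P(N > k)

    # Accesses to j arrive from state i at rate rates[i] * P[i][j].
    return [sum(occupancy[i] * rates[i] * P[i][j] for i in range(n))
            for j in range(n)]
```

A migration policy could rank documents by such a value, preloading those with a high expected access count and evicting those with a low one. How the paper weighs these values against tertiary-drive contention is not shown here.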

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7

Virgilio Almeida, Azer Bestavros, Mark Crovella, Adriana de Oliveira: Characterizing Reference Locality in the WWW. PDIS 1996: 92-103
Azer Bestavros: Speculative Data Dissemination and Service to Reduce Server Load, Network Traffic and Service Time in Distributed Information Systems. ICDE 1996: 180-187
Pei Cao, Edward W. Felten, Anna R. Karlin, Kai Li: A Study of Integrated Prefetching and Caching Strategies. SIGMETRICS 1995: 188-197
Pei Cao, Edward W. Felten, Anna R. Karlin, Kai Li: Implementation and Performance of Integrated Application-Controlled File Caching, Prefetching, and Disk Scheduling. ACM Trans. Comput. Syst. 14(4): 311-343(1996)
Ellis E. Chang, Randy H. Katz: Exploiting Inheritance and Structure Semantics for Effective Clustering and Buffering in an Object-Oriented DBMS. SIGMOD Conference 1989: 348-357
Ling Tony Chen, Doron Rotem: Optimizing Storage of Objects on Mass Storage Systems with Robotic Devices. EDBT 1994: 273-286
Jia-bing R. Cheng, Ali R. Hurson: On The Performance of Object-Based Buffering. PDIS 1991: 30-37
George P. Copeland, William Alexander, Ellen E. Boughter, Tom W. Keller: Data Placement In Bubba. SIGMOD Conference 1988: 99-108
Kenneth M. Curewitz, P. Krishnan, Jeffrey Scott Vitter: Practical Prefetching via Data Compression. SIGMOD Conference 1993: 257-266
Daniel Alexander Ford, Stavros Christodoulakis: Optimal Placement of High-Probability Randomly Retrieved Blocks on CLV Optical Discs. ACM Trans. Inf. Syst. 9(1): 1-30(1991)
Carsten Andreas Gerlhof, Alfons Kemper: Prefetch Support Relations in Object Bases. POS 1994: 115-126
Carsten Andreas Gerlhof, Alfons Kemper, Christoph Kilger, Guido Moerkotte: Partition-Based Clustering in Object Bases: From Theory to Practice. FODO 1993: 301-316
Leana Golubchik, Richard R. Muntz, Richard W. Watson: Analysis of Striping Techniques in Robotic Storage Libraries. IEEE Symposium on Mass Storage Systems 1995: 225-238
Jim Gray, Gianfranco R. Putzolu: The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD Conference 1987: 395-398
Bruce Hillyer, Abraham Silberschatz: Random I/O Scheduling in Online Tertiary Storage Systems. SIGMOD Conference 1996: 195-204
Anna R. Karlin, Steven J. Phillips, Prabhakar Raghavan: Markov Paging (Extended Abstract). FOCS 1992: 208-217
Siu-Wah Lau, John C. S. Lui, P. C. Wong: A Cost-effective Near-line Storage Server for Multimedia System. ICDE 1995: 449-456
Frank Moser, Achim Kraiss, Wolfgang Klas: L/MRP: A Buffer Management Strategy for Interactive Continuous Data Flows in a Multimedia DBMS. VLDB 1995: 275-286
Jussi Myllymaki, Miron Livny: Relational Joins for Data on Tertiary Storage. ICDE 1997: 159-168
Randolph Nelson: Probability, Stochastic Processes, and Queuing Theory - The Mathematics of Computer Performance Modeling. Springer 1995, ISBN 0-387-94452-4
Toshihiro Nemoto, Masaru Kitsuregawa, Mikio Takagi: Analysis of Cassette Migration Activities in Scalable Tape Archiver. DASFAA 1997: 461-470
Elizabeth J. O'Neil, Patrick E. O'Neil, Gerhard Weikum: The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD Conference 1993: 297-306
Mark Palmer, Stanley B. Zdonik: Fido: A Cache That Learns to Fetch. VLDB 1991: 255-264
R. Hugo Patterson, Garth A. Gibson, Eka Ginting, Daniel Stodolsky, Jim Zelenka: Informed Prefetching and Caching. SOSP 1995: 79-95
Chris Ruemmler, John Wilkes: An Introduction to Disk Drive Modeling. IEEE Computer 27(3): 17-28(1994)
Sunita Sarawagi: Query Processing in Tertiary Memory Databases. VLDB 1995: 585-596
Peter Scheuermann, Junho Shim, Radek Vingralek: WATCHMAN: A Data Warehouse Intelligent Cache Manager. VLDB 1996: 51-62
Markus Sinnwell, Gerhard Weikum: A Cost-Model-Based Online Method for Distributed Caching. ICDE 1997: 532-541
Peter Scheuermann, Gerhard Weikum, Peter Zabback: "Disk Cooling" in Parallel Disk Systems. IEEE Data Eng. Bull. 17(3): 29-40(1994)
Alan Jay Smith: Long Term File Migration: Development and Evaluation of Algorithms. Commun. ACM 24(8): 521-532(1981)
Michael Stonebraker: Managing Persistent Objects in a Multi-Level Store. SIGMOD Conference 1991: 2-11
James Z. Teng, Robert A. Gumaer: Managing IBM Database 2 Buffers to Maximize Performance. IBM Systems Journal 23(2): 211-218(1984)
Manolis M. Tsangaris, Jeffrey F. Naughton: A Stochastic Approach for Clustering in Object Bases. SIGMOD Conference 1991: 12-21
Manolis M. Tsangaris, Jeffrey F. Naughton: On the Performance of Object Clustering Techniques. SIGMOD Conference 1992: 144-153
Shivakumar Venkataraman, Miron Livny, Jeffrey F. Naughton: Memory Management for Scalable Web Data Servers. ICDE 1997: 510-519
Hartmut Wedekind, Georg Zörntlein: Prefetching in Realtime Database Applications. SIGMOD Conference 1986: 215-226
Gerhard Weikum, Christof Hasse, Alex Moenkeberg, Peter Zabback: The COMFORT Automatic Tuning Project, Invited Project Review. Inf. Syst. 19(5): 381-432(1994)
C. K. Wong: Algorithmic Studies in Mass Storage Systems. Computer Science Press 1983

Referenced by

  1. Gerhard Weikum, Arnd Christian König, Achim Kraiss, Markus Sinnwell: Towards Self-Tuning Memory Management for Data Servers. IEEE Data Eng. Bull. 22(2): 3-11(1999)
  2. Achim Kraiss, Gerhard Weikum: Integrated Document Caching and Prefetching in Storage Hierarchies Based on Markov-Chain Predictions. VLDB J. 7(3): 141-162(1998)
  3. Theodore Johnson, Ethan L. Miller: Performance Measurements of Tertiary Storage Devices. VLDB 1998: 50-61