Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions.

Achim Kraiss, Gerhard Weikum: Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions. VLDB 1997: 246-255
  author    = {Achim Kraiss and
               Gerhard Weikum},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Vertical Data Migration in Large Near-Line Document Archives
               Based on Markov-Chain Predictions},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {246-255},
  ee        = {db/conf/vldb/KraissW97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP,}


Large multimedia document archives hold most of their data in near-line tertiary storage libraries for cost reasons. This paper develops an integrated approach to the vertical data migration between the tertiary and secondary storage in that it reconciles speculative preloading, to mask the high latency of the tertiary storage, with the replacement policy of the secondary storage. In addition, it considers the interaction of these policies with the tertiary storage scheduling and controls preloading aggressiveness by taking contention for tertiary storage drives into account. The integrated migration policy is based on a continuous-time Markov-chain (CTMC) model for predicting the expected number of accesses to a document within a specified time horizon. The parameters of the CTMC model, the probabilities of co-accessing certain documents and the interaction times between successive accesses, are dynamically estimated and adjusted to evolving workload patterns by keeping online statistics. The integrated policy for vertical data migration has been implemented in a prototype system. Detailed simulation studies with Web-server-like synthetic workloads indicate significant gains in terms of client response time. The studies also show that the overhead of the statistical bookkeeping and the computations for the access predictions is affordable.
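
As an illustration of the kind of prediction the abstract describes, the following minimal sketch computes the expected number of accesses to each document within a time horizon from a CTMC. It uses uniformization, a standard numerical technique for CTMCs, and should not be read as the exact computation used in the paper. The function name, the parameter names (`P`, `rates`, `start`, `horizon`, `tol`), and the input layout are assumptions made for this example only: `P[i][j]` stands for the estimated probability that document j is accessed right after document i, and `rates[i]` for the reciprocal of the mean interaction time after an access to document i (assumed strictly positive).

```python
import math

def expected_accesses(P, rates, start, horizon, tol=1e-12):
    """Expected number of accesses to each document within `horizon`.

    Hypothetical helper (not from the paper); uses uniformization.
    P[i][j]  : probability that document j is accessed next after i
    rates[i] : 1 / mean time until the next access after accessing i (> 0)
    start    : index of the most recently accessed document
    The access to `start` at time 0 is not counted.
    """
    n = len(P)
    lam = max(rates)          # uniformization rate
    mean = lam * horizon      # mean number of uniformized steps

    def step(dist):
        # One step of the uniformized chain: a real transition with
        # probability rates[i] / lam, otherwise a fictitious self-loop.
        nxt = [0.0] * n
        for i, mass in enumerate(dist):
            real = mass * rates[i] / lam
            nxt[i] += mass - real
            for j in range(n):
                nxt[j] += real * P[i][j]
        return nxt

    dist = [0.0] * n
    dist[start] = 1.0
    expected = [0.0] * n

    pmf = math.exp(-mean)     # P(N = 0), N ~ Poisson(mean)
    tail = 1.0 - pmf          # P(N >= 1)
    max_steps = int(mean + 10.0 * math.sqrt(mean) + 50)
    for k in range(1, max_steps + 1):
        if tail <= tol:
            break
        # The k-th uniformized step happens before `horizon` with
        # probability `tail`; count only real transitions as accesses.
        for i, mass in enumerate(dist):
            real = mass * rates[i] / lam
            for j in range(n):
                expected[j] += tail * real * P[i][j]
        dist = step(dist)
        pmf *= mean / k       # P(N = k)
        tail -= pmf           # P(N >= k + 1)
    return expected
```

A migration policy in the spirit of the abstract could rank documents by these expected access counts, preloading high-ranked documents from tertiary storage and evicting low-ranked ones from secondary storage. How the rankings are combined with drive contention is not covered by this sketch.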

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Referenced by

  1. Gerhard Weikum, Arnd Christian König, Achim Kraiss, Markus Sinnwell: Towards Self-Tuning Memory Management for Data Servers. IEEE Data Eng. Bull. 22(2): 3-11 (1999)
  2. Achim Kraiss, Gerhard Weikum: Integrated Document Caching and Prefetching in Storage Hierarchies Based on Markov-Chain Predictions. VLDB J. 7(3): 141-162 (1998)
  3. Theodore Johnson, Ethan L. Miller: Performance Measurements of Tertiary Storage Devices. VLDB 1998: 50-61
