What is the Data Warehousing Problem? (Are Materialized Views the Answer?).

Ashish Gupta, Inderpal Singh Mumick: What is the Data Warehousing Problem? (Are Materialized Views the Answer?). VLDB 1996: 602
  author    = {Ashish Gupta and
               Inderpal Singh Mumick},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {What is the Data Warehousing Problem? (Are Materialized Views
               the Answer?)},
  booktitle = {VLDB'96, Proceedings of 22nd International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {602},
  ee        = {db/conf/vldb/GuptaM96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP}


The term "Data Warehousing" is used for database applications with one or more of the following characteristics:
(1) Data is integrated from several, possibly heterogeneous, sources into a large data store, called the "data warehouse".
(2) A large data store functions as the database of record, with access to detailed data for operational and/or decision support applications. The database of record is called a "data warehouse".
(3) A "Data Warehouse" summarizes data along several dimensions, and stores the summarized data for aggregate query processing by OLAP and decision support applications. The detailed data may or may not be stored in the warehouse.

A view is a derived relation defined in terms of base (stored) relations. A view can be materialized by storing the tuples of the view in the database. A materialized view provides fast access to data; the speed difference is critical in applications where the query rate is high and the views are complex or are defined over data in remote databases, so that it is not feasible to recompute the view for every query.
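The distinction between a virtual view and a materialized view can be illustrated with a minimal sketch. It is not taken from the panel abstract: the table `sales`, the view names, and the helpers `build_demo`, `totals`, and `refresh` are hypothetical, and because SQLite has no native materialized views, the stored copy is emulated here with an ordinary table that is recomputed in full on refresh.

```python
import sqlite3


def build_demo():
    """Create a base relation, a virtual view, and an emulated materialized view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Base (stored) relation; schema and rows are illustrative only.
        CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
        INSERT INTO sales VALUES
            ('east', 'widget', 10),
            ('east', 'gadget', 5),
            ('west', 'widget', 7);

        -- Virtual view: a derived relation, recomputed on every query.
        CREATE VIEW sales_by_region AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

        -- Emulated materialized view: the view's tuples are stored as a table.
        CREATE TABLE sales_by_region_mv AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    """)
    return conn


def totals(conn, relation):
    """Read (region, total) pairs from either the view or the stored copy."""
    rows = conn.execute(f"SELECT region, total FROM {relation} ORDER BY region")
    return rows.fetchall()


def refresh(conn):
    """Recompute the stored copy from scratch (no incremental maintenance)."""
    with conn:
        conn.execute("DELETE FROM sales_by_region_mv")
        conn.execute(
            "INSERT INTO sales_by_region_mv "
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )


if __name__ == "__main__":
    conn = build_demo()
    conn.execute("INSERT INTO sales VALUES ('west', 'gadget', 3)")
    print(totals(conn, "sales_by_region"))     # virtual view reflects the new row
    print(totals(conn, "sales_by_region_mv"))  # stored copy is stale until refreshed
    refresh(conn)
    print(totals(conn, "sales_by_region_mv"))
```

Reading from the stored copy avoids re-running the aggregation on every query, at the cost of keeping the copy up to date when the base relation changes. The full recomputation in `refresh` is the simplest possible maintenance policy and is used here only to keep the sketch short.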

Data warehousing has become increasingly visible as a research issue in the wake of enormous market activity in the past few years. Warehousing is reputed to be the next big corporate information initiative, where every database company hopes to make its fortune. Similarly, materialized views are attracting increased research activity, with applications in decision support, OLAP, query optimization, and replication, all of which are relevant for data warehousing.

What new database problems are opened up by data warehousing? Clearly, warehouses need database systems to support larger and larger amounts of data, running into hundreds of gigabytes and tens of terabytes. Large parallel database systems need to be developed. However, are there problems other than those associated with building any large database system? What about issues of database integration, heterogeneous systems, database loading, batch processing, data snapshots, backups, aggregate query processing, and OLAP query optimization?

Can materialized view technology provide the answer to most or all of these problems? Many people believe so, and claim that warehousing is no more than a new name for caching and materialized views. Many researchers and industry developers have put their time and money behind this belief and are building systems and products based on materialized views.

Can materialized view technology solve the problems encountered in doing data warehousing using database systems? What work needs to be done in materialized views to develop such technology and to make it usable? Are there significant warehousing problems outside materialized views?


Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7

Referenced by

  1. Laks V. S. Lakshmanan, Fereidoon Sadri, Subbu N. Subramanian: On Efficiently Implementing SchemaSQL on an SQL Database System. VLDB 1999: 471-482