2002 Digital Symposium Collection

Efficient Computation of Iceberg Cubes with Complex Measures

Jiawei Han, Jian Pei, Guozhu Dong, and Ke Wang

Abstract

It is often too expensive to compute and materialize a complete high-dimensional data cube. Computing an iceberg cube, which contains only aggregates above certain thresholds, is an effective way to derive nontrivial multidimensional aggregations for OLAP and data mining. In this paper, we study efficient methods for computing iceberg cubes with some popularly used complex measures, such as average, and develop a methodology that adopts a weaker but anti-monotonic condition for testing and pruning search space. In particular, for efficient computation of iceberg cubes with the average measure, we propose a top-k average pruning method and extend two previously studied methods, Apriori and BUC, to Top-k Apriori and Top-k BUC. To further improve the performance, an interesting hypertree structure, called H-tree, is designed and a new iceberg cubing method, called Top-k H-Cubing, is developed. Our performance study shows that Top-k BUC and Top-k H-Cubing are promising candidates for scalable computation, and Top-k H-Cubing has the best performance in many cases.
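To make the pruning idea concrete, here is a minimal Python sketch of a bottom-up cube computation that uses a top-k average test. It is an illustration only, not the authors' Top-k BUC or Top-k H-Cubing, and it rests on assumptions the abstract does not spell out: the iceberg condition is taken to be "count >= k and average >= v", with k doubling as the minimum support. The names `topk_avg`, `iceberg_avg_cube`, `rows`, `k`, and `v` are hypothetical. The test is sound under these assumptions because a descendant cell holds a subset of its ancestor's tuples, so if the average of the ancestor's k largest values is below v, no descendant with at least k tuples can reach an average of v.

```python
from heapq import nlargest


def topk_avg(values, k):
    """Average of the k largest values, or None if there are fewer than k."""
    if len(values) < k:
        return None
    return sum(nlargest(k, values)) / k


def iceberg_avg_cube(rows, n_dims, k, v):
    """Return {cell: average} for cells with count >= k and average >= v.

    rows   : list of (dims, measure), where dims is a tuple of n_dims values
    cell   : tuple of n_dims entries, with None standing for '*' (all values)
    k, v   : assumed minimum support and average threshold (illustrative)
    """
    result = {}

    def expand(cell, part, start_dim):
        values = [m for _, m in part]
        bound = topk_avg(values, k)
        # Weaker, anti-monotonic test: fewer than k tuples, or a top-k
        # average below v, rules out this cell and all of its descendants.
        if bound is None or bound < v:
            return
        avg = sum(values) / len(values)
        if avg >= v:  # the real (non-anti-monotonic) iceberg condition
            result[cell] = avg
        # Refine on each remaining dimension, bottom-up style.
        for d in range(start_dim, n_dims):
            groups = {}
            for dims, m in part:
                groups.setdefault(dims[d], []).append((dims, m))
            for val, sub in groups.items():
                child = cell[:d] + (val,) + cell[d + 1:]
                expand(child, sub, d + 1)

    expand((None,) * n_dims, rows, 0)
    return result


# Toy data: two dimensions and one numeric measure.
rows = [
    (("a", "x"), 10), (("a", "x"), 2), (("a", "y"), 9),
    (("b", "x"), 1), (("b", "y"), 1), (("b", "y"), 2),
]
cube = iceberg_avg_cube(rows, n_dims=2, k=2, v=5)
```

In this toy run the apex cell passes the top-k test but fails the plain average test, so it is expanded without being reported. The partition for "b" fails the top-k test and is dropped together with all of its descendants, which is where the savings come from.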