Robert Krauthgamer, Aranyak Mehta, Vijayshankar Raman and Atri Rudra

Greedy List Intersection

Abstract:

A common technique for processing conjunctive queries is to first match each predicate separately using an index lookup, and then compute the intersection of the resulting row-id lists, via an AND-tree. The performance of this technique depends crucially on the order of lists in this tree: it is important to compute early the intersections that will produce small results. But this optimization is hard to do when the data or predicates have correlation.

We present a new algorithm for ordering the lists in an AND-tree by sampling the intermediate intersection sizes. We prove that our algorithm is near-optimal and validate its effectiveness experimentally on datasets with a variety of distributions.
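The abstract does not spell out the algorithm, so the following Python is only a rough illustration of the general idea of greedy ordering driven by sampled intersection sizes, not the authors' method. The function names, the sample size, and the choice to start from the shortest list are all assumptions made for this sketch.

```python
import random


def estimate_intersection_size(current, candidate, sample_size, rng):
    """Estimate |current ∩ candidate| by probing a random sample of `current`.

    `current` is a list of row ids, `candidate` is a set of row ids.
    """
    if not current:
        return 0.0
    k = min(sample_size, len(current))
    sample = rng.sample(current, k)
    hits = sum(1 for row_id in sample if row_id in candidate)
    # Scale the hit rate in the sample up to the full current result.
    return hits / k * len(current)


def greedy_intersection(lists, sample_size=64, seed=0):
    """Intersect row-id lists, greedily picking the next list to apply.

    Assumption for this sketch: start from the shortest list, then at each
    step choose the remaining list whose sampled intersection with the
    current intermediate result is estimated to be smallest.
    Returns (order, result): list indices in the order used, and the
    final intersection as a sorted list.
    """
    rng = random.Random(seed)
    sets = [set(lst) for lst in lists]
    remaining = list(range(len(sets)))

    first = min(remaining, key=lambda i: len(sets[i]))
    remaining.remove(first)
    order = [first]
    current = sorted(sets[first])

    while remaining:
        best = min(
            remaining,
            key=lambda i: estimate_intersection_size(
                current, sets[i], sample_size, rng
            ),
        )
        remaining.remove(best)
        order.append(best)
        current = [row_id for row_id in current if row_id in sets[best]]

    return order, current


if __name__ == "__main__":
    row_id_lists = [
        list(range(0, 1000)),      # predicate matching every row
        list(range(0, 1000, 2)),   # even row ids
        list(range(0, 1000, 7)),   # multiples of 7
        [3, 14, 500],              # a very selective predicate
    ]
    order, result = greedy_intersection(row_id_lists)
    print("order:", order)
    print("result:", result)
```

Because the estimate is taken against the current intermediate result rather than against each list in isolation, it reflects correlation between predicates, which is the situation the abstract identifies as hard for static ordering.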

Versions

Proceedings of the 24th International Conference on Data Engineering (ICDE), April 2008.
UB CSE Technical Report #2007-11.