Agenda
23/jun
Abstract.
«The state of the art in database query processing has recently been shaken by a new generation of multi-join processing algorithms with strong optimality guarantees based on the AGM bound of queries: the maximum size of the output of the query over all possible relations with the same cardinalities. Over the years, this has been shown to translate into considerable practical improvements over the classical binary join techniques in use since the sixties. In this talk I will first present the AGM bound, which is quite interesting from an analysis-of-algorithms perspective (even if worst-case). I will then show how it has been typically achieved by following its formula, which has led to the well-known Leapfrog TrieJoin algorithm. Finally, I will introduce a new data structure that regards d-column tables as points in a d-dimensional space and represents those grids using, essentially, d-dimensional versions of quadtrees. The join algorithm is then implemented by (virtually) raising their dimension to include the missing attributes of the join, and then (virtually) traversing their intersection, or equivalently the output space. I will then show how this extremely simple geometric approach does reach worst-case-optimality in a completely different way. This talk is based on our work «Optimal Joins using Compressed Quadtrees», which is to appear in ACM Transactions on Database Systems.»
Presenter
Gonzalo Navarro, DCC Universidad de Chile
When and where
June 23, 17:30–18:30 (UTC) / 13:30–14:30 in America/Santiago