Abstract

Answering queries accurately at interactive speeds has become more challenging in modern data systems because of the massive growth of data. This challenge has led to increasing interest in Approximate Query Processing (AQP) techniques, which enable timely query execution in scenarios that can tolerate some degree of inaccuracy. While latency and accuracy have been the two main factors considered by many AQP systems, our studies found that other dimensions, such as applicability, reliability, robustness, and data availability, can also be the main considerations in certain scenarios, and these demands call for the design of novel AQP techniques. In this thesis, we propose novel AQP techniques with different characteristics for the different scenarios where AQP can be useful. First, we discuss PASS, a system that combines sampling and aggregation for better accuracy while keeping latency and storage cost at a favorable level. Second, as a follow-up to PASS, we present JanusAQP, a dynamic AQP system that extends the static partition tree proposed in PASS and addresses several challenges of a dynamic environment, making the system more practical. Third, we propose PC, a novel missing-data analysis framework that enables not only a presentation of missing data but also the derivation of a tight hard bound for optimal reliability. Lastly, we discuss DQM, our effort to apply machine learning to manage materialized views in a robust manner.
