Abstract

Database systems have long been designed around one of two major approaches to processing a dataset that changes over time (e.g., a data stream). Eager query processing methods, such as continuous query processing or immediate incremental view maintenance (IVM), are optimized to reduce query latency. They maintain standing queries by consuming all available resources to process new data immediately, which can waste significant CPU cycles and memory. Lazy query processing methods, such as batch processing or deferred IVM, defer query execution to a later point to reduce resource consumption, but suffer high query latencies. We find that existing eager and lazy approaches are optimized for applications at the two ends of the resource-latency trade-off, while the middle ground between them is rarely exploited.

This dissertation proposes a new query processing paradigm, Thrifty Query Processing (TQP), for middle-ground applications where users do not need to see the up-to-date query result immediately after the data is ready and can tolerate some slack before the result is returned. TQP exploits this slack to reduce resource consumption and lets users tune it to trade off query latency against resource usage.

Implementing TQP involves redesigning several core database components. First, we introduce a new user model that allows users not only to submit a SQL query but also to specify the slack. Specifically, users can specify a performance goal representing the maximum time allowed to return the result after the data is complete. Second, we design a new query execution engine that leverages this performance goal to reduce CPU cycles. The engine includes optimizations for both single queries and multiple queries. For a single query, we selectively delay parts of the query to reduce resource consumption while still meeting its performance goal. For multiple queries, we find that shared execution may not decrease resource consumption, because sharing queries with different performance goals forces the whole shared plan to execute eagerly enough to meet the most demanding goal (i.e., the lowest query latency). Therefore, we selectively share queries to avoid the overhead of eager execution while still eliminating redundant work across queries. Finally, we design a memory management component that releases occupied memory when a query is not active. We find that in many cases the data arrival rate is low (e.g., late-arriving data), so a query may be idle for long periods. Therefore, we selectively release the memory (e.g., intermediate states) that is least useful for processing new data.

We implement TQP in CrocodileDB, a resource-efficient database, and perform extensive experiments to evaluate each of its components. We show that CrocodileDB significantly reduces CPU and memory consumption while providing query latencies similar to those of existing approaches.
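To make the performance-goal user model concrete, the following minimal Python sketch (not CrocodileDB's actual interface; the names PerformanceGoal, should_defer, and the cost estimate are illustrative assumptions) shows how an engine could use a user-specified goal to decide whether a query may stay idle or must process its buffered input now.

    from dataclasses import dataclass

    @dataclass
    class PerformanceGoal:
        # Hypothetical: maximum seconds allowed between data completeness
        # and delivery of the query result (the user-specified slack).
        max_delay: float

    def should_defer(goal: PerformanceGoal,
                     pending_cost_estimate: float,
                     time_since_data_complete: float) -> bool:
        # Stay idle as long as the remaining slack still covers the estimated
        # cost of processing everything buffered so far; otherwise run now.
        remaining_slack = goal.max_delay - time_since_data_complete
        return remaining_slack > pending_cost_estimate

    # With a 60-second goal and ~5 seconds of buffered work, the query can
    # remain idle (freeing CPU for other queries) until its slack nearly runs out.
    goal = PerformanceGoal(max_delay=60.0)
    print(should_defer(goal, pending_cost_estimate=5.0, time_since_data_complete=10.0))  # True -> keep deferring
    print(should_defer(goal, pending_cost_estimate=5.0, time_since_data_complete=57.0))  # False -> execute now

In the dissertation's terms, a larger max_delay gives the engine more freedom to batch work lazily and release idle resources, while a smaller one pushes execution toward the eager end of the trade-off.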
