Abstract
An increasing number of software applications adopt machine learning (ML) components to solve real-world problems. The availability of ML cloud APIs further eases developers' burden of incorporating ML solutions, typically deep neural networks (DNNs). However, to achieve a correct, fast, and energy-efficient ML application, developers still need to carefully design its three crucial components: the ML algorithm, the system environment, and the software context.
To improve the correctness, performance, and energy efficiency of ML applications, this dissertation addresses each of these components and makes the following contributions:
First, to enhance the flexibility of neural networks, this dissertation proposes a novel neural network architecture and a customized optimizer that support anytime prediction. This design allows one neural network to generate a series of increasingly accurate outputs over time without sacrificing accuracy for flexibility.
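As a rough illustration of what anytime prediction means at inference time, the sketch below runs a chain of stages and returns the most recent output once a time budget is spent. It is not the architecture or optimizer proposed in the dissertation: the function `anytime_predict`, the stage signature, and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical stage type: takes the running state and returns
# (new_state, prediction). Later stages are assumed to be more accurate.
Stage = Callable[[Any], Tuple[Any, Any]]

def anytime_predict(
    stages: Sequence[Stage],
    x: Any,
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Any]:
    """Run stages in order and return the latest prediction available.

    The first stage always runs so that some answer exists; each later
    stage runs only if the time budget has not yet been used up.
    """
    start = clock()
    state, best = x, None
    for stage in stages:
        if best is not None and clock() - start >= deadline_s:
            break  # budget exhausted: keep the last completed output
        state, best = stage(state)
    return best
```

A caller with a tight deadline receives the coarse output of an early stage, while a caller with a generous deadline receives the output of the final stage, all from the same model.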
Second, this dissertation designs ALERT, a run-time scheduler that manages system resources in addition to the neural network. ALERT holistically configures neural networks and system resources together to meet application-specific accuracy, performance, and energy-consumption constraints. It uses a probabilistic model to detect environmental volatility and exploits the full DNN candidate set to optimize performance and satisfy constraints.
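The idea of jointly choosing a network and a resource setting under constraints can be sketched as follows. This is a simplified illustration and not ALERT's actual algorithm: the `Config` fields, the mean-plus-deviation slowdown estimate, and the choice to minimize energy subject to latency and accuracy bounds are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Iterable, Optional, Sequence

@dataclass(frozen=True)
class Config:
    """Hypothetical (DNN, resource setting) pair with profiled numbers."""
    name: str
    accuracy: float   # profiled accuracy in [0, 1]
    latency_s: float  # profiled latency under nominal conditions
    energy_j: float   # profiled energy per inference

def estimate_slowdown(ratios: Sequence[float], k: float = 1.0) -> float:
    """Conservative slowdown factor from observed/profiled latency ratios.

    Uses mean + k * standard deviation, so a volatile environment
    (high variance) inflates the estimate. Assumed model, for illustration.
    """
    if not ratios:
        return 1.0
    return mean(ratios) + k * pstdev(ratios)

def pick_config(
    configs: Iterable[Config],
    deadline_s: float,
    min_accuracy: float,
    slowdown: float,
) -> Optional[Config]:
    """Return the lowest-energy config that meets both constraints, or None."""
    feasible = [
        c for c in configs
        if c.latency_s * slowdown <= deadline_s and c.accuracy >= min_accuracy
    ]
    return min(feasible, key=lambda c: c.energy_j, default=None)
```

In this sketch, a calm environment lets the scheduler pick a low-power setting, while observed slowdowns push it toward a faster setting or a smaller network. Swapping the objective and constraints (for example, maximizing accuracy under an energy budget) only changes the filter and the key function.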
Third, to understand the challenges of developing ML software, this dissertation conducts the first comprehensive study of how real-world applications use machine learning cloud APIs. We identify eight general anti-patterns that degrade the functional, performance, or economic quality of the software.
Fourth, guided by this study, we propose Keeper, a new testing framework for software systems that use machine learning APIs. Keeper automatically generates many test cases to thoroughly test every branch in the specified function and its callees. It analyzes the test runs and reports many failures, as well as potential patches, to developers.