Learning Structure for Computer Systems Management

Ding, Yi

doi:10.6082/uchicago.2735

2020

Abstract

Modern computer systems expose diverse configurable parameters whose complicated interactions have surprising effects on performance and energy. Managing this complexity places a great burden on systems designers and researchers. Machine learning (ML) creates an opportunity to alleviate this burden by modeling the complicated, non-linear interactions among resources and delivering optimal solutions to scheduling and resource management problems. However, naively applying traditional ML methods, such as deep learning, raises several challenges, including generalization, robustness, and interpretability. This dissertation presents two projects that tackle these fundamental challenges in learning for systems by incorporating the underlying system structure, defined as the geometry of the systems problems we solve. The first project addresses learning for systems optimization. We propose a novel generative model to address data scarcity and a multi-phase sampling approach that exploits system structure. This study provides strong evidence that, once a learning system reaches a certain level of accuracy, it is no longer profitable for systems researchers to improve it without accounting for structure. The second project describes Sherlock, a causal straggler prediction framework for datacenters. Because stragglers are rare events that exhibit extreme tail latencies, they create imbalance, a form of structure, in the training data. To address this imbalance, Sherlock augments correlation-based learning with causal analysis that requires no prior knowledge. To effectively mitigate stragglers, Sherlock applies permutation feature importance (PFI) to gain insight into straggling behavior and guide further system intervention. Sherlock's combination of PS and PFI allows it to make accurate, interpretable predictions from imbalanced training data.

ML has influenced much recent systems research, both in providing systems support for ML and in applying ML to solve systems problems. To produce generalizable, robust, and interpretable results, it is crucial to understand the underlying system structure. This dissertation offers a new understanding of ML for systems that opens the way to a novel class of methodologies and applications by incorporating causal inference techniques.
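Permutation feature importance (PFI), which the abstract names as Sherlock's tool for interpreting straggler predictions, scores each input feature by how much a fitted model's quality drops when that feature's column is randomly shuffled, breaking its link to the label. The sketch below is a generic, from-scratch illustration of the technique under stated assumptions, not Sherlock's implementation: the feature names (cpu_util, mem_util, disk_queue), the threshold "model", and the toy data are all hypothetical, and accuracy is used as the default metric only for simplicity (on imbalanced straggler data a metric such as recall may be more informative, so the metric is a parameter).

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_feature_importance(
    model: Model,
    X: Sequence[Row],
    y: Sequence[int],
    metric: Callable[[Model, Sequence[Row], Sequence[int]], float] = accuracy,
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean drop in `metric` when each feature column is shuffled in isolation."""
    rng = random.Random(seed)
    baseline = metric(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - metric(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: columns are cpu_util, mem_util, disk_queue.
# Only rows with cpu_util > 0.9 are stragglers, so the labels are imbalanced (9 of 100).
X = [[i / 100, (i * 37 % 100) / 100, (i * 11 % 7) / 7] for i in range(100)]
y = [1 if row[0] > 0.9 else 0 for row in X]


def straggler_model(row: Row) -> int:
    """Hypothetical fitted model: flags a task as a straggler on high CPU use."""
    return 1 if row[0] > 0.9 else 0


importances = permutation_feature_importance(straggler_model, X, y, n_repeats=5)
print(dict(zip(["cpu_util", "mem_util", "disk_queue"], importances)))
```

Because the toy model ignores mem_util and disk_queue, shuffling them leaves every prediction unchanged and their importance is exactly zero, while shuffling cpu_util scatters the high values across rows and lowers accuracy. Computing importance against held-out data rather than the training set is the usual choice, since it reflects what the model relies on for generalization.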