Abstract
With the exponential growth of data, which is expected to reach 175
zettabytes by 2025, cloud storage is increasingly becoming the central
hub for data management and processing. Among the many benefits cloud
platforms promise, predictable performance and cost-efficiency are two
fundamental factors driving the success of modern cloud storage.
However, as modern cloud storage infrastructure changes rapidly in both
software and hardware, new challenges emerge for achieving predictable
performance efficiently.
In more detail, modern data-intensive applications and the new wave of
computing paradigms (e.g., data analytics, ML, serverless) drive the
storage stack to undergo a radical shift towards more feature-rich
software designs on top of increasingly heterogeneous architectures. As
a result, today's cloud storage stack is extremely heavy-weight and
complex, burning 10-20% of data center CPU cycles and introducing severe
performance non-determinism (i.e., long tail latencies). Unfortunately,
the deployment of new acceleration hardware (e.g., NVMe SSDs and I/O
co-processors) only partially addresses the problem. Due to the
intrinsic complexities and idiosyncrasies in hardware (e.g., NAND Flash
management) and lack of system-level support, it remains a challenge to
design performant and cost-efficient cloud storage systems. In
particular, achieving sub-millisecond level latency predictability in a
cost-efficient manner is the new battlefield.
Rooted in a deep understanding and analysis of the existing
software/hardware stack, this dissertation focuses on building new abstractions,
interfaces and end-to-end storage systems to achieve predictable
performance and cost-efficiency using a software/hardware co-design
approach. By revisiting the challenges across different layers in a
holistic manner, the co-design approach opens up simple yet powerful
system-level policy designs to opportunistically exploit hardware
idiosyncrasies and heterogeneity. The systems we build can reduce
latency spikes by up to orders of magnitude and increase revenue
by 20x.