
Abstract

With data growing exponentially and expected to reach 175 zettabytes by 2025, cloud storage is increasingly becoming the central hub for data management and processing. Among the many benefits that cloud platforms promise, predictable performance and cost-efficiency are two fundamental factors driving the success of modern cloud storage. However, rapid changes in both the software and hardware of modern cloud storage infrastructure create new challenges for achieving predictable performance efficiently. More specifically, modern data-intensive applications and the new wave of computing paradigms (e.g., data analytics, ML, serverless) drive the storage stack to undergo a radical shift toward more feature-rich software designs on top of increasingly heterogeneous architectures. As a result, today's cloud storage stack is extremely heavyweight and complex, consuming 10-20% of data center CPU cycles and introducing severe performance non-determinism (i.e., long tail latencies). Unfortunately, deploying new acceleration hardware (e.g., NVMe SSDs and I/O co-processors) only partially addresses the problem. Because of the intrinsic complexities and idiosyncrasies of the hardware (e.g., NAND Flash management) and the lack of system-level support, designing performant and cost-efficient cloud storage systems remains a challenge. In particular, achieving sub-millisecond latency predictability in a cost-efficient manner is the new battlefield. Rooted in a deep understanding and analysis of the existing software/hardware stack, this dissertation focuses on building new abstractions, interfaces, and end-to-end storage systems that achieve predictable performance and cost-efficiency through a software/hardware co-design approach. By revisiting the challenges across different layers holistically, the co-design approach opens up simple yet powerful system-level policy designs that opportunistically exploit hardware idiosyncrasies and heterogeneity. The systems we build can reduce latency spikes by up to orders of magnitude and increase revenue by 20x.
