Abstract

Deep learning (DL) training is data-intensive and often bottlenecked by fetching data from remote storage. Recognizing that many samples shrink in size during preprocessing, we explore selectively offloading preprocessing to remote storage to reduce data traffic. We conduct a case study to uncover the potential benefits and challenges of this approach. We then propose SOPHON, a framework that selectively offloads preprocessing tasks at a fine granularity to cut data traffic, using online profiling and adaptive algorithms to optimize the decision for every sample in every training scenario. Our results show that SOPHON reduces data traffic and training time by 1.2-2.2x compared with existing solutions.
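To make the core idea concrete, the sketch below shows one way a per-sample offloading decision could work: keep an online profile of how large each sample is before and after preprocessing, and run preprocessing on the storage side only when it is expected to shrink the bytes sent over the network. This is a minimal illustration under assumed names (`SizeProfile`, `OffloadPolicy`, `should_offload`); it is not SOPHON's actual API or algorithm, which the abstract does not specify.

```python
# Illustrative sketch only: a hypothetical per-sample policy that decides whether
# preprocessing should run on the remote-storage side (so the smaller, preprocessed
# bytes cross the network) or on the training node (so the raw bytes do).
from dataclasses import dataclass
from typing import Dict


@dataclass
class SizeProfile:
    """Online running averages of raw vs. preprocessed sizes for one sample class."""
    raw_bytes: float = 0.0
    out_bytes: float = 0.0
    count: int = 0

    def update(self, raw: int, out: int) -> None:
        self.count += 1
        # Incremental means keep the profile cheap to maintain during training.
        self.raw_bytes += (raw - self.raw_bytes) / self.count
        self.out_bytes += (out - self.out_bytes) / self.count


class OffloadPolicy:
    """Offload preprocessing only when it is expected to reduce network traffic."""

    def __init__(self) -> None:
        self.profiles: Dict[str, SizeProfile] = {}

    def record(self, key: str, raw: int, out: int) -> None:
        # Called after a sample has been preprocessed, to refine the profile online.
        self.profiles.setdefault(key, SizeProfile()).update(raw, out)

    def should_offload(self, key: str) -> bool:
        prof = self.profiles.get(key)
        if prof is None or prof.count == 0:
            return False  # No profile yet: fetch raw data and preprocess locally.
        # Offload when preprocessed output is smaller, on average, than raw input.
        return prof.out_bytes < prof.raw_bytes
```

In this toy version, `key` stands in for whatever granularity the profiling is done at (e.g., per dataset, per transform, or per sample); the actual fine-grained and adaptive policy in SOPHON is described in the paper itself.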
