Abstract
A wide range of machine learning applications, from video analytics to large language model (LLM) inference, are becoming distributed. In both settings, data must be loaded from a remote source to the machine learning model before inference can proceed. This thesis focuses on two concrete applications: video analytics, where analytical DNNs must load video feeds from remote cameras, and LLM inference, where the inference engine must load KV caches from storage for faster processing. Our observation is that, by properly identifying the important parts of the data and loading them with high priority, end-to-end latency can be greatly reduced without sacrificing other performance metrics (accuracy in video analytics and throughput in LLM serving). Concretely, the pixels associated with objects in video analytics are more important than other pixels, and the KV caches associated with requests that have shorter job completion times (JCTs) are more important than other caches. Existing approaches, however, estimate this importance either too slowly or too inaccurately. This thesis leverages application-driven insights to identify the important data quickly and accurately. Our evaluation shows that this reduces latency by 2–3× without sacrificing accuracy or throughput.