Abstract

Deep learning (DL) has made significant impacts in many domains, including computer vision (CV), natural language processing (NLP), and recommender systems. Beyond breakthroughs in model architectures, data has been another fundamental factor that significantly affects model performance. This emphasis on data has given rise to the concept of data-centric artificial intelligence (AI). Despite its growing importance, studies on novel data utilization algorithms that enhance model performance without modifying the model architecture are still lacking. Addressing this gap, this thesis proposes novel data utilization algorithms spanning the stages of the deep learning pipeline: data collection and formulation, model training, model evaluation, and model inference as found in many deployed applications. These algorithms aim to improve model performance, robustness, and trustworthiness through the lens of data utilization, without altering model architectures or increasing computational or time costs.

In the data collection and formulation stage, we propose two novel strategies targeting data scarcity and data abundance, two opposite yet equally crucial challenges common in many DL applications. Data scarcity arises when a DL model is applied to real-world domains where labeled data is expensive to obtain, demanding more careful data collection algorithms so that model performance is best optimized with limited data; in practice, this collection process is often addressed through active learning (AL). In this thesis, we propose Direct Acquisition Optimization (DAO), a novel AL algorithm that optimizes sample selection directly based on the expected reduction in true loss. Data abundance, conversely, describes situations in which there is more data than the model can learn from, leading to performance saturation and failures to scale; in recommender systems, for example, model performance saturates without taking full advantage of the abundant user-item interaction data. In this thesis, we propose User-Centric Ranking (UCR), an alternative data formulation strategy based on a transposed view of the dyadic user-item interactions. UCR breaks the curse of data saturation in modern transformer-based recommender systems, enabling them to consume larger amounts of data and achieve higher performance.
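The "transposed view" behind UCR can be illustrated with a minimal sketch. All names here are illustrative, not the thesis's implementation: instead of representing each user as a sequence of the items they interacted with, the dyadic interactions are flipped so that each item is represented as the sequence of users who interacted with it.

```python
# Hypothetical sketch of UCR's transposed view of dyadic interactions.
from collections import defaultdict

def transpose_interactions(user_histories):
    """Flip user-centric histories into item-centric ones.

    user_histories maps each user to the ordered items they interacted
    with; the transposed view maps each item to the ordered users who
    interacted with it.
    """
    item_histories = defaultdict(list)
    for user, items in user_histories.items():
        for item in items:
            item_histories[item].append(user)
    return dict(item_histories)

interactions = {"u1": ["i1", "i2"], "u2": ["i2", "i3"]}
transposed = transpose_interactions(interactions)
# transposed["i2"] now lists every user who interacted with item i2
```

In this toy form the transposition is trivial, but it changes what a sequence model consumes: popular items yield long user sequences, which is one plausible route to exploiting abundant interaction data.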

In the model training stage, we focus on vision-language models. Although contrastive language-image pretraining (CLIP) has set new benchmarks by applying self-supervised contrastive learning to large collections of text-image pairs, its dependence on rigid one-to-one mappings overlooks the complex, often multifaceted relationships between and within text-image pairs, causing inefficient data utilization during pretraining. In response, we develop Ranking-Consistent Language-Image Pretraining (RankCLIP), a novel pretraining method that extends beyond the rigid one-to-one matching framework of CLIP and its variants. By leveraging both in-modal and cross-modal ranking consistency, RankCLIP improves the alignment process, capturing the nuanced many-to-many relationships between and within each modality.
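A hedged sketch of the cross-modal ranking-consistency idea follows; the function names and the squared-rank-difference penalty are assumptions for illustration, not RankCLIP's actual loss. The intuition: the ordering of similarities that image i induces over all texts should agree with the ordering its paired text i induces over all images, rather than only matching the diagonal pair.

```python
# Illustrative (not RankCLIP's actual) cross-modal ranking-consistency
# penalty over a batch of image and text embeddings.
import numpy as np

def rank_order(scores):
    """Return the rank (0 = highest) of each entry in a score vector."""
    order = np.argsort(-scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

def ranking_consistency_penalty(img_emb, txt_emb):
    """Mean squared disagreement between cross-modal rank orderings."""
    img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt_emb = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img_emb @ txt_emb.T  # sim[i, j]: image i vs. text j
    penalty = 0.0
    for i in range(len(sim)):
        # image i's ranking over texts vs. text i's ranking over images
        penalty += np.mean((rank_order(sim[i]) - rank_order(sim[:, i])) ** 2)
    return penalty / len(sim)
```

When the two modalities induce identical similarity structures, the penalty vanishes; disagreements between the rankings are penalized, which is the consistency signal the abstract describes at a high level. An analogous term within a single modality would give the in-modal variant.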

In the model evaluation stage, we identify the inadequacies of scalar-based error metrics for evaluating DL models: they are often too abstract to reveal a model's weak spots and properties. More importantly, scalar-based metrics implicitly assume that the test data is large enough and uniformly distributed for averaged values to fairly reflect true model performance. This is not always the case, as there may not be enough test data in the first place. To this end, we propose a better test data utilization strategy for model evaluation. More specifically, we develop Non-Equivariance Revealed on Orbits (NERO), a novel model evaluation tool that combines a task-agnostic interactive interface with task-dependent visualizations to evaluate and interpret model behavior by analyzing its equivariance on purposefully designed data permutations. NERO transforms model evaluation from scalar-based, abstract metrics to robustness-based interactive visualizations that not only measure model performance but also interpret model behavior, promoting deeper model understanding.
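The orbit idea can be sketched in a few lines; the function names are illustrative, not NERO's API. Instead of one averaged score, the model is evaluated on a whole orbit of transformed versions of a single input (e.g., all cyclic shifts or rotations), and the per-transform outputs are plotted: a flat curve signals equivariance, while dips expose weak spots that a scalar metric would average away.

```python
# Toy sketch of orbit-based evaluation: record a model's output on
# every transformed copy of one input (names are illustrative).
def orbit_outputs(model, x, transforms):
    """Apply each transform to x and record the model's output."""
    return [model(t(x)) for t in transforms]

# Example orbit: all cyclic shifts of a sequence. A shift-invariant
# model (here, plain summation) should give a flat output curve.
shifts = [lambda s, k=k: s[k:] + s[:k] for k in range(4)]
outputs = orbit_outputs(sum, [1, 2, 3, 4], shifts)
# a flat list of outputs signals invariance on this orbit
```

In a real setting the transforms would be task-dependent (rotations for point clouds, time shifts for signals) and the outputs would feed the interactive visualizations rather than a single aggregate number.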

Finally, in the inference stage, given the uniqueness of auto-regressive models, whose performance can be further improved via decoding strategies, we explore how novel data utilization leads to a novel decoding algorithm that improves model performance and trustworthiness without acquiring new data or conducting additional fine-tuning. Specifically, we introduce Hallucination Reduction through Adaptive Focal-Contrast decoding (HALC), a novel decoding strategy that utilizes fine-grained visual context to help pretrained large vision-language models (LVLMs) mitigate object hallucinations (OH) and generate more trustworthy outputs.
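The focal-contrast intuition can be shown with a toy sketch of the logit arithmetic; every name here is an assumption, and the real method operates on visual contexts inside an LVLM's decoding loop. Next-token logits computed from a fine-grained visual context are contrasted against logits from a degraded view of the same image, amplifying tokens that the detailed context specifically supports and suppressing candidates (such as hallucinated objects) that both views score alike.

```python
# Hypothetical sketch of contrastive logit adjustment, not HALC itself.
import numpy as np

def focal_contrast_logits(logits_focal, logits_degraded, alpha=1.0):
    """Amplify the signal that the fine-grained context adds."""
    return logits_focal + alpha * (logits_focal - logits_degraded)

focal = np.array([2.0, 1.0, 0.5])     # logits given a detailed crop
degraded = np.array([2.0, 1.5, 0.5])  # logits given a degraded view
adjusted = focal_contrast_logits(focal, degraded)
next_token = int(np.argmax(adjusted))
# token 1 is demoted: the degraded view supports it just as strongly,
# so the fine-grained context contributes no evidence for it
```

Because the adjustment reuses the pretrained model's own logits under different visual contexts, no new data or fine-tuning is required, matching the abstract's framing of inference-time data utilization.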
