Practical Backdoor Attacks and Defenses in Deep Learning Systems

Yao, Yuanshun

doi:10.6082/uchicago.2560

2020

Abstract

Deep neural networks (DNNs) are widely deployed today, in applications ranging from image classification to voice recognition to natural language processing. However, DNNs are opaque mathematical models that do not present logical explanations of their behaviors. This lack of transparency can lead to unexpected and unpredictable behaviors that attackers could exploit. Prior work has demonstrated a series of attacks on DNN models. One such attack is the backdoor attack. By poisoning the training data, backdoor attacks seek to embed hidden malicious behaviors inside DNN models. The malicious behaviors are activated only when a "trigger" is present in the input. Triggers are specific patterns chosen by the attacker, e.g., sticky notes on traffic sign images that make a model recognize any traffic sign as Speed Limit. A backdoored model misclassifies inputs that contain the trigger while performing as expected on normal inputs. Backdoor attacks pose a serious threat to deep learning systems because they yield consistent misclassification whenever the trigger is applied to an input. Backdoor attacks (first presented in 2017) have drawn significant attention from research communities and government agencies. However, existing studies of backdoor attacks and defenses are mostly based on limited scenarios and make simplified assumptions about victim and attacker behaviors or about DNN models. There is no concrete study of how these attacks and defenses perform in real-world scenarios. In this dissertation, I seek to bridge this gap by exploring backdoor attacks and defenses under practical constraints. I find that some of those assumptions might not hold in practice and that the real world imposes certain unique challenges. Specifically, my dissertation contains three components. First, I present a practical defense against backdoor attacks, Neural Cleanse.
Most prior backdoor defenses assume that defenders have access to poisoned training samples (samples with triggers). However, in a real-world system, defenders often do not know which training samples are poisoned. Unlike those defenses, Neural Cleanse requires only a small set of clean samples, which defenders can easily obtain in practice. In addition, most existing defenses either detect the existence of backdoors or remove backdoors from the model, but a practical defense should be able to do both. Neural Cleanse offers a full mitigation pipeline: it detects and identifies backdoors, then filters backdoored inputs, and finally removes backdoors from the model. Second, I study the effectiveness of backdoor attacks in real-world systems and find that existing backdoor attacks cannot be applied directly. Most backdoor attacks assume that victims train their models from scratch. In practice, however, practitioners are more likely to customize pretrained models on their local data, and this customization process erases backdoors embedded using conventional methods. Nevertheless, I show that the attack can still be effective by designing a novel backdoor variant that survives the customization process. The resulting attack is stealthier and harder to defend against. Third, I study the feasibility of applying current trigger designs in real-world environments. Existing backdoor attacks are mostly studied in the digital environment, where triggers are merely pixel modifications to images. But to attack systems deployed in the physical world, attackers cannot add digital triggers, since doing so would require access to edit inputs digitally. In this case, attackers have to use objects that already exist in the physical world as triggers.
Therefore, I conduct a systematic study to understand how well backdoor attacks can be performed using physical objects as triggers and what limitations attackers face in executing them. Finally, I summarize my work on backdoor attacks and provide insights into this area. I hope my work can motivate more studies of backdoor attacks and defenses in real-world scenarios.
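
As a rough illustration of the data-poisoning step the abstract describes (stamping a trigger onto some training inputs and relabeling them to an attacker-chosen class), the following is a minimal sketch in plain Python. It is not taken from the dissertation: the function names `stamp_trigger` and `poison_dataset`, the square corner trigger, the 8x8 toy images, and the 25% poison rate are all assumptions made for the example.

```python
import random


def stamp_trigger(image, size=3, value=1.0):
    """Return a copy of `image` (2D list of floats) with a size x size square
    in the bottom-right corner set to `value`. The square is a hypothetical
    trigger pattern chosen only for this sketch."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, poison_rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them
    to `target_label`. Returns new lists plus the poisoned indices; the
    input lists are left unmodified."""
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_rate)
    poisoned_idx = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in poisoned_idx:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, sorted(poisoned_idx)


# Toy data (assumed): 20 all-zero 8x8 "images" spread over 4 classes.
images = [[[0.0] * 8 for _ in range(8)] for _ in range(20)]
labels = [i % 4 for i in range(20)]

poisoned_images, poisoned_labels, idx = poison_dataset(
    images, labels, target_label=3, poison_rate=0.25
)
```

A model trained on `poisoned_images` and `poisoned_labels` would, if the attack succeeds, learn to associate the corner square with class 3 while behaving normally on unmodified inputs. Real attacks vary the trigger shape, position, and poison rate.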