Abstract

The development of explainable artificial intelligence (AI) has become a crucial step in enhancing model transparency, robustness, and usability. This thesis explores explainability in the context of natural language processing, proposing novel approaches to improve model performance and support scientific discovery. We investigate whether explainable AI techniques can be leveraged to improve out-of-distribution generalization and model decision-making. By incorporating natural language explanations and rationale-based models, we aim to address challenges in model interpretability and resilience, particularly in the face of adversarial attacks and misleading inputs. Additionally, we propose algorithms that leverage large language models to generate novel and robust scientific hypotheses in the social science domain. We further propose using mechanistic interpretability to understand what models have learned, particularly in scenarios where they exhibit superhuman performance, thereby providing insight into their internal workings and aiding the generation of novel scientific hypotheses. This research contributes to advancing both the theoretical understanding of AI systems and their practical application in fields requiring high levels of reliability and transparency, such as scientific research and critical decision-making.
