Abstract

While most theories regarding the various aspects of human language are couched in the language of discrete mathematics, modern approaches to natural language processing (NLP) generally rely on continuous representations of linguistic units, and all but eschew linguistic theorizing. An emerging consensus on why these methods have been so successful in practical applications is that the flexibility of modern neural network techniques allows for the encoding of discrete linguistic structures in these models' continuous embedding spaces. This dissertation adopts this view and explores the continuous space representations of BERT (Devlin et al., 2019), a recent model which has been shown to be particularly successful in NLP applications. In experiments collected in three chapters, this dissertation uses multiple machine learning techniques to explore BERT's continuous space in search of evidence of assorted morpho-syntactic features, different aspects of lexical semantics (e.g. sentiment), as well as morphophonology and pragmatics. In all cases, the experiments provide evidence that the relevant linguistic information is encoded in BERT's embedding space. Results of supervised classifiers provide strong evidence that BERT encodes words in a manner highly predictive of inflectional feature values. Further experiments show that it is possible to factor BERT's word embeddings via a generative neural network whose latent variables are inspired by generative linguistic theorizing. The resulting so-called disentangled representations are a step towards more interpretable representations for NLP. Finally, assorted data-visualization techniques demonstrate that BERT's embedding space can be partitioned in a way which is highly predictive of both morphophonological and pragmatic features.
In addition to contributing to the literature on interpretability of neural network models for NLP, this dissertation hopes to shed light on what aspects of language can be learnt by models like BERT, with no in-built linguistic structure and no supervision.
