Abstract
Foundation models, such as large language models (LLMs), operate within vector spaces, whereas human perception of concepts does not naturally align with this framework. This raises a fundamental question: how do these models internalize the structure of concepts within a vector space, and how do they use it? To address this question, the thesis investigates structural properties of representations, such as linearity and partial orthogonality, and studies how models leverage these structures to combine and extract information. The first part analyzes linear representations. Although the notion of linearity appears straightforward, its underlying basis, especially in large language models trained solely on next-token prediction, remains largely a mystery. This thesis provides new insight into the phenomenon by establishing a connection between linear representations and the implicit bias of gradient descent. The second part examines how models represent the intuitive notion of ``semantic independence.'' Rather than formally defining semantic independence, the focus is on the algebraic axioms of independence and how they can be represented in the form of partial orthogonality in the representation space. Finally, the third part studies representations in a practical setting, fact retrieval, and explores how self-attention can effectively combine stored information in representations to retrieve the most relevant outputs, functioning as an associative memory.
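
As a brief illustration of the kind of linear structure studied in the first part (the notation $\mathrm{rep}(\cdot)$ is introduced here only for exposition and is not the thesis's own), linearity is commonly exemplified by analogy arithmetic, in which a shared concept corresponds to an approximately fixed direction in the vector space:
\[
\mathrm{rep}(\text{``queen''}) - \mathrm{rep}(\text{``king''}) \;\approx\; \mathrm{rep}(\text{``woman''}) - \mathrm{rep}(\text{``man''}),
\]
so that a concept such as gender is captured, approximately, by a single direction.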