Abstract
Recent advances in word embeddings and digitized text have reinvigorated macro-historical cultural analysis based on empirical data. However, static word2vec embeddings, a distance-based measurement of latent semantic structures, become less interpretable when moved beyond survey-validated categories. Often, there is a one-to-many mapping between low-level semantic distance and high-level meanings, which obscures the pragmatic language use that drives latent semantic shifts in the embedding space. Focusing on “unintelligence” words in 20th-century American discourse, I critique the stereotype-based interpretation of their changing gender alignment in historical word2vec embeddings of American literature. While the gender axis reliably measures word associations, low-level associations do not map neatly onto well-defined gender stereotypes. Applying an adapted A La Carte approximation of individual context words and sentences, I suggest another possible interpretation: that the masculinization of “unintelligence” words could have stemmed from an increasing harshness of tone (a spurious variable), not from judgments of male intelligence or stupidity. This study highlights the interpretive gap in embedding-based methods and calls for more rigorous approaches to making intersubjectively valid interpretations of cultural change.