Abstract
This study explores the socio-technical challenges of moderating toxic online discourse with large language models, specifically ChatGPT. Situated at the intersection of social science and computational linguistics, it critically assesses ChatGPT’s capacity to identify and mitigate toxic content, interrogating the model’s efficacy in navigating the nuanced terrain of online communication, where patterns of linguistic toxicity are both overt and subtle. The study addresses two main research questions: (1) How effective is ChatGPT in identifying toxic content in online comments, and how does its performance compare to other models, such as the OpenAI moderation model? (2) To what extent can ChatGPT detoxify toxic comments, and what is the trade-off between toxicity reduction and preservation of the original content’s meaning? The research underscores the complexity of automating content moderation, revealing instances where ChatGPT’s algorithmic judgments fail to align with human perceptions of toxicity. This misalignment raises questions about the social implications of relying on AI for discourse regulation. Moreover, the study advocates a hybrid model of moderation that integrates machine precision with nuanced human understanding, emphasizing the importance of ethical considerations in AI deployment. By examining the limitations and potential of ChatGPT, this thesis contributes to the broader discourse on the role of technology in shaping online communities and the ethical dimensions of AI-mediated communication.