Abstract

With the rapid emergence of large language model (LLM) based text-to-speech technologies, Artificial Intelligence (AI) is increasingly capable of generating coherent and natural-sounding speech. However, a gap remains in understanding how listeners perceive structured, multi-agent AI-generated conversations compared with authentic human interactions. This study investigates the perceptual differences between human-human and AI-generated dialogues in the context of podcast excerpts. In a within-subjects experimental design, 70 participants evaluated standardized audio clips of both human podcasts and AI-generated simulations (created via Gemini Notebook LLM) across four key dimensions: Linguistic Diversity (LD), Tone and Engagement (TE), Paralinguistic Variables (PV), and Behavioral Dynamics (BD). The results demonstrate statistically significant differences favoring human conversations in Linguistic Diversity (p=0.0079), Tone and Engagement (p=0.0067), and Behavioral Dynamics (p=0.0166), indicating that listeners still perceive human dialogue as more linguistically structured, emotionally engaging, and fluid. Notably, no significant difference was found in Paralinguistic Variables (p=0.0927), suggesting that AI has successfully advanced in simulating surface-level acoustic features such as pauses and filler words. Preference data further corroborated these findings, with 60% of participants preferring human hosts for real-life listening. These findings highlight that while AI-generated speech has achieved high paralinguistic realism, it still lacks the emotional depth and interactional nuance of human communication. The study offers theoretical and practical implications for Human-Computer Interaction (HCI) and the ethical development of synthetic media.
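The abstract does not state which statistical test produced the reported p-values. Purely as an illustration of the within-subjects comparison described above, the following sketch assumes a Wilcoxon signed-rank test (a common choice for paired ordinal ratings) and uses hypothetical Likert-style data; the variable names, scale, and test are assumptions, not details taken from the paper.

    # Hypothetical sketch of the paired, within-subjects comparison implied by the
    # abstract: each of 70 participants rates both the human and the AI clip on
    # four dimensions, and each dimension is tested separately with a paired test.
    # The test choice and the placeholder data are assumptions for illustration only.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    dimensions = ["LD", "TE", "PV", "BD"]

    # Placeholder 1-5 Likert-style ratings for 70 participants; real data would
    # come from the questionnaire responses.
    human_ratings = {d: rng.integers(1, 6, size=70).astype(float) for d in dimensions}
    ai_ratings = {d: rng.integers(1, 6, size=70).astype(float) for d in dimensions}

    for d in dimensions:
        # Wilcoxon signed-rank test on the paired human-vs-AI differences.
        stat, p = stats.wilcoxon(human_ratings[d], ai_ratings[d])
        print(f"{d}: W={stat:.1f}, p={p:.4f}")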
