Mind the Gaps: How Learners Parse Reductions in Chatbot Dialogue
Abstract
Conversational chatbots now rely on synthetic voices that mimic natural connected-speech reductions such as elision, assimilation, catenation, flapping, and vowel centralization. These features enhance fluency but often blur meaning, particularly for second-language learners. This study presents a computational analysis, conducted without human participants, of how varying degrees of reduction and speech rate in text-to-speech (TTS) output influence comprehension within an AI dialogue pipeline (TTS → ASR → LLM). A purpose-built corpus of short task dialogues was generated and rendered with multiple TTS voices under three reduction and speech-rate settings. Comprehension was evaluated using word error rate, entity-recognition F1, and dialogue-level question-answer accuracy, while acoustic–prosodic measures yielded a Reduction Index capturing duration shortening, vowel centralization, and boundary cues. A single AI clarification turn, explicitly reformulating reduced forms, was tested as a recovery strategy. Moderate reductions maintained comprehension in prosody-rich voices, but extreme reductions and fast speech rates caused error spikes and semantic drift. One clarification turn recovered much of the lost accuracy and improved stability across new scripts and voices. Vowel centralization and syllable compression predicted most failures. The resulting Reduction–Robustness Curve provides a benchmark for balancing naturalness and intelligibility in synthetic speech and for designing adaptive clarification in AI-based language tutoring.
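The three comprehension metrics named above (word error rate, entity-recognition F1, and dialogue-level question-answer accuracy) are standard, so their computation can be illustrated directly. The following is a minimal Python sketch, not the study's code; the data representations (word lists, entity sets, answer lists) are assumptions for illustration.

```python
def word_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Levenshtein distance over words, normalized by reference length."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def entity_f1(gold: set[str], pred: set[str]) -> float:
    """F1 over exact-match entity strings extracted downstream of ASR."""
    if not gold and not pred:
        return 1.0
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def qa_accuracy(gold_answers: list[str], model_answers: list[str]) -> float:
    """Exact-match accuracy over dialogue-level comprehension questions."""
    correct = sum(g.strip().lower() == m.strip().lower()
                  for g, m in zip(gold_answers, model_answers))
    return correct / max(len(gold_answers), 1)
```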
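The abstract does not give the exact formulation of the Reduction Index, only its three components (duration shortening, vowel centralization, and boundary cues). The sketch below shows one plausible way such an index could be assembled in Python; the schema, the scaling constants, and the equal weighting are all assumptions, not the paper's method.

```python
import math
from dataclasses import dataclass

@dataclass
class UtteranceMeasures:
    """Acoustic-prosodic measures for one synthesized utterance (hypothetical schema)."""
    syllable_durations: list[float]             # seconds per syllable, reduced rendering
    citation_durations: list[float]             # seconds per syllable, careful citation rendering
    vowel_formants: list[tuple[float, float]]   # (F1, F2) in Hz for each vowel token
    boundary_cue_score: float                   # 0..1, share of expected boundary cues realized

def duration_shortening(m: UtteranceMeasures) -> float:
    """Mean proportional shortening relative to the citation-form rendering."""
    pairs = zip(m.syllable_durations, m.citation_durations)
    return sum(1.0 - d / c for d, c in pairs) / len(m.syllable_durations)

def vowel_centralization(m: UtteranceMeasures) -> float:
    """Inverse of vowel-space dispersion: the less vowels spread out from the
    F1/F2 centroid, the more schwa-like (centralized) the rendering."""
    f1c = sum(f1 for f1, _ in m.vowel_formants) / len(m.vowel_formants)
    f2c = sum(f2 for _, f2 in m.vowel_formants) / len(m.vowel_formants)
    dists = [math.hypot(f1 - f1c, f2 - f2c) for f1, f2 in m.vowel_formants]
    mean_dist = sum(dists) / len(dists)
    return 1.0 / (1.0 + mean_dist / 100.0)     # squashed to (0, 1]; scale is arbitrary

def reduction_index(m: UtteranceMeasures, w=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted combination of the three components; equal weights are an assumption."""
    return (w[0] * duration_shortening(m)
            + w[1] * vowel_centralization(m)
            + w[2] * (1.0 - m.boundary_cue_score))
```

Plotting a comprehension metric such as `qa_accuracy` against `reduction_index` across voices and rate settings would yield the kind of Reduction–Robustness Curve the abstract describes.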