A new AI model reconstructs speech by analyzing subtle vibrations in the throat while incorporating context such as time and emotional state to better capture a person’s intended message.
The innovation targets dysarthria, a condition in which neurological impairments affect fine motor control of the voice box, jaw, or tongue. Unlike implanted brain-computer interfaces, which require invasive surgery, this approach uses textile strain sensors to measure throat muscle movements and carotid artery pulses.
Researchers from the University of Cambridge, University College London, and Beihang University collaborated on the project.
Data from these sensors feed into two AI models. The first, called the token synthesis agent (TSA), deciphers the words a user attempts to say and organizes them into sentences.
The second model, the sentence expansion agent, enriches these sentences by factoring in contextual and emotional cues—such as whether the user feels neutral, relieved, or frustrated—creating personalized expressions that align more closely with the user’s intent.
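The two agents described in the study are trained neural networks operating on sensor signals, but the overall data flow they implement can be sketched in outline. The snippet below is a minimal illustration, not the researchers' implementation: the class names (`TokenSynthesisAgent`, `SentenceExpansionAgent`), the `Context` fields, and the simple rule-based expansion are all hypothetical stand-ins for the learned components.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the two learned models described in the article.
# The real agents are trained on strain-sensor data; here each stage is reduced
# to a simple function so that the two-step data flow is visible.

@dataclass
class Context:
    """Contextual cues the second agent conditions on (assumed fields)."""
    emotion: str      # e.g. "neutral", "relieved", "frustrated"
    time_of_day: str  # e.g. "morning", "evening"

class TokenSynthesisAgent:
    """Stage 1: turn decoded throat-sensor tokens into a bare sentence."""
    def synthesize(self, tokens: list[str]) -> str:
        # The real TSA infers which words the user attempted to say from
        # muscle-movement and pulse signals; here we simply join given tokens.
        return " ".join(tokens).capitalize() + "."

class SentenceExpansionAgent:
    """Stage 2: enrich the bare sentence using contextual and emotional cues."""
    def expand(self, sentence: str, ctx: Context) -> str:
        # Illustrative rules only; the actual agent is a generative model that
        # personalizes phrasing to match the user's intent.
        prefix = {
            "relieved": "Thankfully, ",
            "frustrated": "I really need this: ",
        }.get(ctx.emotion, "")
        return f"{prefix}{sentence} (said in the {ctx.time_of_day})"

if __name__ == "__main__":
    tokens = ["want", "water"]  # hypothetical output of the sensor decoding step
    ctx = Context(emotion="frustrated", time_of_day="evening")

    base = TokenSynthesisAgent().synthesize(tokens)
    final = SentenceExpansionAgent().expand(base, ctx)
    print(final)  # -> "I really need this: Want water. (said in the evening)"
```

Splitting the task this way, with one model decoding what was said and a second deciding how to phrase it, is what allows the expressed sentence to be personalized without changing the underlying sensor decoder.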
In a trial with five individuals with dysarthria caused by stroke, the system achieved sentence error rates as low as 2.9% and increased user satisfaction by 55% compared to simpler sentence reconstruction methods.
While the potential benefits are substantial, experts note some limitations. Russell Beale at the University of Birmingham points out that a mismatch between the AI's reconstructed language and the user's natural speech patterns could be frustrating.
“If someone was highly articulate before, and the AI uses simpler language, that might feel limiting—but it’s still far better than being unable to communicate,” he says.
Beale suggests the technology could be customized to reflect users’ unique speech patterns, creating a more authentic voice. “The beauty here is that users don’t need to adapt,” he adds. “They just do what they’ve always done, and the technology bridges the gap.”