NYU researchers develop neural decoding that can give back lost speech

Diagram of the human head and the brain with signal readings

Losing the ability to speak due to neurological damage can be incredibly isolating. But thanks to recent advancements in technology, there's hope on the horizon. Scientists have been working on neural speech prostheses, special devices that can help people who have trouble speaking by translating brain activity into speech.

In a recent study published in Nature Machine Intelligence, a team of NYU researchers presented a significant advance in decoding speech with neural architectures: transforming signals recorded from the brain into audible speech. The team was led by Yao Wang, Professor of Electrical and Computer Engineering and Biomedical Engineering at NYU Tandon and a member of NYU WIRELESS, and Adeen Flinker, Associate Professor of Biomedical Engineering at NYU Tandon and of Neurology at NYU Grossman School of Medicine, together with Tandon ECE Ph.D. student Xupeng Chen. Building upon previous research, their work introduces modifications that enhance decoding accuracy across a broader range of voices.

One key innovation lies in the adaptation of neural architectures to accommodate diverse speech patterns. Recent strides in machine learning and Brain-Computer Interface (BCI) systems have propelled the development of neural speech prostheses, offering hope to those affected by speech impairments. 

One effective method for gathering data to develop such prostheses involves electrocorticographic (ECoG) recordings obtained from epilepsy surgery patients. Implanted electrodes provide a rare opportunity to collect cortical data during speech with high precision, leading to promising results in speech decoding. Previously validated on five patients, the NYU team's updated approach has now been validated on 48 individuals, an order of magnitude more than in other similar work, which supports a more robust and generalizable decoding process.

Two significant challenges persist in decoding speech from neural signals. Firstly, the limited duration of training data contrasts with the extensive data required for deep learning models. Secondly, speech production variability, encompassing rate, intonation, and pitch variations, complicates model representation. 

The NYU team’s approach uses a unique speech synthesizer developed in their previous research. This synthesizer translates a series of interpretable speech “parameters,” including pitch, frequency, and loudness, into natural-sounding speech. The system uses neural network architectures to decode neural signals into these speech parameters, which the synthesizer then uses to produce the intended speech.
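The two-stage idea described above (neural signals to interpretable parameters, then parameters to sound) can be illustrated with a deliberately simplified sketch. Nothing here is the NYU team's code: the linear decoder, the sine-wave synthesizer, the function names, the 80 to 300 Hz pitch range, and the random stand-in data are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def decode_parameters(ecog, weights, bias):
    """Map ECoG features (frames x electrodes) to per-frame speech parameters.

    Hypothetical linear stand-in for the neural network decoder in the article.
    """
    raw = ecog @ weights + bias                            # shape: (frames, 2)
    # Squash the first output into an assumed 80-300 Hz pitch range.
    pitch_hz = 80.0 + 220.0 / (1.0 + np.exp(-raw[:, 0]))
    # Squash the second output into a 0-1 loudness value.
    loudness = 1.0 / (1.0 + np.exp(-raw[:, 1]))
    return pitch_hz, loudness

def synthesize(pitch_hz, loudness, frame_rate=100, sample_rate=16000):
    """Turn per-frame pitch and loudness into a waveform.

    Toy synthesizer: a single sine wave, far simpler than a real speech synthesizer.
    """
    samples_per_frame = sample_rate // frame_rate
    pitch = np.repeat(pitch_hz, samples_per_frame)         # hold each frame's value
    amplitude = np.repeat(loudness, samples_per_frame)
    # Accumulate phase so the tone stays continuous when pitch changes.
    phase = 2.0 * np.pi * np.cumsum(pitch) / sample_rate
    return amplitude * np.sin(phase)

# Random numbers stand in for real recordings: 50 frames from 64 electrodes.
rng = np.random.default_rng(0)
ecog = rng.standard_normal((50, 64))
weights = rng.standard_normal((64, 2)) * 0.1               # untrained, illustrative weights
bias = np.zeros(2)

pitch_hz, loudness = decode_parameters(ecog, weights, bias)
audio = synthesize(pitch_hz, loudness)
print(audio.shape)  # (8000,)
```

Because the intermediate values are named quantities such as pitch and loudness rather than an opaque waveform, they can be inspected directly, which is the sense in which the article calls the parameters interpretable.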

The team developed an efficient neural network training pipeline that works effectively with limited training data, and compared the efficacy of different neural network architectures. The system can produce speech that closely resembles the actual voices of the study participants, a unique feature of this approach.

Perhaps most intriguing is the finding about the right hemisphere's contribution to speech decoding. The right hemisphere is traditionally overshadowed by the left, which is predominantly associated with language functions. However, some participants had electrodes implanted only on the right hemisphere, giving the researchers no information about left-hemisphere activity. Crucially, the team could still decode speech accurately from right-hemisphere signals alone. This finding not only sheds light on how the brain processes and produces speech across the two hemispheres, it also opens new possibilities for therapeutic interventions, particularly for speech disorders such as aphasia that follow damage to the left hemisphere.

In addition to its scientific findings, the study offers an open-source neuro-decoding pipeline, facilitating collaboration and replication of results within the research community. This initiative promotes transparency and accelerates progress in the field of neural decoding.

The research has significant implications for understanding the complexities of speech processing and for potential therapeutic avenues. It marks a milestone in unraveling the mysteries of the human mind's linguistic capabilities, paving the way for future breakthroughs in neuroengineering and clinical interventions. The NYU team is continuing to investigate neural decoding approaches under a new NSF award and ongoing NIH grants.


This work was supported by the National Science Foundation under its Collaborative Research in Computational Neuroscience Program, as well as by the NIH National Institute of Neurological Disorders and Stroke.