New AI-driven Target Speech Hearing System Allows Users to Focus on Specific Voices in Noisy Environments
Modern life is noisy, and it can be hard to focus on one voice in a crowded or bustling environment. Noise-canceling headphones reduce background sound, but they typically suppress everything, including the voices you want to hear. A new AI-driven prototype system called Target Speech Hearing aims to change this.
Target Speech Hearing lets users select one person’s voice, which remains audible while other sounds are canceled out. The technology is still at the proof-of-concept stage, but it is being considered for integration into popular noise-canceling earbuds and, potentially, hearing aids.
Shyam Gollakota, a professor at the University of Washington who is involved in the project, says the ability to focus on specific individuals in noisy environments is fundamental to human communication and interaction.
Previously, the researchers trained a neural network to identify and filter out specific sounds, such as crying babies or ringing alarms. Isolating an individual human voice proved harder and required more complex neural networks.
Headphones have limited computing power and battery life, which constrains what they can process in real time. To work within those limits, the researchers used an AI compression technique called knowledge distillation: a smaller model (the ‘student’) is trained to mimic the behavior of a larger AI model (the ‘teacher’) that had been trained on millions of voices.
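To make the idea of knowledge distillation concrete, here is a minimal sketch in plain Python. It is not the researchers' code. The teacher is a tiny fixed network with made-up weights standing in for a large pretrained model, the student is a simple linear model, and all names (`teacher_logits`, `student_logits`, `distill`) and settings (temperature, learning rate) are assumptions chosen for illustration. The student never sees ground-truth labels; it only learns to match the teacher's softened output probabilities.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical fixed weights standing in for a large pretrained teacher model.
TEACHER_W1 = [[1.5, -2.0], [-1.0, 0.5], [0.7, 1.2]]
TEACHER_W2 = [[2.0, -1.0, 0.5], [-1.5, 1.0, 1.0]]


def teacher_logits(x):
    """Teacher: one tanh hidden layer followed by a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in TEACHER_W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in TEACHER_W2]


def student_logits(weights, bias, x):
    """Student: a single linear layer, far smaller than the teacher."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill(inputs, temperature=2.0, lr=0.5, epochs=200):
    """Fit the student to the teacher's soft targets with gradient descent."""
    n_classes, n_features = len(TEACHER_W2), len(inputs[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    bias = [0.0] * n_classes
    # The teacher is frozen, so its soft targets are computed once.
    targets = [softmax(teacher_logits(x), temperature) for x in inputs]
    history = []
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for x, p in zip(inputs, targets):
            q = softmax(student_logits(weights, bias, x), temperature)
            loss += kl_divergence(p, q)
            for k in range(n_classes):
                # Gradient of KL(p || q) with respect to student logit k.
                delta = (q[k] - p[k]) / temperature
                grad_b[k] += delta
                for j in range(n_features):
                    grad_w[k][j] += delta * x[j]
        history.append(loss / len(inputs))
        for k in range(n_classes):
            bias[k] -= lr * grad_b[k] / len(inputs)
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j] / len(inputs)
    return weights, bias, history


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
    _, _, losses = distill(samples)
    print("average KL before training:", losses[0])
    print("average KL after training: ", losses[-1])
```

The gap between the teacher's and the student's outputs shrinks as training proceeds, but a student this small cannot match the teacher exactly. That trade-off between model size and fidelity is what makes distillation attractive for battery-powered devices.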
The ‘student’ model was then trained to extract the vocal patterns of a specific voice from the surrounding noise picked up by microphones built into commercially available noise-canceling headphones. To activate the Target Speech Hearing system, the wearer holds down a button on the headphones for a few seconds while facing the intended person. This ‘enrollment’ step captures an audio sample from both sides of the headphones, which the system then uses to focus on the selected voice.