LOW-LATENCY SPEECH ENHANCEMENT VIA SPEECH TOKEN GENERATION
This website presents audio demos for our paper 'LOW-LATENCY SPEECH ENHANCEMENT VIA SPEECH TOKEN GENERATION', submitted to ICASSP 2024.
Comparison with traditional data-driven approach
The proposed method removes more background noise than TFNet, a traditional data-driven approach.
Synthetic test set
[Audio samples: Noisy, TFNet, Proposed, Target]
Real-recording test set
[Audio samples: Noisy, TFNet, Proposed]
Comparison with prefix-based approach
The prefix-based approach is more error-prone and fails to predict the correct content beyond the context it has learned.
[Audio samples: Noisy, Prefix-based, Proposed, for a normal case (11 s) and a long case (15 s)]