LOW-LATENCY SPEECH ENHANCEMENT VIA SPEECH TOKEN GENERATION

This website presents audio demos for our research work 'LOW-LATENCY SPEECH ENHANCEMENT VIA SPEECH TOKEN GENERATION', submitted to ICASSP 2024.

Comparison with a traditional data-driven approach

The proposed method removes more background noise than TFNet, a traditional data-driven approach.

Noisy	TFNet	Proposed	Target

The prefix-based approach is more error-prone and fails to predict the correct content from the context it has learned.

Type	Noisy	Prefix-based	Proposed
Normal case (11 s)
Long case (15 s)