LOW-LATENCY SPEECH ENHANCEMENT VIA SPEECH TOKEN GENERATION

This page presents audio demos for our paper 'LOW-LATENCY SPEECH ENHANCEMENT VIA SPEECH TOKEN GENERATION', submitted to ICASSP 2024.

Comparison with a traditional data-driven approach

The proposed method removes more background noise than TFNet, a traditional data-driven approach.

Synthetic test set

Noisy TFNet Proposed Target

Real-recording test set

Noisy TFNet Proposed

Comparison with the prefix-based approach

The prefix-based approach is more error-prone and fails to predict the correct content out of the learned context.

Type Noisy Prefix-based Proposed
Normal case (11 s)
Long case (15 s)