A Robust Interpolation Scheme in Diffusion Models for Single-Channel Speech Enhancement: Audio Samples

Authors

Zilu Guo, Qing Wang, Jun Du, Shutong Niu, Ruoyu Wang, Gaobin Yang, Jia Pan, Qing-Feng Liu and Chin-Hui Lee

Abstract

In this article, we introduce a variance-preserving interpolation approach to diffusion models for single-channel speech enhancement (SE), termed the variance-preserving interpolation diffusion model (VPIDM). Specifically, VPIDM operates with only 25 iterative steps and obviates the need for the corrector, an essential element in the existing variance-exploding interpolation diffusion model (VEIDM). Two notable distinctions between VPIDM and VEIDM are the scaling function of the mean of a state variable and the constraint imposed on the variance amplitude relative to the mean's scale. We systematically explore the theoretical framework underlying VPIDM, provide detailed insights into its application to SE, and introduce an early-stopping algorithm specifically designed for automatic speech recognition (ASR). Our experiments on two distinct datasets demonstrate VPIDM's superior fidelity over traditional discriminative SE approaches. Furthermore, we assess the performance of the proposed model under varying signal-to-noise ratio (SNR) conditions. The investigation reveals VPIDM's improved robustness in target noise elimination compared to VEIDM. Notably, applying the early-stopping algorithm to both VPIDM and VEIDM improves ASR accuracy, highlighting the practical efficacy of our methodology.
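The two distinctions named in the abstract can be illustrated with a small sketch. The abstract does not state the formulas, so everything below is an assumed reading and not the authors' implementation: the state mean is taken to interpolate between the clean signal and the noisy signal, the variance-preserving variant scales that mean by a hypothetical factor `alpha_t` and ties the noise level to it through `alpha_t**2 + sigma_t**2 = 1`, and the variance-exploding variant leaves the mean unscaled with a freely chosen `sigma_t`. The names `alpha_t`, `lambda_t`, and both function names are illustrative.

```python
import numpy as np


def vp_interpolation_state(x0, y, alpha_t, lambda_t, rng):
    """Sample a state under an assumed variance-preserving interpolation.

    x0: clean signal, y: noisy signal (same shape).
    lambda_t in [0, 1]: hypothetical interpolation weight on the clean signal.
    alpha_t in [0, 1]: hypothetical scale applied to the interpolated mean.
    """
    # Mean: interpolation between clean and noisy, scaled by alpha_t.
    mean = alpha_t * (lambda_t * x0 + (1.0 - lambda_t) * y)
    # Variance constraint (assumed): noise level is tied to the mean's scale.
    sigma_t = np.sqrt(1.0 - alpha_t ** 2)
    state = mean + sigma_t * rng.standard_normal(x0.shape)
    return state, sigma_t


def ve_interpolation_state(x0, y, lambda_t, sigma_t, rng):
    """Sample a state under an assumed variance-exploding interpolation.

    The mean is not rescaled and sigma_t is chosen independently of it.
    """
    mean = lambda_t * x0 + (1.0 - lambda_t) * y
    return mean + sigma_t * rng.standard_normal(x0.shape)
```

Under these assumptions, `alpha_t = 1` gives a noise-free state equal to the interpolated mean, while `alpha_t = 0` gives pure unit-variance noise regardless of the signals.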

Source Code

Our code is available here

Audio Samples


DNS Simulated dataset -5 dB

Models clnsp51 clnsp67 clnsp94 clnsp176 clnsp513
Clean
Noisy
FSubNet
NSNet2
NCSN++
VEIDM
VPIDM

DNS Simulated dataset 0 dB

Models clnsp51 clnsp67 clnsp94 clnsp176 clnsp513
Clean
Noisy
FSubNet
NSNet2
NCSN++
VEIDM
VPIDM

DNS Simulated dataset 5 dB

Models clnsp51 clnsp67 clnsp94 clnsp176 clnsp513
Clean
Noisy
FSubNet
NSNet2
NCSN++
VEIDM
VPIDM
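For readers who want to reproduce mixtures at the SNR levels used in the headings above (-5, 0, and 5 dB), the sketch below scales a noise signal so that the clean-to-noise power ratio hits a target value. This page does not describe how the DNS mixtures were simulated, so this is the standard power-ratio definition of SNR and not necessarily the authors' pipeline; the function names `mix_at_snr` and `measure_snr_db` are hypothetical.

```python
import numpy as np


def mix_at_snr(clean, noise, target_snr_db):
    """Return clean + gain * noise, with gain chosen to reach target_snr_db.

    SNR is defined here as 10 * log10(clean power / scaled noise power).
    Both inputs are 1-D arrays of equal length with nonzero power.
    """
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve clean_power / (gain**2 * noise_power) = 10**(target_snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (target_snr_db / 10.0)))
    return clean + gain * noise


def measure_snr_db(clean, noisy):
    """Measure the SNR in dB of a mixture, given its clean reference."""
    residual = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))
```

Lower target values mean a louder noise component, so the -5 dB condition is the hardest of the three listed.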