  • Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement (Under Review)

    Xiang Hao, Xiaofei Li

    Submitted to ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset. A number of FullSubNet variants have been proposed recently, but they all focus on structural design for better performance and rarely address computational efficiency. This work proposes a new architecture, named Fast FullSubNet, dedicated to accelerating the computation of FullSubNet. Specifically, Fast FullSubNet processes sub-band speech spectra in the mel-frequency domain by cascading linear-to-mel full-band, sub-band, and mel-to-linear full-band models, so that the number of frequencies involved in the sub-band computation is vastly reduced. In addition, a down-sampling operation is applied to the sub-band input sequence to further reduce the computational complexity along the time axis. Experimental results show that, compared to FullSubNet, Fast FullSubNet requires only 13% of the computational complexity and 16% of the processing time while achieving comparable or even better performance.
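    To make the cascade concrete, here is a minimal PyTorch sketch of the three-stage idea described above: a full-band model mapping the linear-frequency input to the mel domain, a shared sub-band model operating over the far fewer mel bands with a strided time axis, and a full-band model mapping back to linear frequencies. The module types and sizes, the placeholder mel filterbank, the down-sampling factor, and the interpolation used to restore the frame rate are illustrative assumptions, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FastFullSubNetSketch(nn.Module):
        """Illustrative three-stage cascade; all sizes are assumptions."""

        def __init__(self, num_linear=257, num_mel=64, hidden=384, stride=2):
            super().__init__()
            self.stride = stride
            # Placeholder mel filterbank; in practice e.g. librosa.filters.mel.
            self.register_buffer("mel_fb", torch.rand(num_mel, num_linear))
            # Stage 1: linear-to-mel full-band model.
            self.fb_in = nn.LSTM(num_linear, hidden, batch_first=True)
            self.fb_in_proj = nn.Linear(hidden, num_mel)
            # Stage 2: sub-band model shared across the (few) mel bands.
            self.sub = nn.LSTM(2, hidden // 4, batch_first=True)
            self.sub_proj = nn.Linear(hidden // 4, 1)
            # Stage 3: mel-to-linear full-band model producing the final output.
            self.fb_out = nn.LSTM(num_mel, hidden, batch_first=True)
            self.fb_out_proj = nn.Linear(hidden, num_linear)

        def forward(self, x):                          # x: (B, F_linear, T)
            B, _, T = x.shape
            mel = self.mel_fb @ x                      # (B, F_mel, T)
            # Stage 1: full-band model over the linear spectrum, mel output.
            h, _ = self.fb_in(x.transpose(1, 2))
            fb_mel = self.fb_in_proj(h).transpose(1, 2)        # (B, F_mel, T)
            # Stage 2: per-band processing now runs over num_mel instead of
            # num_linear frequencies. Down-sample the time axis, then restore
            # the frame rate by linear interpolation.
            s = torch.stack([mel, fb_mel], dim=-1)             # (B, F_mel, T, 2)
            s = s[:, :, ::self.stride, :]                      # time down-sampling
            Bm, Fm, Td, C = s.shape
            h, _ = self.sub(s.reshape(Bm * Fm, Td, C))
            sub_out = self.sub_proj(h).reshape(Bm, Fm, Td)
            sub_out = F.interpolate(sub_out, size=T, mode="linear")
            # Stage 3: full-band model mapping mel bands back to linear bins.
            h, _ = self.fb_out(sub_out.transpose(1, 2))
            return self.fb_out_proj(h).transpose(1, 2)         # (B, F_linear, T)
    ```

    The saving comes from the second stage: the per-band recurrence, which dominates FullSubNet's cost, here runs over num_mel bands and a strided time axis rather than all linear-frequency bins at the full frame rate.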
  • FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

    Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li

    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently: its input consists of one frequency and several context frequencies, and its output is the prediction of the clean-speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and long-distance cross-band dependencies, but it lacks the ability to model signal stationarity and attend to local spectral patterns. The sub-band model is just the opposite. In the proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of both. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary and that FullSubNet can effectively integrate them. Moreover, the performance of FullSubNet also exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020). A code sketch of the sequential connection follows.
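    The sequential full-band/sub-band connection can be illustrated with a short PyTorch sketch: a full-band LSTM first summarizes the whole spectrum per frame, and its per-frequency output is concatenated with each frequency's noisy sub-band (centre bin plus context bins) and fed to a sub-band LSTM shared across frequencies. The hidden sizes, context width, and two-channel output head are illustrative assumptions, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FullSubNetSketch(nn.Module):
        """Minimal sketch of the full-band + sub-band cascade.
        Input: noisy magnitude spectrogram x of shape (B, F, T)."""

        def __init__(self, num_freqs=257, context=15, hidden=512):
            super().__init__()
            self.context = context
            # Full-band stage: one LSTM over the whole spectrum, frame by frame.
            self.full_band = nn.LSTM(num_freqs, hidden, num_layers=2,
                                     batch_first=True)
            self.full_proj = nn.Linear(hidden, num_freqs)
            # Sub-band stage, shared across frequencies: input is the noisy
            # sub-band (centre bin + 2*context neighbours) plus the full-band
            # output for the centre bin.
            self.sub_band = nn.LSTM(2 * context + 2, hidden // 2, num_layers=2,
                                    batch_first=True)
            self.sub_proj = nn.Linear(hidden // 2, 2)  # 2-channel mask per bin

        def forward(self, x):                              # x: (B, F, T)
            batch, freq, frames = x.shape
            # Full-band model: global spectral context per frame.
            h, _ = self.full_band(x.transpose(1, 2))       # (B, T, H)
            fb = self.full_proj(h).transpose(1, 2)         # (B, F, T)
            # Sub-band model: unfold the frequency axis into context windows.
            padded = F.pad(x.unsqueeze(1),
                           (0, 0, self.context, self.context),
                           mode="reflect")                 # (B, 1, F + 2c, T)
            windows = padded.unfold(2, 2 * self.context + 1, 1).squeeze(1)
            sub_in = torch.cat([windows, fb.unsqueeze(-1)], dim=-1)
            h, _ = self.sub_band(sub_in.reshape(batch * freq, frames, -1))
            return self.sub_proj(h).reshape(batch, freq, frames, 2)
    ```

    Reshaping to (batch x frequency, time, features) is what makes the sub-band model shared across frequencies, mirroring the paper's description of processing each frequency independently.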
  • Sub-Band Knowledge Distillation Framework for Speech Enhancement

    Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li

    INTERSPEECH 2020 - The 21st Annual Conference of the International Speech Communication Association
    In single-channel speech enhancement, methods based on full-band spectral features have been widely studied, while only a few methods pay attention to non-full-band spectral features. In this paper, we explore a knowledge distillation framework based on sub-band spectral mapping for single-channel speech enhancement. First, we divide the full frequency band into multiple sub-bands and pre-train an elite-level sub-band enhancement model (teacher model) for each sub-band. Each teacher model is dedicated to processing its own sub-band. Next, under the guidance of the teacher models, we train a general sub-band enhancement model (student model) that works for all sub-bands. Without increasing the number of model parameters or the computational complexity, the student model's performance is further improved. To evaluate the proposed method, we conducted extensive experiments on an open-source dataset. The final experimental results show that the guidance from the elite-level teacher models dramatically improves the student model's performance, which exceeds that of the full-band model while employing fewer parameters. A sketch of such a distillation objective follows.
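    One way such teacher guidance could enter training is sketched below in PyTorch, assuming frozen per-band teacher models and a weighted sum of a ground-truth loss and a teacher-matching loss; the loss form and the weighting `alpha` are illustrative assumptions rather than the paper's exact objective.

    ```python
    import torch
    import torch.nn.functional as F

    def sub_band_distillation_loss(student, teachers, noisy_bands, clean_bands,
                                   alpha=0.5):
        """Illustrative objective for the sub-band distillation framework.

        `teachers` is a list of pre-trained, frozen sub-band models, one per
        sub-band; `student` is the single shared model applied to every band.
        `noisy_bands` / `clean_bands` are lists of (B, F_sub, T) spectra.
        """
        loss = 0.0
        for teacher, noisy, clean in zip(teachers, noisy_bands, clean_bands):
            with torch.no_grad():
                teacher_out = teacher(noisy)   # elite-level per-band estimate
            student_out = student(noisy)       # one model for all sub-bands
            # Match the ground truth and, softly, the teacher's estimate.
            loss = loss + (1 - alpha) * F.mse_loss(student_out, clean) \
                        + alpha * F.mse_loss(student_out, teacher_out)
        return loss / len(teachers)
    ```

    Because only the shared student is trained and deployed, the guidance adds no parameters or inference-time computation, consistent with the claim above.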