| English Abstract |
This study addresses speaker separation: recovering the individual speech signals from a mixture of several speakers. We build on the efficient end-to-end speech separation model SuDoRM-RF and integrate the Residual Conformer Block from the MANNER model together with the Multi-view Attention block to create a new speech separation model, ESC MA-SD Net. In this model, the Residual Conformer Block discards irrelevant information while preserving crucial speech details, and the Multi-view Attention module captures diverse aspects of the speech features. The resulting ESC MA-SD Net is more efficient than the original SuDoRM-RF model. In our experiments, we use validation data and spectrograms to demonstrate the effectiveness of the proposed method and its improved speech separation performance. |