English Abstract |
In this work, we build a speaker recognition system based on the x-vector framework for speaker verification. During training, we propose using the triplet loss to increase the distance between embedding vectors of different speakers in the high-dimensional embedding space. During recognition, we use the Euclidean distance between the test-utterance embedding vector and each enrolled-speaker embedding vector as the similarity measure, and predict the enrolled speaker with the minimum distance. The proposed system is evaluated on the VoxCeleb speaker recognition dataset. The test set consists of utterances from 1,251 test speakers. The proposed model achieves a top-1 recognition accuracy of 59.57% and a top-5 accuracy of 80.32%. |
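
The two steps described in the abstract, a triplet loss during training and nearest-enrolled-speaker ranking by Euclidean distance during recognition, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the margin value, the use of unsquared distances in the loss, and the toy 2-D embeddings are all assumptions made for illustration, and the abstract does not say how triplets are selected.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 1.0 is an assumed value, not taken from the abstract.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def rank_speakers(test_embedding, enrolled, k=5):
    """Return the k enrolled speaker IDs closest to the test embedding."""
    ranked = sorted(enrolled, key=lambda spk: euclidean(test_embedding, enrolled[spk]))
    return ranked[:k]

# Toy 2-D embeddings (hypothetical; real x-vectors have far more dimensions).
enrolled = {"spk_a": [0.0, 0.0], "spk_b": [3.0, 4.0], "spk_c": [1.0, 1.0]}
test_embedding = [0.9, 1.2]

top2 = rank_speakers(test_embedding, enrolled, k=2)
top1 = top2[0]  # predicted speaker: the enrolled speaker at minimum distance
```

Under this sketch, a top-1 hit means the true speaker equals `top1`, and a top-5 hit means the true speaker appears in `rank_speakers(..., k=5)`.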