| English Abstract |
Speaker recognition has evolved over nearly five decades, and speech remains the most intuitive mode of communication. The i-vector has long been the state of the art in speaker verification. This work introduces a deep learning approach that aims to surpass the established i-vector in speaker verification applications. Prior research has explored numerous techniques to improve speaker verification accuracy, but the integration of deep learning marks a significant shift. This research aims to establish an automated deep learning framework specifically designed to enhance the discriminative power of speaker verification representations. We conducted experiments on the VoxCeleb-1 database to assess the performance of different deep learning configurations, including multiple activation functions and optimizers, and we validated the proposed system’s performance on benchmark dataset tests. The system performed best with the ReLU activation function, stochastic gradient descent (SGD) as the optimizer, and a second layer. This configuration reduced the Equal Error Rate (EER) from 17.6 to 9.93, a relative reduction of roughly 44% on the benchmark tests. These results indicate that our automated model surpasses existing approaches in the literature. We anticipate that the proposed model will be a valuable asset for researchers and the academic community, facilitating further exploration and advancement in this field. |
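The abstract reports results as an Equal Error Rate but does not say how it was computed. As a minimal sketch, assuming a trial is accepted when its similarity score is at or above a threshold, the EER can be estimated as the point where the false acceptance rate and the false rejection rate are closest. The function names and the toy scores below are hypothetical and are not taken from the work itself.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption: higher scores mean "more likely the same speaker", and a
    trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False acceptance rate: impostor trials that were accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that were rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline


# Toy scores for illustration only (hypothetical, not VoxCeleb-1 data).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
```

The threshold sweep is the simplest estimator and is adequate for small trial lists. Evaluation toolkits often interpolate between neighbouring operating points instead, which can give slightly different values.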