This paper proposes a multi-scene sentiment analysis model for Chinese speech and text based on CNN-BiGRU-CTC + ERNIE-BiLSTM. The model is applied to an intelligent customer service scenario: during voice interaction, the customer service system can detect the user’s current emotion, allowing it to give more considerate answers and improve the user experience. All training data in this paper come from public datasets, including Aishell-1 and NLPCC 2014. The model achieves a test accuracy of about 94.5%, an improvement of 5.24% over the latest speech sentiment analysis model that uses audio as a feature. The advantage of this approach is that it adopts the ERNIE pre-trained language model to perform sentiment analysis on speech signals, so it retains good classification accuracy even when speech recognition gets individual words wrong.