| English Abstract |
Building on recent successes in automatic speech recognition (ASR), the next big research challenge will be multilingual ASR (MASR) capable of exceeding human performance. It has long been believed that the MASR problem is too large to address for researchers who know only a few languages or for research groups with limited resources. Language-specific acoustic modeling has been a practical approach to designing high-performance ASR systems for a particular language; however, for data-limited languages the resulting accuracy is usually poor. Extending this approach to MASR, a popular technique is to pool the training speech data from all available languages, find a set of fundamental phone units that covers all of the languages, and train a set of universal phone models (UPMs) that can characterize all the phones and triphones of the languages being considered. Language-adaptive models have recently been shown to improve over language-specific models in some situations. The common phone set is usually derived from the International Phonetic Alphabet (IPA), which is defined mainly on phonetic grounds and was shown in previous studies to give unsatisfactory MASR performance because of inconsistencies and incomplete knowledge in the definition of the IPA. Given our recent success in modeling and detecting speech attributes across multiple languages, it seems reasonable to explore these fundamental units as shared structures spanning all spoken languages that can be used for large-vocabulary MASR of languages both seen and unseen during training.