English Abstract |
This paper proposes a multi-speaker talking-face synthesis system. The system combines voice cloning and lip-syncing technology to achieve text-to-talking-face generation: given audio and video clips of any speaker, it synthesizes a talking face through zero-shot transfer learning. In addition, we trained several Taiwanese-accented models on open-source corpora and propose using Mandarin Phonetic Symbols (Bopomofo) as the character embedding of the synthesizer to improve the system’s ability to synthesize Chinese-English code-switched sentences. Users can build a variety of applications on top of our system. Research on this technology is also novel in the field of audiovisual speech synthesis. |