中文摘要 |
This paper describes the collection and processing of a pilot speech corpus annotated in dialogue acts. The Mandarin Topic-oriented Conversational Corpus (MTCC) consists of annotated transcripts and sound files of conversations between two familiar persons. Particular features of spoken Mandarin, such as discourse particles and paralinguistic sounds, are taken into account in the orthographical transcription. In addition, the dialogue structure is annotated using an annotation scheme developed for topic-specific conversations. Using the annotated materials, we present the results of a preliminary analysis of dialogue structure and dialogue acts. Related transcription tools and web query applications are also introduced in this paper. |