Extractive Spoken Document Summarization with Representation Learning Techniques

施凱文; 陳冠宇; 劉士弘; 王新民; 陳柏琳

Title	節錄式語音文件摘要使用表示法學習技術
Parallel title	Extractive Spoken Document Summarization with Representation Learning Techniques
Authors	施凱文, 陳冠宇 (Guan-Yu Chen), 劉士弘, 王新民, 陳柏琳
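Abstract (translated from the Chinese)	The steady growth of multimedia content has made automatic spoken document summarization an important research topic. The most widely studied approach is extractive spoken document summarization, which selects important sentences from a spoken document according to a predefined summarization ratio so that they represent the gist or theme of the original document. Meanwhile, representation learning has recently become a very active research topic, and many studies have shown that it yields strong results on a range of natural language processing (NLP) tasks. This paper therefore investigates the use of word representations and sentence representations for extractive summarization of Chinese broadcast news spoken documents. Building on word and sentence representations, it proposes three novel and effective ranking models. Beyond the textual information in the documents, it also incorporates various acoustic features of the spoken documents, such as prosodic features, in the hope of further improving summarization quality. The summarization experiments use a Public Television Service (公視) broadcast news corpus; the results show that the proposed methods provide significant performance improvements over other existing summarization methods.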
English abstract	The rapidly increasing availability of multimedia associated with spoken documents on the Internet has made automatic spoken document summarization an important research subject. Thus far, the majority of existing work has focused on extractive spoken document summarization, which selects salient sentences from an original spoken document according to a target summarization ratio and concatenates them to form a concise summary that conveys the most important theme of the document. On the other hand, there has been a surge of interest in developing representation learning techniques for a wide variety of natural language processing (NLP) tasks. However, to our knowledge, these techniques are largely unexplored in the context of extractive spoken document summarization. Against this background, this study explores a novel use of both word and sentence representation techniques for extractive spoken document summarization. In addition, three variants of sentence ranking models built on top of such representation techniques are proposed. Furthermore, additional cues such as prosodic features extracted from spoken documents are employed alongside the lexical features to boost summarization performance. A series of experiments conducted on the MATBN broadcast news corpus reveals the performance merits of the proposed summarization methods relative to several state-of-the-art baselines.
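Pages	65-85
Keywords	Spoken Document, Extractive Summarization, Word Representation, Sentence Representation, Prosodic Feature
Journal	中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue	December 2015 (Vol. 20, No. 2)
Publisher	中華民國計算語言學學會 (Association for Computational Linguistics and Chinese Language Processing)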
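Illustrative sketch (not taken from the paper): the abstract describes ranking sentences with word and sentence representations and selecting them up to a target summarization ratio. The Python below shows one generic way this can work, assuming a sentence vector is the average of its word vectors and a sentence is scored by cosine similarity to the whole-document vector. The function names, the toy `word_vectors` table, and the scoring rule are all assumptions made for illustration; they are not the three ranking models proposed in the paper, and no prosodic features are used.

```python
import math


def average_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for i, v in enumerate(vec):
            total[i] += v
        count += 1
    if count == 0:
        return total
    return [v / count for v in total]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, word_vectors, dim, ratio):
    """Return indices of the selected sentences, in document order.

    sentences: list of token lists (e.g. from a speech recognizer transcript).
    ratio: target summary length as a fraction of the document's token count.
    """
    doc_tokens = [tok for sent in sentences for tok in sent]
    doc_vec = average_vector(doc_tokens, word_vectors, dim)
    scores = [cosine(average_vector(s, word_vectors, dim), doc_vec)
              for s in sentences]
    budget = max(1, int(ratio * len(doc_tokens)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected, used = [], 0
    for i in ranked:
        if used >= budget:
            break  # token budget reached
        selected.append(i)
        used += len(sentences[i])
    return sorted(selected)


# Toy 2-dimensional word vectors (hypothetical values, for illustration only).
toy_vectors = {
    "news": [1.0, 0.0],
    "report": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
toy_document = [["news", "report"], ["weather"], ["news", "news", "report"]]
print(summarize(toy_document, toy_vectors, dim=2, ratio=0.5))
```

In practice the toy table would be replaced by learned word embeddings, and the last selected sentence may push the summary slightly past the budget; a stricter variant would skip any sentence that does not fit.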
