Title
Aligning Sentences in a Paragraph-Paraphrased Corpus with New Embedding-based Similarity Measures
Authors: Aleksandra Smolka, Hsin-Min Wang, Jason S. Chang, Keh-Yih Su
Abstract
To better understand and utilize lexical and syntactic mapping between various language expressions, it is often first necessary to perform sentence alignment on the provided data. Up until now, the character trigram overlapping ratio was considered to be the best similarity measure on the text simplification corpus. In this paper, we aim to show that a newer embedding-based similarity metric will be preferable to the traditional SOTA metric on the paragraph-paraphrased corpus. We report a series of experiments designed to compare different alignment search strategies as well as various embedding- and non-embedding-based sentence similarity metrics in the paraphrased sentence alignment task. Additionally, we explore the problem of aligning and extracting sentences with imposed restrictions, such as controlling sentence complexity. For evaluation, we use paragraph pairs sampled from the Webis-CPC-11 corpus containing paraphrased paragraphs. Our results indicate that modern embedding-based metrics such as those utilizing SentenceBERT or BERTScore significantly outperform the character trigram overlapping ratio in the sentence alignment task in the paragraph-paraphrased corpus.
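As an illustration of the baseline the abstract mentions, the following is a minimal sketch of a character trigram overlap score and a simple greedy alignment search. It is not the authors' implementation: the abstract does not define the ratio, so the Jaccard form used here, the whitespace and case normalization, the greedy one-best search, the `threshold` value, and the names `char_trigrams`, `trigram_overlap`, and `greedy_align` are all assumptions made for this example.

```python
from typing import Callable, List, Set, Tuple


def char_trigrams(text: str) -> Set[str]:
    """Return the set of character trigrams of a lowercased string
    with runs of whitespace collapsed to single spaces."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets.

    Assumption: the paper may define the overlapping ratio differently.
    """
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def greedy_align(
    source: List[str],
    target: List[str],
    similarity: Callable[[str, str], float] = trigram_overlap,
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """For each source sentence, keep its best-scoring target sentence
    if the score reaches the threshold."""
    pairs = []
    for i, src in enumerate(source):
        scores = [similarity(src, tgt) for tgt in target]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    original = ["The cat sat on the mat.", "It was raining."]
    paraphrase = ["Rain was falling.", "A cat was sitting on the mat."]
    for i, j, score in greedy_align(original, paraphrase):
        print(f"{original[i]!r} <-> {paraphrase[j]!r} (score {score:.2f})")
```

Because `greedy_align` takes the similarity function as a parameter, an embedding-based measure (for example, cosine similarity between sentence vectors) could be passed in place of `trigram_overlap` to compare the two kinds of metric on the same paragraph pairs.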
Pages: 1-29
Keywords: Sentence Alignment, Sentence Similarity, Sentence Embedding, Paragraph-paraphrased Corpus
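Journal: 中文計算語言學期刊 (International Journal of Computational Linguistics and Chinese Language Processing)
Issue: December 2022 (Vol. 27, No. 2)
Publisher: 中華民國計算語言學學會 (The Association for Computational Linguistics and Chinese Language Processing)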