Building an Enhanced Autoregressive Document Retriever Leveraging Supervised Contrastive Learning

Yi-Cheng Wang; Tzu-Ting Yang; Hsin-Wei Wang; Yung-Chang Hsu; Berlin Chen

Title	Building an Enhanced Autoregressive Document Retriever Leveraging Supervised Contrastive Learning (Chinese title: 利用監督式對比學習來建構增強型的自迴歸文件檢索器)
Authors	Yi-Cheng Wang, Tzu-Ting Yang, Hsin-Wei Wang, Yung-Chang Hsu, Berlin Chen
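Chinese abstract (translated)	The goal of an information retrieval system is to find, within a large document collection, the documents most relevant to a user's query. The traditional retrieval pipeline has to compare the query against many documents, often in several passes, before it can identify the most relevant ones. Recently, a novel retrieval strategy based on the Differentiable Search Index (DSI) was proposed and has shown strong performance. DSI first encodes all the information in the document collection into the parameters of a single Transformer model. At retrieval time, the user feeds a query to the Transformer, which directly generates the identifiers of the relevant documents (document IDs) in an autoregressive manner, greatly simplifying and accelerating the whole retrieval process. Previous studies have pointed out that DSI uses document IDs as the bridge that links queries to documents, yet not every document in the training data has an associated query, so the link cannot be properly established for those documents. To address this, we propose using supervised contrastive learning during model training to strengthen the correspondence between queries and documents in the latent semantic space, and using nearest-neighbor search at inference time to further assist the model in generating document IDs. The proposed method effectively strengthens the weak correspondence between documents and queries in DSI, and its effectiveness is verified on the public Natural Questions dataset.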
English abstract	The goal of an information retrieval system is to retrieve the documents most relevant to a given user query from a huge document collection, which usually requires many time-consuming comparisons between the query and candidate documents. Recently, a novel retrieval modeling approach, dubbed Differentiable Search Index (DSI), has been proposed. DSI dramatically simplifies the whole retrieval process by encoding all information about the document collection into the parameters of a single Transformer model, which can then generate the identifiers (IDs) of relevant documents in an autoregressive manner in response to a user query. Although DSI addresses shortcomings of traditional retrieval systems, previous studies have pointed out that it might fail to retrieve relevant documents: DSI uses document IDs as the pivotal mechanism for relating queries to documents, yet not every document in the collection has corresponding relevant and irrelevant queries available for training. In view of this, we propose leveraging supervised contrastive learning to better capture the relationship between queries and documents in the latent semantic space. Furthermore, an approximate nearest neighbor search strategy is employed at retrieval time to help the Transformer model generate document IDs relevant to a posed query more efficiently. A series of experiments conducted on the Natural Questions benchmark dataset confirms the effectiveness and practical feasibility of our approach relative to several strong baseline systems.
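Pages	273-282
Keywords	Information Retrieval, Autoregressive Retrieval System, Contrastive Learning
Journal	ROCLING Proceedings (ROCLING論文集)
Issue	December 2022 (2022 issue)
Publisher	Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會)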
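
The abstract names two ingredients: a supervised contrastive objective that pulls each query toward its relevant document in the latent space, and a nearest-neighbor search over document embeddings at retrieval time. The record does not give the paper's exact loss or search procedure, so the following is only a minimal sketch under stated assumptions: one labeled positive document per query, an InfoNCE-style softmax over cosine similarities, and exact (brute-force) search standing in for the approximate search mentioned in the abstract. All function names, the temperature value, and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def supervised_contrastive_loss(queries, docs, positives, temperature=0.1):
    """Mean InfoNCE-style loss (an assumed form, not the paper's stated loss).

    queries[i] is pulled toward docs[positives[i]] and pushed away from
    every other document embedding.
    """
    q = [normalize(x) for x in queries]
    d = [normalize(x) for x in docs]
    total = 0.0
    for qi, pos in zip(q, positives):
        logits = [dot(qi, dj) / temperature for dj in d]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[pos]  # equals -log softmax(logits)[pos]
    return total / len(q)

def nearest_doc_ids(query, docs, doc_ids, k=2):
    """Exact cosine nearest-neighbor search; returns the k closest document IDs."""
    qn = normalize(query)
    scored = [(dot(qn, normalize(d)), doc_id) for d, doc_id in zip(docs, doc_ids)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy 2-D embeddings (hypothetical values, for illustration only).
docs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
doc_ids = ["d0", "d1", "d2"]
queries = [[0.9, 0.1], [0.1, 0.9]]

aligned = supervised_contrastive_loss(queries, docs, positives=[0, 1])
misaligned = supervised_contrastive_loss(queries, docs, positives=[2, 2])
candidates = nearest_doc_ids(queries[0], docs, doc_ids, k=2)
```

In this toy setup the loss is lower when each query's labeled positive is the document it already lies close to, and the search returns candidate IDs ranked by cosine similarity. How such candidates are combined with the Transformer's autoregressive decoding is not described in this record, so that step is left out.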