月旦知識庫
Title
應用類別對照之資料融合方法於機器學習研究分類
Parallel Title
Exploring Class Mapping as Data Fusion Technique in Machine Learning for Research Classification
Chinese Abstract
Training machine learning classification models requires sufficient, high-quality data. This study explores class mapping as a data fusion strategy for developing machine learning models for research classification. Taking the 2008 and 2020 editions of the Australian and New Zealand Standard Research Classification as its subject, the study collected 179,431 classified documents from the repositories of eight institutions and built, for each edition, an original dataset and a dataset augmented through class mapping. Results show that 49% of documents classified under the 2008 edition can be unambiguously mapped to the 2020 edition, and 63% in the reverse direction. Classification models were then built with SVM, SciBERT, ModernBERT-base, and ModernBERT-large. Compared with training on the original datasets alone, every model improved after training on the augmented datasets. ModernBERT-large showed the most pronounced gains, improving by 1.0% or 2.5% at the division level, 4.4% or 2.2% at the group level, and 9.9% or 11.5% at the field level; non-augmented classes also improved by 32.0% or 15.5%. Overall, class mapping can be used to expand training data and improve automatic research classification performance.
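The class-mapping augmentation described in the abstract can be sketched in a few lines of Python. All names here (the documents, the ANZSRC codes, and the mapping table) are illustrative placeholders, not the study's actual concordance or data:

```python
# Hypothetical sketch: augmenting a 2020-labelled training set with
# 2008-labelled documents via an ANZSRC 2008 -> 2020 class mapping.
# Codes and documents below are illustrative, not the real concordance.

# One-to-one portion of a concordance between scheme versions; real
# ANZSRC mappings also contain one-to-many cases, which a study like
# this would treat as not unambiguously mappable.
MAP_2008_TO_2020 = {
    "0801": "4602",  # placeholder pair of 2008 -> 2020 field codes
    "0807": "4609",  # placeholder pair
}

docs_2020 = [("deep learning survey", "4602")]
docs_2008 = [("image recognition study", "0801"),
             ("archival appraisal note", "0802")]  # no 1:1 target

def augment(docs_2020, docs_2008, mapping):
    """Return the 2020 dataset plus relabelled 2008 documents."""
    mapped = [(text, mapping[code])
              for text, code in docs_2008 if code in mapping]
    return docs_2020 + mapped

train_set = augment(docs_2020, docs_2008, MAP_2008_TO_2020)
# Documents without an unambiguous target (code "0802") are left out.
```

Documents whose 2008 class has no one-to-one 2020 counterpart simply stay in their original dataset, which is why only a portion (49% or 63%, depending on direction) of the corpus can be fused this way.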
English Abstract
Access to sufficient, high-quality data is essential for effectively training and validating machine learning classifiers. This study investigates class mapping as a data fusion strategy to enhance training data for research classification. Two versions of the Australian and New Zealand Standard Research Classification, ANZSRC 2008 FoR and ANZSRC 2020 FoR, are used to organize 179,431 documents from eight institutional repositories into plain and mapped datasets. Each dataset is divided into subsets corresponding to the division, group, and field levels of the classification schemes. Results show that 49% to 63% of documents are successfully mapped between schemes. Classifiers based on Support Vector Machines (SVM), SciBERT, ModernBERT-base, and ModernBERT-large are trained to assess the effectiveness of this data fusion approach on classification performance. All models show improved performance at all three levels. ModernBERT-large achieves the greatest gains, with improvements in validation F1 scores of 1.0% and 2.5% at the division level, 4.4% and 2.2% at the group level, and 9.9% and 11.5% at the field level. An emergent ability is observed: performance in non-augmented classes improves with ModernBERT-large but not with ModernBERT-base. Overall, this study demonstrates that class mapping effectively enriches training datasets, enhances classification performance, and underscores the importance of model size and architecture. These findings offer a practical and scalable strategy for improving machine learning performance in research classification tasks.
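The validation metric reported above, macro-averaged F1, can be illustrated with a minimal, dependency-free computation. The labels below are toy values; the paper's actual evaluation pipeline is not reproduced here:

```python
# Hedged sketch: macro-averaged F1, the validation metric named in the
# abstract, computed from toy division-level predictions.

def macro_f1(y_true, y_pred):
    """Mean of per-class F1 scores over all classes seen in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy example: four documents, two division codes, one misclassified.
y_true = ["46", "46", "33", "33"]
y_pred = ["46", "33", "33", "33"]
score = macro_f1(y_true, y_pred)  # (0.8 + 2/3) / 2
```

Because each class contributes equally regardless of size, macro F1 makes gains in small, sparsely populated classes visible, which matters for the fine-grained field level where the largest improvements are reported.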
Pages 119-143
Keywords: Interoperability; Inter-concept Mapping; Machine Learning; Research Classification
Journal: 圖書資訊學刊 (Journal of Library and Information Studies)
Issue: 202512 (Vol. 23, No. 2)
Publisher: Department of Library and Information Science, National Taiwan University (國立臺灣大學圖書資訊學系)
Previous article in this issue: An Analysis of University Students' Information Behavior in Receiving Misinformation from Social Media During the Pandemic
Next article in this issue: Empowering Elementary School Chinese Language Learning: Using Large Language Models to Generate Customized Teaching Materials Aligned with Professional Perspectives
 
