中華民國風濕病雜誌


Title
跨越風濕病學:GPT-4-turbo在醫學考試中的性能
Parallel title
Beyond Rheumatology: GPT-4-turbo’s Superior Performance in Medical Examinations
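Chinese abstract (translated)
Objectives: The potential applications of large language models (LLMs) are being progressively explored and evaluated across professional domains. This study examines the ability of Generative Pre-trained Transformer 3.5-turbo (GPT-3.5-turbo), Generative Pre-trained Transformer 4 (GPT-4), and Generative Pre-trained Transformer 4-turbo (GPT-4-turbo) to handle questions from the Taiwan internal medicine board examination, in order to assess the applicability of these models to medicine. The initial investigation focuses on rheumatology.
Methods: The study first analyzed 73 rheumatology questions from five consecutive years (2018 to 2022) of the internal medicine board examination to assess baseline performance. It then analyzed 146 questions from the 2022 internal medicine board examination to validate the LLM evaluation. Performance was assessed with two methods, evaluated separately: direct question answering in Chinese and zero-shot Chain-of-Thought (CoT) reasoning.
Results: On the rheumatology questions, CoT reasoning did not noticeably improve the answering performance of GPT-3.5-turbo, whose average score remained around 62. By contrast, the GPT-4 series performed markedly better: both GPT-4 with direct question answering and GPT-4-turbo with CoT achieved a score of 96.5. Notably, when testing was extended to the subspecialties of internal medicine, GPT-4-turbo combined with CoT reasoning showed significantly improved answering performance.
Conclusions: This study highlights the strong performance of GPT-4 models in interpreting and answering medical examination questions. In particular, GPT-4-turbo combined with zero-shot CoT reasoning optimizes the use of LLMs on examination questions in rheumatology and in other areas of internal medicine, handling the linguistic and conceptual challenges of those questions robustly.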
English abstract
Objectives: Large Language Models (LLMs) are increasingly being evaluated for their potential use in specialized domains. This study investigates the abilities of Generative Pre-trained Transformer 3.5-turbo (GPT-3.5-turbo), Generative Pre-trained Transformer 4 (GPT-4), and Generative Pre-trained Transformer 4-turbo (GPT-4-turbo) within the medical field by testing their performance on the Taiwan Internal Medicine Board Examination questions, with an initial focus on rheumatology.
Methods: The study evaluated baseline performance by analyzing 73 rheumatology questions taken from five consecutive examination years (2018-2022). This evaluation was then broadened to include a larger set of 146 internal medicine questions from the year 2022 to generalize the findings. Performance was assessed using direct queries in Chinese and the application of zero-shot Chain-of-Thought (CoT) reasoning.
Results: On the rheumatology questions, CoT reasoning produced no significant improvement in GPT-3.5-turbo’s performance, with average scores remaining around 62. In contrast, the GPT-4 variants excelled: both GPT-4 with direct queries and GPT-4-turbo with CoT achieved an average score of 96.5. Notably, when the evaluation was broadened to questions covering the subspecialties of internal medicine, GPT-4-turbo showed significantly enhanced performance with the CoT methodology.
Conclusions: The study highlights the superior performance of GPT-4 models in interpreting and responding to medical examination questions. It specifically underscores the potential of GPT-4-turbo, in conjunction with CoT reasoning, to optimize the utilization of LLMs in rheumatology and potentially other medical domains, indicating its robust capability in meeting the linguistic and conceptual challenges presented in medical examinations.
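The two querying conditions named in the Methods, direct queries and zero-shot CoT, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not give the prompt wording, the answer-parsing rule, or the scoring formula, so the trigger phrase `COT_SUFFIX`, the helpers `build_prompt`, `extract_choice`, and `score`, and the `ask` callable standing in for a model call are all hypothetical, and the score is simply taken to be the percentage of questions answered correctly.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical zero-shot CoT trigger phrase; the study's wording is not given.
COT_SUFFIX = "Let's think step by step."


def build_prompt(question: str, options: Dict[str, str], use_cot: bool) -> str:
    """Build a multiple-choice prompt for the direct or zero-shot CoT condition."""
    lines = [question]
    lines += [f"({key}) {text}" for key, text in sorted(options.items())]
    if use_cot:
        lines.append(COT_SUFFIX)
        lines.append("Finish with 'Answer: <letter>'.")
    else:
        lines.append("Reply with the letter of the correct option only.")
    return "\n".join(lines)


def extract_choice(reply: str) -> Optional[str]:
    """Return the chosen option letter, or None if no letter can be parsed."""
    matches = re.findall(r"Answer:\s*\(?([A-E])\)?", reply)
    if matches:
        return matches[-1]  # use the final stated answer
    stripped = reply.strip().strip("().")
    return stripped if re.fullmatch(r"[A-E]", stripped) else None


def score(items: List[dict], ask: Callable[[str], str], use_cot: bool) -> float:
    """Percentage of items answered correctly (assumed scoring rule)."""
    correct = 0
    for item in items:
        prompt = build_prompt(item["question"], item["options"], use_cot)
        if extract_choice(ask(prompt)) == item["answer"]:
            correct += 1
    return 100.0 * correct / len(items)
```

Under these assumptions, the same question set would be scored once with `use_cot=False` and once with `use_cot=True`, with `ask` wrapping whichever model is under test, so that the two conditions differ only in the prompt.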
Pages 25-39
Keywords Large Language Models (LLM); Generative Pre-trained Transformer (GPT); Chain of thought reasoning; Medical license examination (MLE); Medical board review questions
Journal 中華民國風濕病雜誌
Issue 202406 (38:1)
Publisher 中華民國風濕病醫學會