月旦知識庫
月旦知識庫 會員登入元照網路書店月旦品評家
 
 
  1. 熱門:
首頁 臺灣期刊   法律   公行政治   醫事相關   財經   社會學   教育   其他 大陸期刊   核心   重要期刊 DOI文章
台灣精神醫學雜誌 本站僅提供期刊文獻檢索。
  【月旦知識庫】是否收錄該篇全文,敬請【登入】查詢為準。
最新【購點活動】


篇名
Performances of Large Language Models in Detecting Psychiatric Diagnoses from Chinese Electronic Medical Records: Comparisons between GPT-3.5, GPT-4, and GPT-4o
並列篇名
Performances of Large Language Models in Detecting Psychiatric Diagnoses from Chinese Electronic Medical Records: Comparisons between GPT-3.5, GPT-4, and GPT-4o
作者 Chien Wen Chien (Chien Wen Chien)Yueh-Ming Tai (Yueh-Ming Tai)
英文摘要
Objectives: As a type of artificial intelligence (AI), the large language model (LLM) is designed to understand and generate human-like fluent texts. Typical LLMs, e.g., GPT-3.5, GPT-4, and GPT-4o, interact with users through“prompts”and some internal parameters, like“temperature.”Currently, some AI models have been widely used in the field of psychiatry, but systemic reports examining the capacity and suitability of LLM in detecting psychiatry diagnoses are still lacking. In this study, we intended to explore the performances of different generations of LLMs with different levels of temperature in detecting mental illnesses from electronic medical records (EMRs). Methods: We collected 500 Chinese EMRs from one mental hospital in northern Taiwan, with the“current medical history”section as corpuses. We used the GPT-3.5-turbo-16K, GPT-4, and GPT-4o models provided by Microsoft’s Azure OpenAI service (www.portal.azure.com) to generate AIbased predictions (the probability) for the diagnoses of major depressive disorder (MDD), schizophrenia (SCZ), attention-deficit/hyperactivity disorder (ADHD), and autistic spectrum disorder (ASD). Clinic diagnoses made by qualified psychiatrists were treated as gold standards (target) of receiver operating characteristic curve analysis. Then, their area under the ROC curve (AUCs) were compared using the DeLong test. Results: Among 500 recruited Chinese EMRs in this study, 56.6% were primarily diagnosed with MDD, as well as 22.4% with SCZ, 11.2% with ADHD, and 9.2% with ASD. In general, our LLMs achieved AUCs of 0.84 to 0.98 for detecting four different diagnoses. There were no significant differences between versions, but newer versions (GPT-4o models with AUCs of 0.98–0.97 for SCZ, ADHD, and ASD) performed better than older versions (GPT-3.5 models with AUCs of 0.88–0.96) except for MDD (AUC of 0.95 for GPT-4 and AUC of 0.93 for GPT-4o). Although DeLong tests showed nonsignificant differences between the AUCs of models with different levels of temperature, models with zero temperatures generally represented the best performances in magnitudes. Conclusion: To the best of our knowledge, this study is the first to demonstrate that LLMs performed excellently in distinguishing some mental illnesses. Nevertheless, the diagnostic capabilities of LLMs differed from other diagnoses such as MDD. We hypothesize that this phenomenon may partially result from the complexity of symptomology and/or the content filtering rules of OpenAI. Therefore, more advanced models, e.g., GPT-5, or private training models, e.g., Llamma 3, with the relevance generative answering technique, are expected to answer our questions.
起訖頁 134-141
關鍵詞 area under curseInternational Classification of Diseases10th versionnatural language processingpsychiatry diagnosis
刊名 台灣精神醫學雜誌  
期數 202409 (38:3期)
出版單位 台灣精神醫學會
該期刊-上一篇 Influences of Potential Military Conflict between Taiwan and China on the Intention to Emigrate among Taiwanese Individuals
該期刊-下一篇 Benefits of Cognitive Behavior Therapy with Neuroscience-informed Psychoeducation in Young Adult Patients: A Preliminary Study
 

新書閱讀



最新影音


優惠活動




讀者服務專線:+886-2-23756688 傳真:+886-2-23318496
地址:臺北市館前路28 號 7 樓 客服信箱
Copyright © 元照出版 All rights reserved. 版權所有,禁止轉貼節錄