English Abstract
Objectives: We compared ChatGPT's performance to medical students' on psychiatry examinations and investigated whether raters could distinguish between their answers. Methods: We used a copy of short-answer questions from a psychiatry examination to compare the performance of three randomized groups: ChatGPT, student, and hybrid (student-modified ChatGPT responses). Furthermore, we investigated raters' ability to identify the origins of the responses. Results: ChatGPT-assisted answers, both ChatGPT alone (p < 0.001) and hybrid (p < 0.001), showed significantly better examination performance than independent student work. Raters were highly accurate in identifying the origin of the responses, correctly identifying 92% of both student and ChatGPT-assisted responses. However, raters were only 61% accurate in distinguishing between ChatGPT-only and hybrid answers. Conclusion: ChatGPT showed superior performance on a psychiatry examination compared to students' work, but raters distinguished between the two with high accuracy. Further investigation is warranted to optimize the advantages and mitigate the drawbacks of incorporating such technology into psychiatric education and health care.