English Abstract |
This study evaluated the consistency of COVID-19 information produced by ChatGPT versions 3.5 and 4.0 with official releases from the World Health Organization (WHO). To this end, 487 COVID-19-specific questions were sourced from the WHO's official website and posed to both versions of ChatGPT. The answers generated by ChatGPT were then cross-checked against the WHO's official responses. Two clinical experts rated each answer on a scale of 1 to 5 against four criteria: accuracy, comprehensiveness, relevance, and clarity. The Rasch rating scale model was used to help visualize the findings. The experts' ratings revealed notable differences in answer quality between the two versions: ChatGPT 4.0 outperformed version 3.5, with statistically significant differences in their ratings. Even so, inconsistencies remained between ChatGPT 4.0's answers and the WHO's official responses. The study concludes by advising users to cross-reference ChatGPT's information with more reliable sources to avoid potential misinformation risks. |