| English abstract |
In this research, we investigated GPT-4 as a question-answering model for the Holy Quran. As a first step, we built the Quran question–answer pair (QUQA) dataset, comprising 2,189 questions, and made it freely available via our repository. We then used this dataset to benchmark the current Generative Pre-trained Transformer 4 (GPT-4) model from the OpenAI research laboratory. The results show that GPT-4 performed poorly on this dataset, scoring 0.23 partial Average Precision (pAP), 0.26 F1@1, and 0.19 Exact Match (EM). Further improvement is therefore needed in the Classical Arabic responses generated by GPT models. |