| English abstract |
Identifying duplicate sentences remains a significant challenge in NLP, with applications in question-answering and paraphrase detection systems. One platform where the problem arises is Quora, where users post questions and answers. Because of the large user base, many of the questions posted are duplicates, so the same question ends up being asked and answered multiple times in different wordings. Identifying such repeated questions makes it easier to surface high-quality answers and could improve the user experience. An existing approach to duplicate question detection on Quora employs a Siamese MaLSTM model with ELMo word embeddings and uses the Manhattan distance to measure sentence similarity on the Quora Question Pairs dataset available on Kaggle. In this paper, we propose an enhanced model that incorporates the squared Euclidean distance alongside the Manhattan distance. Feature engineering is used to generate additional features, such as the sentence length difference and the cosine similarity between ELMo embeddings, and a few preprocessing techniques are applied to improve the quality of the data samples. Due to computational constraints, we used a subset of the dataset; the findings show that the proposed model outperforms the existing one by 2%. The proposed model thus makes a substantial contribution to the detection of duplicate questions. For comparison, we also evaluated multiple transformer-based models from HuggingFace. |
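
The similarity measures and engineered features named in the abstract can be illustrated with a minimal sketch. It assumes each question has already been encoded as a fixed-size vector (random or toy arrays stand in for ELMo/LSTM outputs here), and it uses the usual MaLSTM form `exp(-L1 distance)` for the Manhattan similarity. The function names, the whitespace tokenisation, and the way the features are collected into one dictionary are hypothetical; the abstract does not state how the paper combines the two distances.

```python
import numpy as np


def manhattan_similarity(u, v):
    """MaLSTM-style similarity exp(-||u - v||_1), which lies in (0, 1]."""
    return float(np.exp(-np.sum(np.abs(u - v))))


def squared_euclidean(u, v):
    """Squared Euclidean distance ||u - v||_2^2."""
    d = u - v
    return float(np.dot(d, d))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def length_difference(q1, q2):
    """Absolute difference in whitespace-token counts (assumed tokenisation)."""
    return abs(len(q1.split()) - len(q2.split()))


def pair_features(u, v, q1, q2):
    """Collect the measures for one question pair (hypothetical feature layout)."""
    return {
        "manhattan_sim": manhattan_similarity(u, v),
        "sq_euclidean": squared_euclidean(u, v),
        "cosine_sim": cosine_similarity(u, v),
        "len_diff": length_difference(q1, q2),
    }


# Toy vectors standing in for sentence encodings.
u = np.array([1.0, 0.0, 2.0])
v = np.array([1.0, 1.0, 0.0])
features = pair_features(
    u, v,
    "How do I learn Python?",
    "What is the best way to learn Python?",
)
```

Note that the Manhattan measure is a similarity (larger means closer) while the squared Euclidean measure is a distance (smaller means closer), so a model consuming both would need to account for the opposite directions.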