English Abstract
When applying deep learning methods to NLP tasks, we usually represent a word by a low-dimensional dense vector called a word embedding, and these word embeddings can then serve as feature vectors for various neural network-based models. A major challenge facing such a mechanism, however, is how to represent out-of-vocabulary (OOV) words. Two strategies are common in practice: one removes these words outright; the other represents OOV words with zero or random vectors. To mitigate this flaw, we introduce an OOV embedding framework that aims to generate reasonable low-dimensional dense vectors for OOV words. Furthermore, to evaluate the impact of the OOV representations, we plug the proposed framework into the Chinese machine reading comprehension task, and a series of experiments and comparisons demonstrates the efficacy of the proposed framework.