English Abstract
Short queries often retrieve irrelevant documents because they lack the context needed to disambiguate the user's real intention, which introduces search errors. If the raw search results are not highly precise, post re-ranking modules may not help much, since garbage in only produces garbage out. Automatic query formulation, which supplies appropriate left and right contexts to the query terms, is therefore an important pre-processing step for retrieving highly relevant documents to pass on to post re-ranking. This paper proposes a systematic approach for augmenting short query terms with the best contextual text patterns, so that the search matches answers to well-defined questions such as “the birthday of Bill Gates” (and most factoid questions). The augmentation patterns are learned to directly maximize the top-1 accuracy of retrieving relevant documents. The baseline is the basic two-term query, which submits a key entity term (‘Bill Gates’) plus the attribute to be answered (‘birthday’). Compared with this baseline, the augmented patterns achieve a top-1 accuracy of 31%, versus an extremely low 4% for the two-term query. Their top-10 accuracy, about 54%, is also significantly better than the 21% achieved by the two-term query. This implies that for about 57% (31/54) of the questions whose correct answer appears in the top 10, that answer is ranked first. With appropriately augmented query terms, correct results are thus often ranked among the first few places, and very likely at rank 1. Experiments show that the augmented query patterns significantly boost top-1 performance for answering well-defined questions. Applying such techniques to queries is therefore likely to improve search precision significantly, sometimes even without post re-ranking.
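As a rough illustration of selecting augmentation patterns by their top-1 accuracy, the sketch below scores a few hand-written candidate patterns against a search function and keeps the best one. Everything in it is an assumption made for illustration: the hypothetical `toy_search` ranking, the tiny toy corpus, the `{entity}` template syntax, and the exhaustive choice among fixed candidates stand in for whatever search engine and pattern-learning procedure the paper actually uses.

```python
from typing import Callable, Sequence

# Hypothetical search interface: query string -> ranked list of documents.
SearchFn = Callable[[str], list[str]]

# Toy corpus standing in for a real search engine's index (illustrative only).
CORPUS = [
    "Bill Gates birthday party photos and gossip",
    "Bill Gates was born on October 28, 1955 in Seattle",
    "Linus Torvalds birthday cake recipe",
    "Linus Torvalds was born on December 28, 1969 in Helsinki",
]


def toy_search(query: str) -> list[str]:
    """Hypothetical ranker: exact phrase match first, then shared-token count."""
    q = query.lower()
    q_tokens = q.split()

    def score(doc: str) -> int:
        d = doc.lower()
        d_tokens = set(d.split())
        overlap = sum(1 for t in q_tokens if t in d_tokens)
        return (1000 if q in d else 0) + overlap

    # sorted() is stable, so documents with equal scores keep corpus order.
    return sorted(CORPUS, key=score, reverse=True)


def top1_accuracy(pattern: str, pairs: Sequence[tuple[str, str]], search: SearchFn) -> float:
    """Fraction of (entity, answer) pairs whose rank-1 document contains the answer."""
    if not pairs:
        return 0.0
    hits = 0
    for entity, answer in pairs:
        query = pattern.format(entity=entity)  # add context around the key term
        results = search(query)
        if results and answer in results[0]:
            hits += 1
    return hits / len(pairs)


def best_pattern(patterns: Sequence[str], pairs: Sequence[tuple[str, str]], search: SearchFn) -> str:
    """Pick the candidate pattern with the highest top-1 accuracy on the training pairs."""
    return max(patterns, key=lambda p: top1_accuracy(p, pairs, search))


# Training pairs: (key entity, expected answer) for the "birthday" attribute.
PAIRS = [("Bill Gates", "October 28, 1955"), ("Linus Torvalds", "December 28, 1969")]
# A two-term baseline pattern and one augmented pattern.
PATTERNS = ["{entity} birthday", "{entity} was born on"]

if __name__ == "__main__":
    for p in PATTERNS:
        print(p, top1_accuracy(p, PAIRS, toy_search))
    print("best:", best_pattern(PATTERNS, PAIRS, toy_search))
```

On this toy corpus the two-term pattern `"{entity} birthday"` puts a gossip or recipe page at rank 1 (top-1 accuracy 0.0), while the augmented pattern `"{entity} was born on"` puts the answer-bearing page first (1.0). Choosing patterns by top-1 accuracy, rather than by top-10 recall, is what rewards patterns that push the answer to the very first position.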