English abstract |
There is increasing interest in computer-based linguistic technologies, including speech recognition and understanding, optical and pen-based character recognition, text retrieval and understanding, and machine translation. In each area, we have useful present-day systems and realistic expectations of progress. However, because human language is so complex and information-rich, computer programs for processing it must be fed enormous amounts of varied linguistic data (speech, text, lexicons, and grammars) to be robust and effective. Such databases are expensive to create and document, and maintenance and distribution add further costs. Not even the largest companies can easily afford enough of this data to satisfy their research and development needs. Researchers at smaller companies and in universities risk being frozen out of the process almost entirely. For pre-competitive research, shared resources also provide benefits that closely held or proprietary resources do not. Shared resources permit replication of published results, support fair comparison of alternative algorithms or systems, and permit the research community to benefit from corrections and additions provided by individual users. |