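Chinese Abstract |
Objectives: The volume of published biomedical literature has grown steadily in recent years, making it necessary to use computers to organize large numbers of articles automatically and to extract useful information from them. Well-known websites that organize biological information automatically, such as Coremine, STRING, and DisGeNet, do not show indirect relationships between terms. This study examines how often medical subject headings (MeSH) and gene names are used in different years in the unstructured abstracts indexed in PubMed, and how strongly the terms are associated with one another. Methods: The study used a text-mining design. The sample consisted of 26,295,751 articles retrieved and downloaded from PubMed on July 8, 2016. Using the official medical subject headings compiled by the U.S. National Library of Medicine and the official gene names compiled by the HUGO Gene Nomenclature Committee, we looked up synonyms and built a dictionary of 27,883 medical subject headings and a dictionary of 39,903 human genes. Text matching was used to extract the MeSH terms and gene names contained in each abstract, the number of occurrences of each term in abstracts was counted by year, and word2vec was used to analyze the strength of association between terms. Results: We built an interactive website for querying the number and frequency of occurrences of MeSH terms and gene names in abstracts by year, together with the related terms that most often appear with them in abstracts (https://yihsin.shinyapps.io/meshgeneterm_relation/). We found that from 2012 onward many articles mentioned China in their abstracts, and by 2016 China ranked among the top eight terms by count, signaling the rise of China in academia; Health rose from seventh to third place, which may indicate growing attention to health issues. Taking osteoarthritis as an example, the terms that most often appeared with it in abstracts included amputation surgery, patella, knee joint, and paralysis, and the indirect relationships among these terms are also visible. Conclusions: The website built in this study shows the number and frequency of uses of each MeSH term and gene name in abstracts across years, and the strength of association with the terms that most often appear alongside them. It lets researchers exploring a new field gain a quick overview and obtain suggested research directions, which supports future interdisciplinary research. |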
English Abstract |
Objectives: The biomedical literature has grown rapidly in recent years, making it difficult for users to sort through the volume of published studies and organize the qualitative data they contain. Well-known websites such as Coremine, STRING, and DisGeNet return only terms directly related to the search term and offer no indirectly related suggestions. This study investigated how often medical subject headings (MeSH) and gene names are used in each year in the unstructured abstracts in PubMed, and the relationships between them. Methods: The study used a text-mining design. The study sample was the 26,295,751 articles in PubMed on July 8, 2016. Using the MeSH Browser of the U.S. National Library of Medicine, we identified 27,883 terms to build the MeSH dictionary. Official gene names are assigned by the HUGO Gene Nomenclature Committee; a search of NCBI Gene yielded a dictionary of 39,903 human genes. The medical subject headings and gene names appearing in the abstracts were then extracted and counted by year. We used word2vec to analyze the associations between the MeSH terms and gene names. Results: We built an interactive website that reports the number of uses in different years and the related terms that most often appear together with each MeSH term and gene name in abstracts (https://yihsin.shinyapps.io/meshgeneterm_relation/). For example, the terms that appeared most often together with Osteoarthritis were Osteotomy, Nails, Patella, Knee joint, Physical examination, and Paralysis. The website also offers indirectly related suggestions.
Conclusions: The website developed in this study reports how often MeSH terms and gene names were used in different years and which terms were most strongly associated with them, indicating that these terms were often mentioned and discussed together in medical publications. It also shows the indirect correlations between terms, so that researchers exploring new areas can quickly gain a general understanding of the field. |
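The dictionary-matching and per-year counting step described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the study's actual pipeline: the synonym table, the sample records, and the function names `build_matcher` and `count_terms_by_year` are hypothetical, and the sketch assumes that a term is counted at most once per abstract. The word2vec association step is not shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy dictionary mapping each synonym (lower case) to its
# canonical MeSH term or gene symbol. The study's dictionaries held
# 27,883 MeSH terms and 39,903 human genes.
SYNONYMS = {
    "osteoarthritis": "Osteoarthritis",
    "degenerative arthritis": "Osteoarthritis",
    "knee joint": "Knee Joint",
    "patella": "Patella",
    "tp53": "TP53",
    "p53": "TP53",
}


def build_matcher(synonyms):
    """Compile one case-insensitive regex that matches any synonym as a whole word."""
    # Try longer synonyms first so multi-word entries are not cut short.
    ordered = sorted(synonyms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(s) for s in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def count_terms_by_year(records, synonyms):
    """Return {year: Counter({canonical term: number of abstracts mentioning it})}."""
    matcher = build_matcher(synonyms)
    counts = defaultdict(Counter)
    for year, abstract in records:
        # A set keeps each canonical term at most once per abstract (assumption).
        found = {synonyms[m.group(0).lower()] for m in matcher.finditer(abstract)}
        counts[year].update(found)
    return counts


# Hypothetical (year, abstract) records standing in for PubMed abstracts.
RECORDS = [
    (2015, "Osteoarthritis of the knee joint often involves the patella."),
    (2016, "Degenerative arthritis and p53 expression in the knee joint."),
    (2016, "TP53 mutations were analysed; TP53 status predicted outcome."),
]

counts = count_terms_by_year(RECORDS, SYNONYMS)
for year in sorted(counts):
    print(year, counts[year].most_common())
```

Mapping synonyms to a canonical term before counting means that "p53" and "TP53" contribute to the same yearly total, which is the role the synonym lookup plays when the two dictionaries are built.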