Chinese Abstract |
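This study proposes Shape Query, a method for mining related patterns (or events). We assume that if the histories of two patterns are similar to each other, the two patterns may be related. A pattern is defined as one or more consecutive words; the history of a pattern is defined as the time series formed by the number of times the pattern occurs in consecutive time intervals across time-stamped documents. Because the Haar wavelet supports shape matching at different degrees of similarity, the history of every pattern is transformed into a wavelet series in advance and stored in a database. When a user enters a pattern, its history is transformed with the Haar wavelet, and the wavelet database is then searched for pattern histories of similar shape. The experimental data are the abstracts and titles of PubMed articles (14,438,209 in total) from 1990 to 2013; the histories of the significant patterns extracted from them, after wavelet transformation, serve as the pattern shape query database. The experimental results show that shape query can find other related patterns. For example, using "Wilson's Disease" as the query, two patterns with similarly shaped histories are found: "acute basal ganglia" and "changes in activity of antioxidant", both of which domain experts confirmed to be related to Wilson's Disease. |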
English Abstract |
This study proposes an approach, Shape Query, for mining related patterns (or events). We assume that two patterns may be related if their histories are similar. A pattern is defined as one or more consecutive words; the history of a pattern is the frequency distribution of that pattern over consecutive, equal-size time intervals in time-stamped texts. Because the Haar wavelet can preserve the skeleton of a shape at a controllable level of precision, all pattern histories are first transformed into Haar wavelet series and stored in a database. Given a pattern from the user, we transform its history into a Haar wavelet series and then search the database for patterns with a similar shape. The database for the shape query experiments is derived from the histories of significant patterns extracted from the abstracts and titles of 14,438,209 PubMed articles published from 1990 to 2013. Experimental results show that shape query can find related patterns. For example, given the pattern "Wilson's Disease" as the query, two patterns with similar shapes are found, "acute basal ganglia" and "changes in activity of antioxidant", both of which domain experts confirmed to be related to "Wilson's Disease". |
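
The pipeline described in the abstract (transform each pattern history with the Haar wavelet, store the coefficients, then search for similar shapes) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `haar_transform`, `shape_distance`, and `shape_query`, the use of Euclidean distance, the `keep` parameter for choosing how coarse the comparison is, and the restriction to series whose length is a power of two are all assumptions made for illustration. The toy histories are invented and are not PubMed data.

```python
import math


def haar_transform(series):
    """Full Haar decomposition (averaging form) of a power-of-two-length series.

    Returns [overall average, coarsest detail, ..., finest details].
    The power-of-two restriction is a simplification assumed for this sketch.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    approx = [float(x) for x in series]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level_detail = [(a - b) / 2 for a, b in pairs]
        approx = [(a + b) / 2 for a, b in pairs]
        # Coarser levels go in front of the finer ones collected so far.
        details = level_detail + details
    return approx + details


def shape_distance(coeffs_a, coeffs_b, keep):
    """Euclidean distance over the first `keep` (coarsest) coefficients.

    A smaller `keep` compares only the rough skeleton of the two shapes;
    a larger `keep` demands a closer match. (Assumed similarity measure.)
    """
    return math.sqrt(
        sum((x - y) ** 2 for x, y in zip(coeffs_a[:keep], coeffs_b[:keep]))
    )


def shape_query(query_history, wavelet_db, keep=4, top_k=2):
    """Return the names of the `top_k` stored patterns closest in shape."""
    query_coeffs = haar_transform(query_history)
    ranked = sorted(
        wavelet_db.items(),
        key=lambda item: shape_distance(query_coeffs, item[1], keep),
    )
    return [name for name, _ in ranked[:top_k]]


# Toy pattern histories (invented counts per time interval).
histories = {
    "rising": [1, 2, 3, 4, 5, 6, 7, 8],
    "falling": [8, 7, 6, 5, 4, 3, 2, 1],
    "rising_with_spike": [1, 2, 3, 4, 5, 6, 7, 9],
}

# Precompute the wavelet database, as the abstract describes.
wavelet_db = {name: haar_transform(h) for name, h in histories.items()}

nearest = shape_query([1, 2, 3, 4, 5, 6, 7, 8], wavelet_db, keep=4, top_k=2)
print(nearest)
```

In this toy setup the query ranks the two rising histories ahead of the falling one. Lowering `keep` makes the comparison more tolerant of fine-grained differences, which is one plausible reading of matching at different degrees of similarity.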