A Study on Applying the Pattern Taxonomy Model to Knowledge Discovery


Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/9594

Title:	A Study on Applying the Pattern Taxonomy Model to Knowledge Discovery
Authors:	吳勝堂
Contributors:	College of Computer Science, Department of Information and Multimedia Applications
Date:	2009
Issue Date:	2010-05-12 06:36:57 (UTC+0)
Abstract:	Over the past decade, many data mining techniques have been proposed for various knowledge discovery tasks, with the goal of helping users retrieve useful information. These techniques generate various types of patterns, such as sequential patterns, frequent itemsets, closed patterns, and maximum patterns. However, how to use the discovered patterns effectively remains an open research issue, especially in text mining. Most text mining methods adopt a keyword-based approach and build text representations from single words or single terms. Other methods use phrases instead of keywords, on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods have not yielded significant improvements. The likely reason is that high-frequency patterns (normally the shorter ones) usually score high on exhaustivity but low on specificity, so the more specific, descriptive patterns suffer from the low-frequency problem.
The Pattern Taxonomy Model (PTM) is a pattern-based method that adopts sequential pattern mining and uses closed patterns as the features of the text representation. PTM maps discovered patterns into a hypothesis space in an attempt to solve the low-frequency problem that affects long, specific patterns. However, in the concept learning phase of a PTM-based system, information from negative examples is not adequately exploited, even though the discovered patterns need this information to be re-evaluated. This project therefore aims to develop an effective and efficient approach to pattern evolution that overcomes this problem. The proposed approach will be tested on real knowledge discovery tasks, and the experimental results will be compared with those of existing methods to evaluate the system's performance.
Appears in Collections:	[Department of M-Commerce and Multimedia Applications] Ministry of Science and Technology Research Projects

Files in This Item:

File	Size	Format	Count
98吳勝堂1.pdf	67Kb	Adobe PDF	873
98吳勝堂2.pdf	36Kb	Adobe PDF	410
index.html	0Kb	HTML	503
