A Study of the Impact of the Pattern Dependency Problem on Web Mining (樣式依存度問題於網頁探勘之影響研究)


Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/26278

Title:	A Study of the Impact of the Pattern Dependency Problem on Web Mining (樣式依存度問題於網頁探勘之影響研究)
Authors:	吳勝堂
Contributors:	College of Computer Science (資訊學院); Department of Information and Multimedia Applications (資訊多媒體應用學系)
Keywords:	Pattern Taxonomy Model (樣式分類模式); Web Mining (網頁探勘); Pattern Dependency (樣式依存度)
Date:	2011
Issue Date:	2013-07-18 07:53:26 (UTC+0)
Abstract:	With the rapid growth of the Internet, Web Mining has attracted sustained attention in recent years. Web content mining can generally be viewed as a sub-field of Text Mining. Current Web Mining methods based on the Pattern Taxonomy Model (PTM) consist of two main steps. The first step uses document indexing to build a feature space of terms. The second step uses a pattern evolving technique to transform the feature terms into more descriptive patterns, which are then applied to tasks such as document classification or information filtering. Each step has its own difficulty. For document indexing, the problem is how to find a suitable number of important feature terms. For pattern evolving, the problem is how to combine as few terms as possible into more effective patterns. Traditionally, feature selection computes feature weights with methods from information retrieval or information theory, ranks the terms, and keeps the highest-weighted ones to form the feature vector. However, these methods ignore important information such as the dependency and correlation between terms. The characteristics of pattern dependency therefore have to be analyzed in order to find a better solution.
The main goal of this project is to develop an effective feature selection method and, by analyzing the pattern dependency problem, to establish a more efficient pattern evolving scheme that is integrated into the PTM-based Web Mining system described above. Experiments will be performed on real datasets, the results will be analyzed with standard evaluation measures, and the system will then be compared with other well-known methods.
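
The traditional baseline the abstract describes (weight the terms, rank them, keep the top few) can be sketched as follows. This is only an illustration under stated assumptions and not the project's method: the record does not say which weighting scheme is used, so TF-IDF stands in for it, and the function names `tfidf_weights`, `select_top_k`, and `cooccurrence`, as well as the toy documents, are hypothetical. The last function shows one simple kind of term-dependency signal that a purely weight-based ranking does not look at.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Sum a TF-IDF weight per term over tokenized documents (assumed scheme)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = Counter()
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            weights[term] += (count / len(doc)) * idf
    return weights

def select_top_k(weights, k):
    """Rank terms by weight and keep the k highest (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def cooccurrence(docs, a, b):
    """Fraction of the documents containing term a that also contain term b."""
    with_a = [doc for doc in docs if a in doc]
    if not with_a:
        return 0.0
    return sum(1 for doc in with_a if b in doc) / len(with_a)

# Toy corpus (hypothetical), already tokenized.
docs = [
    ["web", "mining", "pattern"],
    ["web", "mining", "text"],
    ["pattern", "taxonomy", "model"],
    ["text", "filtering"],
]
weights = tfidf_weights(docs)
features = select_top_k(weights, 3)
```

In this toy corpus "web" and "mining" always appear together, so `cooccurrence(docs, "web", "mining")` is 1.0, yet the ranking treats the two terms as independent candidates. That is the kind of dependency information the abstract says weight-based selection ignores.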
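Appears in Collections:	[Department of M-Commerce and Multimedia Applications (行動商務與多媒體應用學系)] Ministry of Science and Technology Research Projects (科技部研究計畫)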

