Asia University: Item 310904400/26278


    Please use this permanent URL to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/26278


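    Title: A Study of the Impact of the Pattern Dependency Problem on Web Mining (樣式依存度問題於網頁探勘之影響研究)
    Author: 吳勝堂
    Contributors: College of Computer Science; Department of Applied Informatics and Multimedia
    Keywords: Pattern Taxonomy Model; Web Mining; Pattern Dependency
    Date: 2011
    Upload time: 2013-07-18 07:53:26 (UTC+0)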
    Abstract: With the rapid growth of the Internet, Web mining has drawn considerable attention in recent years. Web content mining can be viewed as a sub-field of text mining. A Pattern Taxonomy Model (PTM)-based Web mining system works in two main steps. First, it uses document indexing to build a term feature space. Second, it uses a pattern evolving technique to transform the feature terms into more descriptive patterns, which are then applied to tasks such as document classification or information filtering. Each step has its own difficulty: document indexing must find a suitable number of important feature terms, and pattern evolving must generate more effective patterns from as few terms as possible. Traditionally, feature weights are computed with information-retrieval or information-theory methods, the terms are ranked, and the higher-weighted ones are selected to form the feature vector. These methods, however, ignore important information such as term dependency and term correlation, so the characteristics of pattern dependency must be analyzed to find a better solution. The goal of this project is to develop an effective feature selection method and, by analyzing the pattern dependency problem, a more efficient pattern evolving scheme, and to integrate both into the PTM-based Web mining system. Experiments will be performed on real datasets, and the results will be analyzed with standard evaluation measures and compared with other well-known methods.
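The traditional feature selection step described in the abstract (weight each term, rank, keep the top-weighted terms) can be sketched as follows. This is only an illustrative baseline, assuming TF-IDF as the information-retrieval weighting; it is not the project's method, and the function names `tfidf_weights` and `select_top_k` and the toy documents are hypothetical. Note that every term is scored independently, which is exactly the limitation the abstract points out: no term dependency or correlation is taken into account.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one aggregate TF-IDF weight per term, summed over all documents.

    Hypothetical helper: `docs` is a list of non-empty token lists.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = Counter()
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            # Each term is weighted on its own; co-occurring terms are ignored.
            weights[term] += (count / len(doc)) * idf
    return dict(weights)


def select_top_k(weights, k):
    """Keep the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (made up for illustration only).
docs = [
    ["web", "mining", "pattern", "taxonomy"],
    ["web", "mining", "text", "mining"],
    ["web", "pattern", "dependency"],
]
weights = tfidf_weights(docs)
feature_vector = select_top_k(weights, k=3)
```

In this toy corpus "web" appears in every document, so its inverse document frequency is zero and it is never selected, while rarer terms rank highest.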
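    Appears in collections: [Department of M-Commerce and Multimedia Applications] Ministry of Science and Technology Research Projects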

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    482


    All items in ASIAIR are protected by their original copyright.

