

    Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/8958


    Title: Design and Evaluation of Approaches for Automatic Chinese Text Categorization
    Authors: Jyh-Jong Tsay;Jing-Doo Wang
    Keywords: Term Clustering;Term Selection;Text Categorization
    Date: 2000
    Issue Date: 2010-04-15 05:42:25 (UTC+0)
    Abstract: In this paper, we propose and evaluate approaches to categorizing Chinese
    texts, which consist of term extraction, term selection, term clustering and text
    classification. We propose a scalable approach which uses frequency counts to
    identify left and right boundaries of possibly significant terms. We used the
    combination of term selection and term clustering to reduce the dimension of the
    vector space to a practical level. While the huge number of possible Chinese terms
    makes most of the machine learning algorithms impractical, results obtained in an
    experiment on a CNA news collection show that the dimension could be
    dramatically reduced to 1200 while approximately the same level of classification
    accuracy was maintained using our approach. We also studied and compared the
    performance of three well-known classifiers, the Rocchio linear classifier, the naive
    Bayes probabilistic classifier and the k-nearest neighbors (kNN) classifier, when they
    were applied to categorize Chinese texts. Overall, kNN achieved the best accuracy,
    about 78.3%, but required large amounts of computation time and memory when
    used to classify new texts. Rocchio was very time- and memory-efficient, and
    achieved a high level of accuracy, about 75.4%. In a practical implementation,
    Rocchio may be a good choice.
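
The trade-off between Rocchio and kNN described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Rocchio variant here is the simplified centroid-only form (no negative-example term), the toy English terms stand in for extracted Chinese terms, and all function names and data are hypothetical. It shows why Rocchio is cheap at classification time (one stored vector per class) while kNN must compare a new text against every training text.

```python
import math
from collections import defaultdict

def normalize(vec):
    """Scale a sparse term-weight vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine similarity)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def train_rocchio(docs):
    """Centroid-only Rocchio (an assumed simplification): one prototype per class."""
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in docs:
        for t, w in normalize(vec).items():
            sums[label][t] += w
    return {label: normalize(s) for label, s in sums.items()}

def classify_rocchio(centroids, vec):
    """Assign the class whose centroid is most similar to the new text."""
    q = normalize(vec)
    return max(centroids, key=lambda c: cosine(centroids[c], q))

def classify_knn(docs, vec, k=3):
    """Similarity-weighted vote among the k most similar training texts."""
    q = normalize(vec)
    scored = sorted(
        ((cosine(normalize(d), q), label) for d, label in docs),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in scored:
        votes[label] += sim
    return max(votes, key=votes.get)

# Hypothetical toy training set: (term counts, category).
train = [
    ({"stock": 3, "market": 2}, "finance"),
    ({"bank": 2, "stock": 1}, "finance"),
    ({"team": 3, "match": 2}, "sports"),
    ({"match": 1, "goal": 2}, "sports"),
]
query = {"stock": 1, "bank": 1}

centroids = train_rocchio(train)
print(classify_rocchio(centroids, query))  # compares against 2 centroids
print(classify_knn(train, query, k=3))     # compares against all 4 training texts
```

Rocchio keeps only `len(centroids)` vectors after training, whereas `classify_knn` needs the full `train` list for every new text, which is consistent with the time and memory costs the abstract reports for kNN.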
    Relation: International Journal of Computational Linguistics and Chinese Language Processing (CLCLP) 5(2) : 43-58
    Appears in Collections: [Department of Computer Science and Information Engineering] Journal Articles


