Asia University: Item 310904400/101842


    Please use this permanent URL to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/101842


    Title: Big Data Mining with Parallel Computing: A Comparison of Distributed and MapReduce Methodologies
    Authors: 蔡志豐 (Tsai*, Chih-Fong); 林維昭 (Lin, Wei-Chao); Ke, Shih-Wen
    Contributors: Department of Computer Science and Information Engineering (資訊工程學系)
    Date: 2016-12
    Upload time: 2016-12-05 06:57:18 (UTC+0)
    Abstract: Mining with big data, or big data mining, has become an active research area. It is very difficult for a single personal computer, using current methodologies and data mining software tools, to deal efficiently with very large datasets. Parallel and cloud computing platforms are considered a better solution for big data mining. The concept of parallel computing is based on dividing a large problem into smaller ones, each of which is carried out by a single processor individually. In addition, these processes are performed concurrently in a distributed and parallel manner. There are two common methodologies used to tackle the big data problem. The first is the distributed procedure based on the data parallelism paradigm, where a given big dataset is manually divided into n subsets and n algorithms are executed on the corresponding n subsets. The final result is obtained by combining the outputs produced by the n algorithms. The second is the MapReduce-based procedure under the cloud computing platform. This procedure is composed of the map and reduce processes, in which the former performs filtering and sorting and the latter performs a summary operation in order to produce the final result. In this paper, we aim to compare the performance differences between the distributed and MapReduce methodologies over large-scale datasets in terms of mining accuracy and efficiency. The experiments are based on four large-scale datasets, which are used for data classification problems. The results show that the classification performance of the MapReduce-based procedure is very stable no matter how many computer nodes are used, and is better than that of the baseline single-machine and distributed procedures except on the class-imbalanced dataset. In addition, the MapReduce procedure requires the least computational cost to process these big datasets.
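The two procedures described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the toy dataset, the nearest-centroid classifier, the majority-vote combination, and all function names below are assumptions chosen for illustration, and nothing here actually runs in parallel. It only shows the data flow of each approach: split, train n models, and combine for the distributed procedure; emit key-value pairs and summarize them for the MapReduce-style procedure.

```python
from collections import Counter, defaultdict
from functools import reduce

# Toy 1-D dataset of (feature, label) pairs; purely illustrative.
DATA = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.9, "b"),
        (1.0, "b"), (1.1, "b"), (0.15, "a"), (0.95, "b")]


def train_centroids(rows):
    """Fit a nearest-centroid model: the mean feature value per label."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, label in rows:
        sums[label][0] += x
        sums[label][1] += 1
    return {label: s / c for label, (s, c) in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


# --- Distributed (data-parallel) procedure, simulated serially ---
def split(rows, n):
    """Divide the dataset into n interleaved subsets."""
    return [rows[i::n] for i in range(n)]


def distributed_predict(rows, n, x):
    """Train one model per subset, then combine outputs by majority vote."""
    models = [train_centroids(subset) for subset in split(rows, n)]
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


# --- MapReduce-style procedure, simulated serially ---
def map_phase(row):
    """Map: emit a (key, value) pair keyed by class label."""
    x, label = row
    return (label, (x, 1))


def reduce_phase(acc, pair):
    """Reduce: accumulate the per-label sum and count."""
    label, (x, c) = pair
    s, n = acc.get(label, (0.0, 0))
    acc[label] = (s + x, n + c)
    return acc


def mapreduce_train(rows):
    """Build one centroid model from the summarized map output."""
    totals = reduce(reduce_phase, map(map_phase, rows), {})
    return {label: s / c for label, (s, c) in totals.items()}


if __name__ == "__main__":
    print(distributed_predict(DATA, 2, 0.25))
    print(predict(mapreduce_train(DATA), 1.05))
```

In this sketch the distributed procedure produces n independent models whose predictions must be reconciled, whereas the MapReduce-style procedure produces a single model from aggregated statistics. On a real cluster the subsets and map tasks would be assigned to separate nodes.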
    Relation: JOURNAL OF SYSTEMS AND SOFTWARE
    Appears in Collections: [Department of Computer Science and Information Engineering (資訊工程學系)] Journal Articles

    Files in this item:

    File        Size  Format  Views
    index.html  0Kb   HTML    448


    All items in ASIAIR are protected by original copyright.

