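Applying Text Mining Techniques and Machine Learning to Analyze the Relationship between Annual Reports and the Enforcement Actions: A Case Study of Electronic Component Industry in Taiwan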

Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/115688

Title:	Applying Text Mining Techniques and Machine Learning to Analyze the Relationship between Annual Reports and the Enforcement Actions: A Case Study of Electronic Component Industry in Taiwan (運用文字探勘與機器學習分析年報文字資訊與裁罰案件之關聯：以台灣電子零件業為例)
Authors:	賴一青
Contributors:	Department of Accounting and Information Systems
Keywords:	Annual report, Text mining, Weka, Decision tree (J48), BayesNet
Date:	2023
Issue Date:	2023-05-02 02:29:10 (UTC+0)
Publisher:	Asia University
Abstract:	As technology advances rapidly, industries accumulate ever more information. Past empirical research in finance has mostly analyzed structured data, chiefly companies' financial indicators. As machine learning techniques have matured, text analysis methods have emerged that extract implicit meaning from unstructured text. Since IFRS took effect in 2012, greater emphasis has been placed on companies following consistent accounting standards, and many regulations on the quality of information disclosure have been enacted to protect stakeholders and prevent fraud. This study examines the shareholders' meeting annual reports issued by listed companies in the electronic components industry that were subject to enforcement actions between 2013 and 2018. The reports are segmented into words, and the resulting terms are clustered to find the associations among them. Keywords capturing the important information of each year are then extracted year by year and used to build classification models with five machine learning algorithms: logistic regression, decision tree, random forest, Bayesian network, and support vector machine. The results show that all five algorithms achieve a correct classification rate of 75% or higher, with the decision tree and the Bayesian network performing best.
Appears in Collections:	[Department of Accounting and Information Systems] Theses & dissertations
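
Illustrative note: the abstract describes turning segmented report text into keyword features and comparing classifiers by their correct classification rate. The following is a minimal Python sketch of those two steps only, not the thesis's actual implementation (which the keywords indicate was done in Weka). The function names, the keyword list, the tokens, and the class labels are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter


def keyword_features(tokens, keywords):
    """Count how often each keyword occurs in one report's token list."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]


def correct_classification_rate(actual, predicted):
    """Share of instances whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not actual:
        raise ValueError("no instances to score")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical keyword list and already-segmented tokens for one report.
keywords = ["penalty", "audit", "revenue"]
report_tokens = ["audit", "revenue", "audit", "growth"]
features = keyword_features(report_tokens, keywords)

# Hypothetical labels; the abstract does not state the class definitions.
actual = ["sanctioned", "sanctioned", "not_sanctioned", "not_sanctioned"]
predicted = ["sanctioned", "not_sanctioned", "not_sanctioned", "not_sanctioned"]
rate = correct_classification_rate(actual, predicted)
```

In this toy case three of the four predictions match, so the rate equals the 75% threshold that the abstract reports all five algorithms reaching or exceeding.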
