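Implementation of a Conference Minute Generation System Using Deep-learning Neural Networks with Speaker Identification (使用深度學習神經網路實現具有語者辨識之會議記錄生成系統)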

Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/113708

Title:	使用深度學習神經網路實現具有語者辨識之會議記錄生成系統 Implementation of a Conference Minute Generation System Using Deep-learning Neural Networks with Speaker Identification
Authors:	王亮瑜 WANG, LIANG-YU
Contributors:	Department of Information Communication (資訊傳播學系)
Keywords:	語音辨識;語者辨識;遷移式學習;深度學習神經網路;關鍵辭檢測 Keyword spotting; Speaker identification; Speech recognition; Transfer learning; Deep-learning neural networks
Date:	2022-07-08
Issue Date:	2022-10-31 04:03:14 (UTC+0)
Publisher:	Asia University (亞洲大學)
Abstract:	Taking meeting minutes requires the recorder to identify each speaker and what they say while the agenda moves quickly, and recording that content correctly is a heavy and tedious workload. Attendees who are present often lose track of what is being discussed, and in large meetings the host and attendees may not know who is speaking or may be unable to hear the speech clearly. This thesis develops a system that generates meeting minutes automatically: it converts meeting speech into a text record, which reduces the time and labor cost of minute-taking, and it extracts the meeting's keywords so that attendees can quickly grasp the topic and direction of the discussion and avoid drifting off topic. The system has three parts: Chinese speaker identification, speech recognition, and keyword spotting. For speaker identification, transfer learning is applied with a self-made corpus to train the YAMNet convolutional neural network (CNN) to recognize who is speaking. Speech recognition uses the Google Speech-to-Text API to convert speech into text. The speaker identification and speech recognition results are then displayed together, producing the meeting minutes automatically. For keyword spotting, the Jieba Chinese word segmentation tool is applied to the recognized text, and the most frequently occurring words are taken as keywords so that attendees can follow the current focus of the meeting in real time. The experimental results show that the proposed system identifies Chinese speakers accurately and correctly recognizes what each speaker says, so that the minutes can be generated and the keywords in the speech spotted.
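The keyword step described in the abstract (segment the recognized text, then treat the most frequent words as keywords) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `top_keywords` and `segment`, the cutoff `k`, the `min_len` filter for single-character tokens, and the sample tokens are all hypothetical. Only `jieba.lcut` is an actual Jieba call, and it is imported lazily so the counting logic runs without the library installed.

```python
from collections import Counter


def top_keywords(tokens, k=5, min_len=2):
    """Return the k most frequent tokens that have at least min_len characters."""
    # Assumption: one-character tokens are mostly particles or punctuation,
    # so they are dropped before counting. The abstract does not state
    # which filtering, if any, the thesis applies.
    counts = Counter(t for t in tokens if len(t.strip()) >= min_len)
    return counts.most_common(k)


def segment(text):
    """Segment Chinese text into a list of words with Jieba."""
    import jieba  # third-party package; imported lazily

    return jieba.lcut(text)


# Hypothetical, already-segmented transcript standing in for segment(text).
tokens = ["會議", "記錄", "的", "系統", "會議", "語者", "辨識", "會議", "記錄"]
keywords = top_keywords(tokens, k=2)
print(keywords)  # [('會議', 3), ('記錄', 2)]
```

With Jieba installed, `top_keywords(segment(transcript_text))` would apply the same counting to text returned by the speech recognizer.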
Appears in Collections:	[Department of Information Communication] Theses and Dissertations

