A Speech Enhancement System Using Feed-Forward Deep-Learning Neural Network with Multi-Features


Please use this identifier to cite or link to this item: http://asiair.asia.edu.tw/ir/handle/310904400/114452

Title:	使用多元特徵參數於前饋式深度學習神經網路之語音增強系統 A Speech Enhancement System Using Feed-Forward Deep-Learning Neural Network with Multi-Features
Authors:	鄭喬馨 Zheng, Qiao-Xin
Contributors:	Department of Information Communication
Keywords:	Speech enhancement; speech noise reduction; feedforward neural network; deep-learning neural network; speech estimation
Date:	2020
Issue Date:	2022-12-19 02:50:50 (UTC+0)
Publisher:	Asia University
Abstract:	Speech signals are widely used in mobile devices and various software. In real environments, however, background noise often degrades speech intelligibility and can severely corrupt the signal, so reducing noise in speech is an important research problem. Most speech enhancement systems are dedicated to preserving the speech components of noisy speech, but they also leave considerable residual noise, so the enhanced speech remains severely corrupted. Because deep-learning neural network (DLNN) technology is now mature, this thesis focuses on improving speech enhancement performance with a DLNN. In noisy speech, a vowel has higher energy than silence and unvoiced regions and varies more steadily, so the zero-crossing rate and the variation of the pitch period are low in vowel segments. Accordingly, signal energy, zero-crossing rate, and pitch period are used as the features of a neural network that detects speech-activity segments in noisy speech. Moreover, speech energy differs from sub-band to sub-band, and sub-bands with low speech energy are more easily corrupted by noise. Each sub-band is therefore processed separately: the relationship between the energy of a sub-band, the adjacent preceding and following frames, and the neighboring higher and lower sub-bands is learned by a feed-forward neural network, which decides whether the sub-band is speech-dominant. If it is, the noisy spectrum is multiplied by the gain factor identified by the network to suppress noise; if the sub-band is noise-dominant, the signal is removed, which greatly suppresses the background noise.
The experimental results show that the output of the proposed spectrum-detection DLNN depends on the output of the frame-detection DLNN: frames estimated to contain no speech have their noise removed completely, so background noise is thoroughly removed during speech pauses. In speech segments, the proposed method suppresses noise effectively. Suppression is best in the 10 dB noise condition and remains effective in the severely corrupted 0 dB condition.
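The per-frame features and the sub-band gain step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the autocorrelation-based pitch estimate, the assumed pitch range (60 to 400 Hz), and the band-edge layout are all assumptions introduced here, and the neural networks that consume the features and produce the gains are not shown.

```python
import numpy as np

def frame_features(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return (energy, zero-crossing rate, pitch period in samples) for one frame.

    f0_min and f0_max are assumed search limits for the pitch, not values
    taken from the thesis.
    """
    frame = np.asarray(frame, dtype=float)

    # Short-time energy: mean squared amplitude of the frame.
    energy = float(np.mean(frame ** 2))

    # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(frame)
    zcr = float(np.mean(signs[1:] != signs[:-1]))

    # Pitch period: lag of the largest autocorrelation value inside the
    # lag range implied by the assumed pitch limits.
    centered = frame - frame.mean()
    ac = np.correlate(centered, centered, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    pitch_period = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))

    return energy, zcr, pitch_period

def apply_subband_gains(noisy_spectrum, gains, band_edges):
    """Multiply each sub-band of one frame's spectrum by its gain factor.

    band_edges holds len(gains) + 1 bin indices (hypothetical layout).
    A gain of 0 removes a noise-dominant sub-band entirely.
    """
    enhanced = np.array(noisy_spectrum, dtype=complex)
    for gain, lo, hi in zip(gains, band_edges[:-1], band_edges[1:]):
        enhanced[lo:hi] *= gain
    return enhanced
```

A frame classified as non-speech would correspond to all gains being zero, while a speech frame keeps its speech-dominant sub-bands scaled by gains between 0 and 1.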
Appears in Collections:	[Department of Information Communication] Theses & Dissertations

