Audio signal processing in speech recognition
Time: 2021-12-14
This is a bachelor's thesis from Delft University of Technology, the Netherlands (author: Joep de Jong), 50 pages long.
The transcription of speech using neural networks is a technique that deserves attention, as speech assistants are becoming increasingly popular. Neural networks often have difficulty distinguishing a talking person from noise. Humans have a much better understanding of this and could possibly apply their knowledge of the structure of the signals to improve the understanding of the neural network. A problem that is extremely difficult for a neural network is understanding and transcribing the lyrics of a song. This thesis analyzes signal-processing techniques that can be applied to a song to improve the understanding of a speech-recognition algorithm. It focuses mainly on filtering the foreground lyrics from the accompaniment. Some basic filtering methods are described, including a low-amplitude filter and a band-pass filter. Two more complex filters that make use of the periodicity of the background music are also treated.
The first filter is a method of voice separation using the two-dimensional Fourier transform. This method was proposed by Prem Seetharaman, Fatemeh Pishdadian and Bryan Pardo in 2017 [15]. It combines techniques from signal processing and image processing. Periodic repetitions in a signal are found by identifying peaks in the two-dimensional Fourier transform of the signal's spectrogram.
The second filter is a newly proposed method for separating the foreground from the background music. The algorithm compares columns in the spectrogram. It classifies a column as overlapping if there are multiple occurrences of columns similar to it (repetitions). The frequency components of the overlapping columns are the different frequencies obtained from a discrete short-time Fourier transform. These components are then compared with the components of the same frequency in other columns. Under certain circumstances, overlapping frequency components are subtracted from the components in other columns of the spectrogram. This removes repetitions of that frequency throughout the song. The components of the spectrogram that remain after several iterations of this method are most likely to correspond to the least repetitive parts of the song.
The decisions made while constructing the method of comparing spectrogram columns are discussed and compared with the steps performed in the method that uses the two-dimensional Fourier transform. An implementation and a demonstration are also attached. From the research, it is expected that the two-dimensional Fourier transform performs better on strictly periodic accompaniment. The method that compares spectrogram columns is more likely to perform better on songs with a less tight rhythm.
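Both the basic filters and the column-comparison method operate on a spectrogram. Its columns are the short-time spectra of successive frames, and its rows are the frequency components mentioned in the abstract. The sketch below shows how such a magnitude spectrogram could be computed. It assumes Python with NumPy and SciPy (`scipy.signal.stft`). The sample rate, Hann window and 256-sample frame length are illustrative choices, not parameters taken from the thesis.

```python
import numpy as np
from scipy.signal import stft


def magnitude_spectrogram(x, fs, nperseg=256):
    """Return (frequencies, frame times, |STFT|); rows are frequency bins, columns are frames."""
    f, t, Z = stft(x, fs=fs, window="hann", nperseg=nperseg)
    return f, t, np.abs(Z)


fs = 8000                                  # sample rate in Hz (illustrative)
n = np.arange(fs)                          # one second of samples
x = np.cos(2 * np.pi * 1000 * n / fs)      # 1 kHz test tone

f, t, mag = magnitude_spectrogram(x, fs)
column = mag[:, mag.shape[1] // 2]         # one column = the spectrum of one frame
print(mag.shape)                           # (frequency bins, frames)
print(f[1] - f[0])                         # bin spacing fs / nperseg, about 31.25 Hz
print(f[np.argmax(column)])                # strongest bin, about 1000 Hz
```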
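The abstract does not specify how the low-amplitude filter works. One plausible reading, assumed here, is a spectral gate. STFT coefficients whose magnitude falls below a fraction of the loudest coefficient are set to zero, and the signal is then resynthesized. The helper `low_amplitude_filter` and its `rel_threshold` of 0.1 are hypothetical.

```python
import numpy as np
from scipy.signal import stft, istft


def low_amplitude_filter(x, fs, rel_threshold=0.1, nperseg=256):
    """Hypothetical spectral gate: drop STFT coefficients below rel_threshold * max magnitude."""
    _, _, Z = stft(x, fs=fs, window="hann", nperseg=nperseg)
    mag = np.abs(Z)
    Z_gated = np.where(mag >= rel_threshold * mag.max(), Z, 0)   # keep only strong coefficients
    _, y = istft(Z_gated, fs=fs, window="hann", nperseg=nperseg)
    return y[: len(x)]          # istft may return a few padded samples at the end


fs = 8000
n = np.arange(fs)
loud = np.cos(2 * np.pi * 1000 * n / fs)           # strong component (should survive)
quiet = 0.01 * np.cos(2 * np.pi * 2000 * n / fs)   # weak component (should be removed)
y = low_amplitude_filter(loud + quiet, fs)

# Away from the signal edges, the output should match the strong tone alone.
mid = slice(1000, fs - 1000)
print(np.max(np.abs(y[mid] - loud[mid])))
```

Such a gate removes anything quiet, including quiet parts of the voice. It is therefore only a crude first step compared with the repetition-based filters.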
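A band-pass filter keeps a frequency range where most of the voice energy is expected and attenuates everything outside it. The minimal sketch below uses a zero-phase Butterworth filter from SciPy (`butter` with `output="sos"`, then `sosfiltfilt`). The 300-3400 Hz passband and the fourth-order design are assumptions for illustration. Instruments that share this band with the voice pass through unchanged.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def band_pass(x, fs, low_hz=300.0, high_hz=3400.0, order=4):
    """Zero-phase Butterworth band-pass; the default band is an assumed 'speech band'."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)      # forward-backward filtering: no phase shift


fs = 16000
n = np.arange(fs)
hum = 0.5 * np.sin(2 * np.pi * 50 * n / fs)        # low-frequency rumble outside the band
voice_like = np.sin(2 * np.pi * 1000 * n / fs)     # tone inside the band
y = band_pass(hum + voice_like, fs)

mid = slice(2000, fs - 2000)    # ignore start-up and tail effects of the filter
print(np.max(np.abs(y[mid] - voice_like[mid])))
```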
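The two-dimensional Fourier transform method rests on one property. Suppose the accompaniment repeats exactly every P frames in a spectrogram of N frames, with N a multiple of P. Then the 2D Fourier transform of the spectrogram is non-zero only at column frequencies that are multiples of N/P, so the repetition shows up as sharp peaks. The hypothetical helper below measures what fraction of the mean-removed 2D spectral energy lies on those bins. It compares a synthetic, strictly periodic pattern with the same pattern played with a slight tempo drift.

```python
import numpy as np


def periodic_energy_fraction(mag, period):
    """Share of 2D-FT energy (mean removed) on column frequencies k * (n_cols / period).

    Assumes n_cols is a multiple of `period`.
    """
    n_cols = mag.shape[1]
    energy = np.abs(np.fft.fft2(mag - mag.mean())) ** 2
    step = n_cols // period                     # spacing of the "repetition" bins
    return energy[:, ::step].sum() / energy.sum()


rng = np.random.default_rng(0)
pattern = rng.random((32, 8))                   # one synthetic 8-frame "bar" of accompaniment
cols = np.arange(64)

strict = pattern[:, cols % 8]                   # repeats exactly every 8 frames
drifting = pattern[:, (cols * 9 // 10) % 8]     # same bar, played about 10 % slower

print(periodic_energy_fraction(strict, 8))      # essentially 1.0
print(periodic_energy_fraction(drifting, 8))    # much lower
```

The drifting version loses most of its energy from these bins. This is consistent with the expectation stated in the abstract: the 2D Fourier transform suits strictly periodic accompaniment better than songs with a looser rhythm.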
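The separation step of the 2D Fourier transform method can be sketched in three steps:
1. Keep only the local peaks of the 2D transform of the magnitude spectrogram.
2. Invert those peaks to estimate the repeating background.
3. Attribute the rest of the mixture to the foreground.

This is a simplified illustration of the idea behind [15], not a reproduction of that method. The peak-picking neighbourhood, the relative threshold and the function name `repeating_background_2dft` are assumptions. `scipy.ndimage.maximum_filter` serves as the local-maximum test.

```python
import numpy as np
from scipy.ndimage import maximum_filter


def repeating_background_2dft(mag, neighborhood=(1, 5), rel_threshold=0.01):
    """Estimate the repeating background of a magnitude spectrogram from peaks of its 2D FT."""
    spec2d = np.fft.fft2(mag)
    amp = np.abs(spec2d)
    # A bin is a peak if it is the maximum of its (wrapped) neighbourhood and not negligible.
    is_peak = (maximum_filter(amp, size=neighborhood, mode="wrap") == amp) & (
        amp >= rel_threshold * amp.max()
    )
    # Keep the complex peak values (so their phase is preserved) and invert.
    background = np.real(np.fft.ifft2(np.where(is_peak, spec2d, 0)))
    # A background magnitude can be neither negative nor larger than the mixture.
    return np.clip(background, 0.0, mag)


rng = np.random.default_rng(1)
bar = rng.random((32, 8))
background = bar[:, np.arange(64) % 8]      # strictly periodic synthetic accompaniment
foreground = np.zeros_like(background)
foreground[10, 20] = 5.0                    # one non-repeating event standing in for the voice
mixture = background + foreground

estimate = repeating_background_2dft(mixture)
vocals = mixture - estimate                 # the remainder is attributed to the foreground
print(np.linalg.norm(mixture - background), np.linalg.norm(estimate - background))
```

Clipping keeps the estimate between zero and the mixture magnitude, so `vocals` is non-negative as well.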
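The first stage of the newly proposed method compares spectrogram columns. It marks a column as overlapping when several other columns resemble it. The abstract does not say how similarity is measured. The sketch below assumes cosine similarity with a fixed threshold. The parameters `threshold` and `min_repeats`, and both function names, are hypothetical.

```python
import numpy as np


def column_similarity(mag, eps=1e-12):
    """Cosine similarity between every pair of spectrogram columns."""
    unit = mag / (np.linalg.norm(mag, axis=0, keepdims=True) + eps)
    return unit.T @ unit


def overlapping_columns(mag, threshold=0.9, min_repeats=2):
    """Flag a column as overlapping if at least `min_repeats` OTHER columns resemble it."""
    sim = column_similarity(mag)
    np.fill_diagonal(sim, 0.0)                  # a column is not its own repetition
    return (sim >= threshold).sum(axis=1) >= min_repeats


rng = np.random.default_rng(2)
pattern = rng.random((64, 4)) ** 4              # four synthetic, fairly sparse columns
song = np.hstack([pattern, pattern, pattern])   # each column occurs three times
unique = np.zeros((64, 1))
unique[5, 0] = 1.0                              # a column that occurs only once
mag = np.hstack([song, unique])

flags = overlapping_columns(mag)
print(flags)                                    # first 12 True, last False
```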
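The abstract says only that overlapping frequency components are subtracted "under certain circumstances" and that the procedure is repeated. The sketch below assumes one concrete rule. For every overlapping column, the part it shares with all of its similar columns is their element-wise minimum, and that part is subtracted. Similarity is measured as cosine similarity between columns. The function `remove_repetitions`, its parameters and the minimum rule are hypothetical and should not be read as the thesis's algorithm.

```python
import numpy as np


def remove_repetitions(mag, threshold=0.9, min_repeats=2, n_iter=3, eps=1e-12):
    """Hypothetical sketch: iteratively subtract what overlapping columns share with their repeats."""
    residual = np.array(mag, dtype=float)          # work on a float copy
    for _ in range(n_iter):
        unit = residual / (np.linalg.norm(residual, axis=0, keepdims=True) + eps)
        sim = unit.T @ unit                        # cosine similarity between columns
        np.fill_diagonal(sim, 0.0)
        similar = sim >= threshold
        updated = residual.copy()                  # all columns use the same iteration's input
        for c in np.flatnonzero(similar.sum(axis=1) >= min_repeats):
            group = np.append(np.flatnonzero(similar[c]), c)
            shared = residual[:, group].min(axis=1)   # components present in every repeat
            updated[:, c] = residual[:, c] - shared
        residual = updated
    return residual


pattern = np.array([[4, 0, 0], [0, 4, 0], [0, 0, 4], [1, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
background = np.tile(pattern, (1, 3))             # 3 columns, each repeated 3 times
foreground = np.zeros_like(background)
foreground[4, 4] = 1.0                            # extra energy in one repetition only
mix = background + foreground

result = remove_repetitions(mix)
print(result)                                     # only entry [4, 4] remains non-zero
```

Taking the minimum guarantees that the subtraction never produces negative magnitudes. A column without repeats is left untouched, so a non-repeating event survives the iterations.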
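1. Introduction
2. Signals, Sampling and Spectral Theory
3. Filtering
4. Separating Voice Signals by Comparing Spectrogram Columns
5. Implementation and Validation
6. Discussion and Conclusion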