2013-08-31

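Calculating the spectrogram of .wav files and recorded audio (normalizing for volume)

I want to compare recorded audio with audio read from disk in a consistent way, but I'm running into problems with volume normalization (otherwise the amplitudes of the spectrograms differ).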

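I have never worked with signals, FFTs, or the WAV format before, so this is new and unknown territory for me. I retrieve the channels as lists of signed 16-bit integers sampled at 44100 Hz from both

  1. .wav files on disk
  2. recorded music playing from my laptop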
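For the .wav case, here is a minimal sketch of getting signed 16-bit samples with Python's standard `wave` module plus NumPy. The function names `read_wav_int16` and `to_float` are hypothetical, and the sketch assumes 16-bit PCM input. Scaling both sources to floats in [-1.0, 1.0) is an optional step, included only so the disk audio and the recording end up on the same numeric scale before any spectrogram is taken.

```python
import wave

import numpy as np


def read_wav_int16(path):
    """Read a 16-bit PCM .wav file and return (samples, sample_rate).

    samples has shape (nframes, nchannels); column c holds channel c.
    Assumes 2-byte little-endian PCM samples, which is what WAV uses for 16-bit audio.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        nchannels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # frames are interleaved (L, R, L, R, ...), so reshape to one column per channel
    samples = np.frombuffer(raw, dtype="<i2").reshape(-1, nchannels)
    return samples, rate


def to_float(samples):
    """Scale int16 samples to floats in [-1.0, 1.0)."""
    return samples.astype(np.float64) / 32768.0
```

One channel is then `samples[:, 0]`, which plays the role of `channel_samples` in the code below.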

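I then step through the samples one window (of size 2^k) at a time, with a certain amount of overlap. For each window, I do the following: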

import numpy as np

# calculate window variables 
window_step_size = int(self.window_size * (1.0 - self.window_overlap_ratio)) + 1 
last_frame = nframes - window_step_size # nframes is total number of frames from audio source 
num_windows, i = 0, 0 # calculate number of windows 
while i <= last_frame: 
    num_windows += 1 
    i += window_step_size 

# allocate memory and initialize counter 
wi = 0 # index 
nfft = 2 ** self.nextpowof2(self.window_size) # size of FFT in 2^k 
fft2D = np.zeros((nfft // 2 + 1, num_windows), dtype=np.float64) # 2D array for storing the (real-valued) dB results 

# for each window 
count = 0 
times = np.zeros((1, num_windows)) # num_windows was calculated 

while wi <= last_frame: 

    # channel_samples is simply list of signed ints 
    window_samples = channel_samples[ wi : (wi + self.window_size)] 
    window_samples = np.hamming(len(window_samples)) * window_samples 

    # calculate and reformat [[[[ THIS IS WHERE I'M UNSURE ]]]] 
    fft = 2 * np.fft.rfft(window_samples, n=nfft)/nfft 
    fft[0] = 0 # apparently these are completely real and should not be used 
    fft[nfft // 2] = 0 
    fft = np.sqrt(np.square(fft)/np.mean(fft)) # use RMS of data 
    fft2D[:, count] = 10 * np.log10(np.absolute(fft)) 

    # sec/frame * frames = secs 
    # (this is the start time of the window, not its midpoint) 
    times[0, count] = self.dt * wi 

    wi += window_step_size 
    count += 1 

# remove NaNs, infs 
whereAreNaNs = np.isnan(fft2D) 
fft2D[whereAreNaNs] = 0 
whereAreInfs = np.isinf(fft2D) 
fft2D[whereAreInfs] = 0 

# find the spectrogram peaks 
fft2D = fft2D.astype(np.float32) 

# the get_2D_peaks() method discretizes the fft2D periodogram array and then 
# finds peaks and filters out those peaks below the threshold supplied 
# 
# the `amp_xxxx` variables are used for discretizing amplitude and the 
# times array above is used to discretize the time into buckets 
local_maxima = self.get_2D_peaks(fft2D, self.amp_threshold, self.amp_max, self.amp_min, self.amp_step_size, times, self.dt) 
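For comparison, the windowing loop above can be condensed into a self-contained sketch. This is a restatement, not the asker's class: `spectrogram_db` and `normalize_db` are hypothetical names, `window_size` is assumed to be a power of two, and the volume normalization (shifting every spectrogram so its loudest bin sits at 0 dB) is one common choice assumed here, not something proposed in the thread. It uses `20 * log10` because the values are amplitudes, and a tiny floor instead of zeroing `-inf` afterwards.

```python
import numpy as np


def spectrogram_db(samples, window_size=1024, overlap=0.5):
    """Hamming-windowed magnitude spectrogram in dB.

    Returns shape (window_size // 2 + 1, num_windows). Assumes window_size is a
    power of two and len(samples) >= window_size.
    """
    x = np.asarray(samples, dtype=np.float64)
    step = max(1, int(window_size * (1.0 - overlap)))
    window = np.hamming(window_size)
    starts = range(0, len(x) - window_size + 1, step)
    # one windowed frame per row
    frames = np.array([x[s:s + window_size] * window for s in starts])
    # one-sided amplitude spectrum; 2 / sum(window) makes a sinusoid of
    # amplitude A peak at roughly A (minus the window's scalloping loss)
    mag = np.abs(np.fft.rfft(frames, axis=1)).T * (2.0 / window.sum())
    # a tiny floor avoids log10(0) = -inf
    return 20.0 * np.log10(mag + 1e-12)


def normalize_db(spec_db):
    """Shift so the loudest bin sits at 0 dB; a constant volume change cancels out."""
    return spec_db - spec_db.max()
```

With this normalization, the same tone played 20 dB quieter gives (up to rounding) the same normalized spectrogram. The trade-off is that any genuine level difference between the two sources is discarded too.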

The crazy stuff (at least to me) happens on the line with my comment [[[[ THIS IS WHERE I'M UNSURE ]]]].

Can anyone point me in the right direction, or help me generate this spectrogram with the volume properly normalized?

Answer


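A quick look tells me that you forgot to apply a window, which is necessary for computing your spectrogram.

You need to apply a window (Hamming, Hann) to your 'window_samples':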

np.hamming(len(window_samples)) * window_samples

Then you can compute the rfft.
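Applying the window and then taking the rfft of one frame could look like the sketch below (`windowed_rfft` is a hypothetical helper name). `np.fft.rfft` returns only the non-negative-frequency bins, `n // 2 + 1` of them, and its `n` argument zero-pads the frame to the FFT size.

```python
import numpy as np


def windowed_rfft(frame, nfft=None):
    """Apply a Hamming window to one frame, then take its one-sided FFT."""
    frame = np.asarray(frame, dtype=np.float64)
    windowed = np.hamming(len(frame)) * frame
    # nfft // 2 + 1 complex bins, from DC up to Nyquist; n=nfft zero-pads the frame
    return np.fft.rfft(windowed, n=nfft)
```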

Edit:

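import numpy as np
import matplotlib.pyplot as plt

# calc magnitude from FFT
# (windowed = the Hamming-windowed frame, Chunk = its length in samples)
fftData = np.fft.fft(windowed)
# get magnitude (linear scale) of the first half of the values
Mag = np.abs(fftData[:Chunk // 2])
# if you want a log scale: R = 20 * np.log10(Mag)
plt.plot(Mag)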
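#calc RMS from FFT
# Parseval: sum(|X|**2) == len(data) * sum(data**2), so this equals np.sqrt(np.mean(data ** 2))
RMS = np.sqrt(np.sum(np.abs(np.fft.fft(data)) ** 2)) / len(data)

RMStoDb = 20 * np.log10(RMS)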
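The FFT-based RMS can be checked against the time-domain RMS. This sketch relies on Parseval's theorem for NumPy's unnormalized `np.fft.fft` (sum of |x[n]|² equals the sum of |X[k]|² divided by N). `rms_time` and `rms_from_fft` are hypothetical helper names.

```python
import numpy as np


def rms_time(data):
    """RMS computed directly from the samples."""
    return np.sqrt(np.mean(np.square(data)))


def rms_from_fft(data):
    """RMS from the full (two-sided) FFT via Parseval's theorem."""
    data = np.asarray(data, dtype=np.float64)
    spectrum = np.fft.fft(data)
    return np.sqrt(np.sum(np.abs(spectrum) ** 2)) / len(data)
```

For a full-scale sine (RMS = 1/√2) both give about 0.7071, which `20 * np.log10(...)` turns into roughly -3.01 dB.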

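PS: If you want to calculate the RMS from the FFT, you can't use a window (Hanning, Hamming), because the window changes the signal's energy. Also, this line makes no sense: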

fft = np.sqrt(np.square(fft)/np.mean(fft)) # use RMS of data 
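To illustrate why a window gets in the way of an FFT-based RMS, the sketch below compares the RMS of a steady sine before and after applying a Hamming window. For this kind of steady signal the windowed RMS comes out lower by roughly the RMS of the window itself (about 0.63 for Hamming). Treat that factor as an approximation that holds for stationary signals, not an exact correction.

```python
import numpy as np


def rms(v):
    return np.sqrt(np.mean(np.square(v)))


n = np.arange(4096)
x = np.sin(2 * np.pi * 50 * n / 4096)  # steady sine, RMS = 1/sqrt(2)
w = np.hamming(len(x))

plain = rms(x)
windowed = rms(w * x)     # the energy an FFT of the windowed frame would report
ratio = windowed / plain  # close to rms(w), about 0.63 for a Hamming window
```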

A simple normalization of the data can be done for each window:

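window_samples = np.asarray(channel_samples[wi : (wi + self.window_size)], dtype=float)

# divide by the frame's peak rather than its mean: the mean of zero-centred audio
# is close to 0, so framMean = np.mean(window_samples) would blow the result up
framMax = np.max(np.abs(window_samples))
Normalized = window_samples / framMax if framMax > 0 else window_samples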
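Per-window normalization can be wrapped in small helpers (hypothetical names `normalize_frame` and `normalize_frame_rms`). This sketch divides by the frame's peak or by its RMS rather than its mean. Audio is roughly zero-centred, so the mean of a frame is close to 0 and dividing by it is unstable. Either version makes a frame's output independent of the playback volume.

```python
import numpy as np


def normalize_frame(frame):
    """Peak-normalize one frame so its largest absolute sample is 1.0."""
    frame = np.asarray(frame, dtype=np.float64)
    peak = np.max(np.abs(frame))
    return frame / peak if peak > 0 else frame  # leave silent frames unchanged


def normalize_frame_rms(frame):
    """RMS-normalize one frame so its RMS level is 1.0."""
    frame = np.asarray(frame, dtype=np.float64)
    level = np.sqrt(np.mean(frame ** 2))
    return frame / level if level > 0 else frame  # leave silent frames unchanged
```

Peak normalization is sensitive to a single loud click. RMS normalization follows the overall energy of the frame, which may match "volume" more closely.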

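I mean, it's good practice to scale the values within each window, but I don't think anything is fundamentally wrong without it. Still, I did add your suggestion. Follow-up: should I be getting negative amplitudes? For example, taking all the amplitudes from the 2D spectrogram array ('fft2D' in my code above), I get: http://i.imgur.com/gLyi3OW.png – lollercoaster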


To clarify, that is a histogram of the amplitude values – lollercoaster


The linear magnitude can be obtained as the absolute value of the first half of the FFT; see the update! – ederwander
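A small sketch separates the linear magnitude from its log-scaled version (`half_spectrum` and `to_db` are hypothetical names; the `eps` floor is an assumption added to avoid `log10(0)`). Magnitudes are never negative on the linear scale. After `20 * log10`, any magnitude below 1 becomes a negative dB value, so negative entries in a log-scaled spectrogram are not by themselves a sign of a bug.

```python
import numpy as np


def half_spectrum(frame):
    """Linear magnitude of the first half of the FFT (bins 0 .. N/2 - 1)."""
    frame = np.asarray(frame, dtype=np.float64)
    spectrum = np.fft.fft(frame)
    return np.abs(spectrum[: len(frame) // 2])


def to_db(mag, eps=1e-12):
    # 20*log10 is negative wherever mag < 1, zero at mag == 1
    return 20.0 * np.log10(mag + eps)
```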