Computing spectrograms of a wav file and of recorded sound (normalizing volume)

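I want to compare recorded audio with audio read from disk in a consistent way, but I'm running into problems normalizing the volume (otherwise the amplitudes of the spectrograms differ).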

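I've also never worked with signals, FFTs, or the WAV format before, so this is new, uncharted territory for me. I retrieve the channel as a list of signed 16-bit integers sampled at 44100 Hz, both from .wav files on disk and from recorded music played on my laptop. I then step through it one window (of size 2^k) at a time, with a certain amount of overlap. For each window, I do this: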

# calculate window variables 
window_step_size = int(self.window_size * (1.0 - self.window_overlap_ratio)) + 1 
last_frame = nframes - window_step_size # nframes is total number of frames from audio source 
num_windows, i = 0, 0 # calculate number of windows 
while i <= last_frame: 
    num_windows += 1 
    i += window_step_size 

# allocate memory and initialize counter 
wi = 0 # index 
nfft = 2 ** self.nextpowof2(self.window_size) # size of FFT in 2^k 
fft2D = np.zeros((nfft // 2 + 1, num_windows)) # 2D float array of dB values, one column per window 

# for each window 
count = 0 
times = np.zeros((1, num_windows)) # num_windows was calculated 

while wi <= last_frame: 

    # channel_samples is simply list of signed ints 
    window_samples = channel_samples[ wi : (wi + self.window_size)] 
    window_samples = np.hamming(len(window_samples)) * window_samples 

    # calculate and reformat [[[[ THIS IS WHERE I'M UNSURE ]]]] 
    fft = 2 * np.fft.rfft(window_samples, n=nfft)/nfft 
    fft[0] = 0 # the DC (0 Hz) and Nyquist bins are purely real; drop them 
    fft[nfft // 2] = 0 
    fft = np.sqrt(np.square(fft)/np.mean(fft)) # use RMS of data 
    fft2D[:, count] = 10 * np.log10(np.absolute(fft)) 

    # sec/frame * frames = secs 
    # time at the start of this window (not its midpoint) 
    times[0, count] = self.dt * wi 

    wi += window_step_size 
    count += 1 

# remove NaNs, infs 
fft2D[np.isnan(fft2D)] = 0 
fft2D[np.isinf(fft2D)] = 0 

# find the spectrogram peaks 
fft2D = fft2D.astype(np.float32) 

# the get_2D_peaks() method discretizes the fft2D periodogram array and then 
# finds peaks and filters out those peaks below the threshold supplied 
# 
# the `amp_xxxx` variables are used for discretizing amplitude and the 
# times array above is used to discretize the time into buckets 
local_maxima = self.get_2D_peaks(fft2D, self.amp_threshold, self.amp_max, self.amp_min, self.amp_step_size, times, self.dt)
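For comparison, the framing loop above can be condensed into a self-contained sketch that uses NumPy only. It assumes the samples are a 1-D int16 array (as read from a 16-bit WAV) and converts them to floats in [-1, 1) so that files and recordings share one scale. It then returns single-sided amplitudes in dB. The function name `spectrogram_db`, the default window size, the overlap, and the `eps` floor are illustrative choices, not part of the original code.

```python
import numpy as np


def spectrogram_db(samples, window_size=4096, overlap=0.5, eps=1e-10):
    """Hypothetical helper: Hamming-windowed STFT amplitudes in dB.

    samples: 1-D int16 array. Returns (frame_starts, spec_db), where spec_db
    has shape (nfft // 2 + 1, num_windows), one column per window.
    """
    # int16 -> floats in [-1, 1), so files and recordings share one scale
    x = np.asarray(samples, dtype=np.float64) / 32768.0
    if len(x) < window_size:
        raise ValueError("signal is shorter than one window")
    step = max(1, int(window_size * (1.0 - overlap)))
    nfft = 1 << (window_size - 1).bit_length()  # next power of two >= window_size
    window = np.hamming(window_size)

    frame_starts = np.arange(0, len(x) - window_size + 1, step)
    spec = np.empty((nfft // 2 + 1, len(frame_starts)))
    for col, start in enumerate(frame_starts):
        frame = x[start:start + window_size] * window
        spectrum = np.fft.rfft(frame, n=nfft)
        # single-sided amplitude: a sine of amplitude A peaks close to A
        spec[:, col] = 2.0 * np.abs(spectrum) / window.sum()
    # 20*log10 because these are amplitudes; eps avoids log10(0) = -inf
    return frame_starts, 20.0 * np.log10(spec + eps)


# Usage with a synthetic 1 kHz tone at half of full scale
fs = 44100
t = np.arange(fs) / fs
tone = (0.5 * 32767 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)
starts, spec = spectrogram_db(tone)
peak_bin = int(np.argmax(spec[:, 0]))
print(spec.shape, peak_bin * fs / 4096)
```

Because every source is first scaled to the same [-1, 1) range, and the amplitude is divided by the window's sum, a given tone reads at roughly the same dB level regardless of window length.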

The part that's especially crazy (at least to me) happens on the line I've marked [[[[ THIS IS WHERE I'M UNSURE ]]]].

Can anyone point me in the right direction, or help me generate this audio spectrogram with the volume properly normalized?


2013-08-31 lollercoaster

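A quick look tells me you forgot to use a window, which is necessary for computing your spectrogram.

You need to apply a window (Hamming, Hann) to your "window_samples":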

np.hamming(len(window_samples)) * window_samples

Then you can compute the rfft.
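To see why the window matters, here is a small sketch (not from the answer) that compares the spectrum of a sine falling between FFT bins with no window (rectangular) and with a Hann window. The tone frequency and the "far from the peak" range (bins 100 and above) are arbitrary choices.

```python
import numpy as np

n = 1024
k = np.arange(n)
# A sine that falls between FFT bins (10.5 cycles per frame) leaks energy
x = np.sin(2 * np.pi * 10.5 * k / n)

rect = np.abs(np.fft.rfft(x))                   # no window
hann = np.abs(np.fft.rfft(np.hanning(n) * x))   # Hann window

# Largest leakage far from the tone, relative to each spectrum's own peak
rect_leak_db = 20 * np.log10(rect[100:].max() / rect.max())
hann_leak_db = 20 * np.log10(hann[100:].max() / hann.max())
print(f"rectangular: {rect_leak_db:.1f} dB, Hann: {hann_leak_db:.1f} dB")
```

Without a window the tone's energy smears across the whole spectrum, which would show up as spurious peaks in a spectrogram.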

Edit:

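import numpy as np
import matplotlib.pyplot as plt

# calc magnitude from FFT (windowed: one windowed frame, Chunk: its length)
fftData = np.fft.fft(windowed)
# get magnitude (linear scale) of the first half of the values
Mag = np.abs(fftData[:Chunk // 2])
# if you want log scale: R = 20 * np.log10(Mag)
plt.plot(Mag)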
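A runnable check of the same idea, as a sketch: for a real signal, `np.fft.rfft` returns exactly the non-negative-frequency half of `np.fft.fft` (including the Nyquist bin for even lengths). Scaling the magnitude by 2/N turns the peak of a sine centred on a bin into its amplitude. The test tone (amplitude 0.25, 50 cycles per frame) is made up for the example.

```python
import numpy as np

n = 1024
k = np.arange(n)
x = 0.25 * np.sin(2 * np.pi * 50 * k / n)  # exactly 50 cycles: no leakage

full = np.fft.fft(x)
half = np.fft.rfft(x)
# for real input, rfft equals the first n // 2 + 1 bins of fft
assert np.allclose(half, full[: n // 2 + 1])

# linear single-sided amplitude spectrum (no window, tone centred on a bin)
amplitude = 2.0 * np.abs(half) / n
peak_bin = int(np.argmax(amplitude))
print(peak_bin, amplitude[peak_bin])  # bin 50, amplitude ~0.25
```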

# calc RMS from FFT (Parseval's theorem; data is one unwindowed frame)
RMS = np.sqrt(np.sum(np.abs(np.fft.fft(data)) ** 2) / len(data) ** 2)

RMStoDb = 20 * np.log10(RMS)
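As a sketch, the RMS from the FFT follows from Parseval's theorem. With NumPy's unnormalized FFT, sum(|X|**2) equals N * sum(x**2), so the time-domain and frequency-domain RMS agree. The random signal here is an arbitrary stand-in for one unwindowed frame.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(scale=1000.0, size=4096)  # stand-in for one frame of samples

rms_time = np.sqrt(np.mean(data ** 2))
# Parseval: sum(|X|^2) = N * sum(x^2)  =>  RMS = sqrt(sum(|X|^2)) / N
spectrum = np.fft.fft(data)
rms_freq = np.sqrt(np.sum(np.abs(spectrum) ** 2)) / len(data)

rms_db = 20 * np.log10(rms_freq)
print(rms_time, rms_freq, rms_db)
```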

PS: If you want to compute the RMS from the FFT, you can't apply a window (Hann, Hamming). Also, this line makes no sense:

fft = np.sqrt(np.square(fft)/np.mean(fft)) # use RMS of data
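The windowing point in the PS can be checked numerically. In this sketch (the tone and frame length are arbitrary), applying a Hamming window before the FFT lowers the Parseval-based RMS by roughly sqrt(mean(w**2)), which is about 0.63 for a Hamming window. The windowed value is therefore biased unless it is divided by that factor.

```python
import numpy as np

n = 4096
k = np.arange(n)
x = np.sin(2 * np.pi * 100 * k / n)  # true RMS is 1/sqrt(2)
w = np.hamming(n)


def rms_via_fft(frame):
    # Parseval-based RMS of one frame
    return np.sqrt(np.sum(np.abs(np.fft.fft(frame)) ** 2)) / len(frame)


true_rms = rms_via_fft(x)
windowed_rms = rms_via_fft(w * x)
correction = np.sqrt(np.mean(w ** 2))  # energy loss caused by the window

print(true_rms, windowed_rms, windowed_rms / correction)
```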

A simple normalization of the data can be done for each window:

window_samples = channel_samples[ wi : (wi + self.window_size)] 

# framMax = np.max(window_samples)
framMean = np.mean(window_samples)  # caution: audio frames are roughly zero-mean, so this can be ~0

Normalized = window_samples / framMean
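An alternative to dividing by the mean, which for roughly zero-mean audio can be close to zero and make the result blow up or flip sign: the sketch below (a swapped-in technique, not the answer's method) scales each frame to unit RMS instead, with a small floor for silent frames. The name `normalize_frame` and the floor value are hypothetical. Two copies of the same frame at different volumes come out identical.

```python
import numpy as np


def normalize_frame(frame, floor=1e-12):
    """Hypothetical helper: scale a frame to unit RMS (silent frames -> zeros)."""
    x = np.asarray(frame, dtype=np.float64)
    rms = np.sqrt(np.mean(x ** 2))
    if rms < floor:
        return np.zeros_like(x)
    return x / rms


rng = np.random.default_rng(1)
quiet = rng.normal(scale=100.0, size=2048)  # stand-in for a soft recording
loud = 8.0 * quiet                          # same content, 8x the amplitude

print(np.allclose(normalize_frame(quiet), normalize_frame(loud)))
```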


2013-08-31 21:55:47 ederwander

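I mean, it's good practice to apply a window to the values, but I don't think there's anything fundamentally wrong without one. Still, I did add your suggestion. Follow-up: should I be getting negative amplitudes? For example, taking all the amplitudes from the 2D spectrogram array ('fft2D' in my code above), I get: http://i.imgur.com/gLyi3OW.png – lollercoaster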

To clarify, that is a histogram of the amplitude values – lollercoaster

The linear magnitude can be obtained by taking the absolute value of the first half of the FFT - see the update! – ederwander
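On the follow-up about negative amplitudes: the values in fft2D are 10*log10 of a magnitude, and a logarithm is negative whenever its argument is below 1. Negative dB values are therefore expected and do not mean a negative amplitude. (For amplitudes the usual dB convention is 20*log10; 10*log10 applies to power.) The sketch below uses made-up magnitudes to show both absolute dB and dB relative to the loudest bin, the latter being one way to put different sources on a comparable scale.

```python
import numpy as np

mags = np.array([0.01, 0.5, 1.0, 3.0, 40.0])  # made-up linear magnitudes

db_abs = 20 * np.log10(mags)               # negative wherever mags < 1
db_rel = 20 * np.log10(mags / mags.max())  # 0 dB at the peak, <= 0 elsewhere

print(db_abs)
print(db_rel)
```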
