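Computing spectrograms of wav files and recorded sound (normalizing for volume)

I want to compare recorded audio and audio read from disk in a consistent way, but I'm running into problems normalizing for volume (otherwise the spectrogram amplitudes differ).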
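I've also never worked with signals, FFTs, or the WAV format before, so this is new, uncharted territory for me. I retrieve the channels as lists of signed 16-bit integers sampled at 44100 Hz from

- .wav files on disk
- recorded music playing from my laptop

and then I step through each one with a window (2^k samples) and a certain amount of overlap. For each window, I do this: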
# calculate window variables
window_step_size = int(self.window_size * (1.0 - self.window_overlap_ratio)) + 1
last_frame = nframes - window_step_size # nframes is total number of frames from audio source
num_windows, i = 0, 0 # calculate number of windows
while i <= last_frame:
    num_windows += 1
    i += window_step_size
# allocate memory and initialize counter
wi = 0 # index
nfft = 2 ** self.nextpowof2(self.window_size) # size of FFT in 2^k
fft2D = np.zeros((nfft // 2 + 1, num_windows), dtype='c16') # 2d array for storing results (integer division so the shape is an int)
# for each window
count = 0
times = np.zeros((1, num_windows)) # num_windows was calculated
while wi <= last_frame:
    # channel_samples is simply list of signed ints
    window_samples = channel_samples[wi : (wi + self.window_size)]
    window_samples = np.hamming(len(window_samples)) * window_samples
    # calculate and reformat [[[[ THIS IS WHERE I'M UNSURE ]]]]
    fft = 2 * np.fft.rfft(window_samples, n=nfft)/nfft
    fft[0] = 0 # apparently these are completely real and should not be used
    fft[nfft // 2] = 0
    fft = np.sqrt(np.square(fft)/np.mean(fft)) # use RMS of data
    fft2D[:, count] = 10 * np.log10(np.absolute(fft))
    # sec/frame * frames = secs
    # get midpt
    times[0, count] = self.dt * wi
    wi += window_step_size
    count += 1
# remove NaNs, infs
whereAreNaNs = np.isnan(fft2D)
fft2D[whereAreNaNs] = 0
whereAreInfs = np.isinf(fft2D)
fft2D[whereAreInfs] = 0
# find the spectrogram peaks
fft2D = fft2D.astype(np.float32)
# the get_2D_peaks() method discretizes the fft2D periodogram array and then
# finds peaks and filters out those peaks below the threshold supplied
#
# the `amp_xxxx` variables are used for discretizing amplitude and the
# times array above is used to discretize the time into buckets
local_maxima = self.get_2D_peaks(fft2D, self.amp_threshold, self.amp_max, self.amp_min, self.amp_step_size, times, self.dt)
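Notably, the crazy stuff (to me at least) happens at the line with my comment [[[[ THIS IS WHERE I'M UNSURE ]]]].

Can anyone point me in the right direction, or help me generate this audio spectrogram while normalizing for volume correctly?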
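For readers following along, here is a minimal sketch of one possible way to make the spectrogram independent of playback volume. It is not the code from the question: it assumes each channel is divided by its overall RMS before windowing, and the function name `normalized_spectrogram_db` and its parameters are hypothetical. It also uses 20 * log10 on the magnitude, which is the same quantity as 10 * log10 of the power.

```python
import numpy as np

def normalized_spectrogram_db(samples, window_size=1024, overlap_ratio=0.5, floor_db=-120.0):
    """Magnitude spectrogram in dB of an RMS-normalized signal (illustrative sketch)."""
    x = np.asarray(samples, dtype=np.float64)

    # Assumption: dividing by the overall RMS removes global volume differences.
    rms = np.sqrt(np.mean(x ** 2))
    if rms > 0:
        x = x / rms

    step = max(1, int(window_size * (1.0 - overlap_ratio)))
    window = np.hamming(window_size)
    starts = range(0, len(x) - window_size + 1, step)

    spec = np.empty((window_size // 2 + 1, len(starts)))
    min_magnitude = 10.0 ** (floor_db / 20.0)  # avoids log10(0) -> -inf
    for col, start in enumerate(starts):
        frame = x[start:start + window_size] * window
        # Take the magnitude first, then scale by the window sum so a sine of
        # amplitude A peaks near A regardless of the window shape.
        magnitude = np.abs(np.fft.rfft(frame)) * 2.0 / window.sum()
        spec[:, col] = 20.0 * np.log10(np.maximum(magnitude, min_magnitude))
    return spec
```

With this sketch, the same waveform at two different volumes should produce (numerically almost) the same array, because the RMS division cancels the gain before the FFT is taken. Clamping the magnitude to a floor before the logarithm also makes the NaN/inf cleanup step unnecessary.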
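I mean, it's good practice to scale the values in the window, but I don't think anything is fundamentally wrong without it. I did add your suggestion, though. Follow-up: should I be getting negative amplitudes? For example, taking all the amplitudes from the 2D spectrogram array (`fft2D` in my code above), I get: http://i.imgur.com/gLyi3OW.png – lollercoaster
To clarify, this is a histogram of the amplitude values – lollercoaster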
The linear amplitude can be obtained by taking the absolute value of the first half of the FFT - see the update! – ederwander
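As a rough illustration of the last two points (linear amplitude from the absolute value, and why dB values can be negative), here is a small sketch. The test tone, its amplitude of 0.25, and the choice of bin 100 are assumptions made for the example, not values from the thread.

```python
import numpy as np

sample_rate = 44100
n = 4096
amplitude = 0.25                    # hypothetical linear amplitude, below 1.0
freq = 100 * sample_rate / n        # lands exactly on bin 100
t = np.arange(n) / sample_rate
signal = amplitude * np.sin(2 * np.pi * freq * t)

window = np.hamming(n)
spectrum = np.fft.rfft(signal * window)   # rfft already returns only the first half

# Linear amplitude: absolute value, scaled by the window sum.
linear = np.abs(spectrum) * 2.0 / window.sum()

peak_bin = int(np.argmax(linear))
peak_db = 20.0 * np.log10(linear[peak_bin])
print(peak_bin, linear[peak_bin], peak_db)
```

The linear amplitudes are never negative, since they are absolute values. The dB figure is negative whenever the linear amplitude is below 1.0, so a histogram of dB values that extends below zero is not by itself a sign of an error.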