Audio data string format to numpy array

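I want to convert the sample rate (from 44100 to 22050) of audio held in a numpy.array of 88200 samples, on which I have already done some processing (such as adding silence and converting to mono). I tried converting this array with audioop.ratecv and it works, but it returns a str instead of a numpy array. When I wrote that data with scipy.io.wavfile.write, half of the data was lost and the audio played twice as fast (rather than slower, which would at least make some sense).

audioop.ratecv works fine with str data such as what wave.open returns, but I don't know how to handle that format. So I tried converting the numpy array to a str with numpy.array2string(data), passing it to ratecv to get a correct result, and then converting back to numpy with numpy.fromstring(data, dtype). Now the length of the data is 8 samples. I think this is due to the complexity of the format, but I don't know how to control it. I also don't know what format the str returned by wave.open has, so I can't force my data into that format.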

Here is my code:

def conv_sr(data, srold, fixSR, dType, chan = 1): 
    state = None 
    width = 2 # numpy.int16 
    print "data shape", data.shape, type(data[0]) # returns shape 88200, type int16 
    fragments = numpy.array2string(data) 
    print "new fragments len", len(fragments), "type", type(fragments) # return len 30 type str 
    fragments_new, state = audioop.ratecv(fragments, width, chan, srold, fixSR, state) 
    print "fragments", len(fragments_new), type(fragments_new[0]) # returns 16, type str 
    data_to_return = numpy.fromstring(fragments_new, dtype=dType) 
    return data_to_return

And this is how I call it:

data1 = numpy.array(data1, dtype=dType) 
data_to_copy = numpy.append(data1, data2) 
data_to_copy = data_to_copy.sum(axis = 1)/chan 
data_to_copy = data_to_copy.flatten() # because its mono 

data_to_copy = conv_sr(data_to_copy, sr, fixSR, dType) #sr = 44100, fixSR = 22050 

scipy.io.wavfile.write(filename, fixSR, data_to_copy)

Source

2017-08-26 Mike Kampitakis

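After more research I found where I went wrong. It seems that 16-bit audio is made up of two 8-bit 'cells', so the dtype I was using was wrong, and that is why I had the audio speed problem. I found the correct dtype here. So, in the conv_sr def, I pass in a numpy array, convert it to a data string, pass that to the sample rate conversion, convert the result back to a numpy array for scipy.io.wavfile.write and, finally, convert the two 8-bit values to the 16-bit format.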
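The point about 16-bit samples being stored as two 8-bit values can be illustrated without numpy. The following is only a minimal sketch using the standard-library array module; the sample values are made up for illustration and are not taken from the question's audio.

```python
import array

# Four made-up 16-bit samples (typecode 'h' is a signed short).
samples = array.array('h', [0, 1000, -1000, 32767])
raw = samples.tobytes()  # the raw byte string a function like ratecv works on

# Interpret the same bytes as 16-bit samples again.
as_16bit = array.array('h')
as_16bit.frombytes(raw)

# Interpret the same bytes as 8-bit samples instead.
as_8bit = array.array('b')
as_8bit.frombytes(raw)

print(len(raw), len(as_16bit), len(as_8bit))
```

The same bytes yield a different number of samples depending on the width they are read with, so the width passed to the converter and the dtype used to read the result back have to agree.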

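import audioop
import numpy

def widthFinder(dType):
    # Sample width in bytes, parsed from the dtype name (e.g. "int16" -> 2).
    b = str(dType)
    try:
        bits = int(b[-2:])
    except ValueError:
        # Single-digit bit depths such as "int8".
        bits = int(b[-1:])
    width = bits // 8  # integer division so that width stays an int
    return width

def conv_sr(data, srold, fixSR, dType, chan = 1):
    state = None
    width = widthFinder(dType)
    # Only widths of 1, 2 or 4 bytes are kept; anything else falls back to 16-bit.
    if width != 1 and width != 2 and width != 4:
        width = 2
    # Raw sample bytes, not a printable representation of the array.
    fragments = data.tobytes()
    fragments_new, state = audioop.ratecv(fragments, width, chan, srold, fixSR, state)
    # Read each pair of 8-bit values back as one 16-bit sample.
    fragments_dtype = numpy.dtype((numpy.int16, {'x':(numpy.int8,0), 'y':(numpy.int8,1)}))
    # frombuffer replaces the deprecated binary mode of numpy.fromstring.
    data_to_return = numpy.frombuffer(fragments_new, dtype=fragments_dtype)
    data_to_return = data_to_return.astype(dType)
    return data_to_return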
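As a rough illustration of why the byte round trip matters, here is a small sketch that compares numpy.array2string with tobytes and reads the bytes back with numpy.frombuffer. The helper names sample_width and to_bytes_and_back are hypothetical and not part of the code above, and the sketch deliberately leaves out the audioop.ratecv call (audioop was removed from the standard library in Python 3.13), so it only shows the conversion steps around it.

```python
import numpy as np

def sample_width(dtype):
    """Bytes per sample for any dtype-like value (hypothetical helper)."""
    return np.dtype(dtype).itemsize

def to_bytes_and_back(data, dtype=np.int16):
    """Convert samples to raw bytes and back (hypothetical helper)."""
    raw = np.asarray(data, dtype=dtype).tobytes()
    # frombuffer returns a read-only view, so copy to get a writable array.
    restored = np.frombuffer(raw, dtype=dtype).copy()
    return raw, restored

# Made-up samples, for illustration only.
samples = np.array([0, 1000, -1000, 32767, -32768], dtype=np.int16)

raw, restored = to_bytes_and_back(samples)
text = np.array2string(samples)  # printable text, not sample data

print(sample_width(np.int16), len(raw), restored.dtype)
print(repr(text))
```

array2string produces text meant for printing (and abbreviates long arrays), which would be consistent with the short string length reported in the question, whereas tobytes keeps every sample at its full width. Using np.dtype(dType).itemsize is one possible alternative to parsing the dtype name, since it also accepts values such as numpy.int16 or 'int32'.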

If you find any mistakes, please feel free to correct me; I am still a learner.

Source

2017-08-29 23:41:39
