Importing sound files into Python as NumPy arrays (alternatives to audiolab)

I've been using Audiolab to import sound files, and it works quite well. However:

It doesn't support some formats, such as MP3, because libsndfile refuses to support them
It doesn't work in Python 2.6 under Windows, and the author isn't around to fix it

In [2]: from scikits import audiolab 
-------------------------------------------------------------------- 

ImportError        Traceback (most recent call last) 

C:\Python26\Scripts\<ipython console> in <module>() 

C:\Python26\lib\site-packages\scikits\audiolab\__init__.py in <module>() 
    23 __version__ = _version 
    24 
---> 25 from pysndfile import formatinfo, sndfile 
    26 from pysndfile import supported_format, supported_endianness, \ 
    27      supported_encoding, PyaudioException, \ 

C:\Python26\lib\site-packages\scikits\audiolab\pysndfile\__init__.py in <module>() 
----> 1 from _sndfile import Sndfile, Format, available_file_formats, available_encodings 
     2 from compat import formatinfo, sndfile, PyaudioException, PyaudioIOError 
     3 from compat import supported_format, supported_endianness, supported_encoding 

ImportError: DLL load failed: The specified module could not be found.

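So I'd like to either:

Figure out why it isn't working in 2.6 (something wrong with _sndfile.pyd?), and maybe find a way to extend it to work with the unsupported formats
Find a complete replacement for audiolab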


2010-03-01 endolith

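This problem is specific to Python 2.6 on Windows (i.e. you won't see it on Python 2.5). I haven't found a way to fix it yet. – 2010-07-22 08:55:07

I finally took the time between two flights, and it turned out to be a naming error. I have posted a new 0.11.0 version, which should fix this issue. – 2010-07-23 13:00:10

David, you've made a great tool in audiolab! I use it often. Thanks. – 2010-07-25 02:27:58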

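I've been using PySoundFile instead of Audiolab lately. It installs easily with conda.

It does not support MP3, like most things. MP3 is no longer patented, so there's no reason it can't be supported; someone just has to write support into libsndfile.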


2018-02-26 14:53:32 endolith

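Audiolab is working for me on Ubuntu 9.04 with Python 2.6.2, so it might be a Windows problem. In your link to the forum, the author also suggests that it is a Windows error.

In the past, this option has worked for me, too: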

from scipy.io import wavfile 
fs, data = wavfile.read(filename)

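Just beware that data may have an integer data type, in which case it is not scaled to [-1, 1). For example, if data is int16, you must divide data by 2**15 to scale it to [-1, 1).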
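As a minimal sketch of the scaling described above (the helper name `to_float` is hypothetical and not part of SciPy), integer samples can be divided by the magnitude of the dtype's most negative value. The unsigned branch assumes the usual 8-bit WAV convention of a midpoint at half the range.

```python
import numpy as np

def to_float(data):
    """Scale integer PCM samples to floats in [-1, 1); floats pass through."""
    data = np.asarray(data)
    if np.issubdtype(data.dtype, np.signedinteger):
        # int16 is divided by 2**15, int32 by 2**31, and so on
        return data.astype(np.float64) / -float(np.iinfo(data.dtype).min)
    if np.issubdtype(data.dtype, np.unsignedinteger):
        # assumption: unsigned data (e.g. 8-bit WAV) is centred on half its range
        half = (np.iinfo(data.dtype).max + 1) / 2
        return (data.astype(np.float64) - half) / half
    return data.astype(np.float64)
```

Typical use would be `data = to_float(data)` right after `wavfile.read`.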


2010-03-01 22:46:13

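Can scipy.io read 24-bit WAV, though? – endolith 2010-03-01 22:55:21

I'm not sure about that. 16-bit or 32-bit should be fine, but I don't know about 24-bit. – 2010-03-01 23:07:50

It doesn't read much of anything. Even 16-bit files come out inverted, with wraparound errors for the value -1. 24-bit gets "TypeError: data type not understood". Surely there's something better... – endolith 2010-03-09 05:27:18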
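For the 24-bit case raised in these comments, one possible workaround is to read the frames with the standard-library `wave` module and sign-extend the 3-byte samples with NumPy. This is only a sketch: it assumes an uncompressed little-endian PCM file with a plain WAV header, and the function names `decode_pcm24` and `read_wav24` are illustrative, not from any library mentioned in this thread.

```python
import wave
import numpy as np

def decode_pcm24(raw, n_channels):
    """Convert packed little-endian 24-bit PCM bytes to an int32 array."""
    triples = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    # Put each sample in the upper three bytes of a 4-byte little-endian int,
    # then shift right by 8 so the sign bit is extended correctly.
    padded = np.zeros((triples.shape[0], 4), dtype=np.uint8)
    padded[:, 1:] = triples
    samples = (padded.view("<i4").reshape(-1) >> 8).astype(np.int32)
    return samples.reshape(-1, n_channels)  # frames x channels

def read_wav24(path):
    """Return (sample_rate, data) for a 24-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 3:
            raise ValueError("expected 3 bytes per sample")
        rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    return rate, decode_pcm24(raw, n_channels)
```

The result is int32 with values in the 24-bit range, so dividing by 2**23 scales it to [-1, 1).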

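SoX (http://sox.sourceforge.net/) can be your friend here. It can read many different formats and output them as raw data in whatever data type you prefer. In fact, I just wrote the code to read a block of data from an audio file into a numpy array.

I decided to go this route for portability (sox is very widely available) and to maximize the flexibility of input audio types I could use. From initial testing, it does not seem noticeably slower for what I'm using it for, which is reading short (a few seconds) sections of audio from very long (hours) files.

Variables you need:

SOX_EXEC # the sox/sox.exe executable filename 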
filename # the audio filename of course 
num_channels # duh... the number of channels 
out_byps # Bytes per sample you want, must be 1, 2, 4, or 8 

start_samp # sample number to start reading at 
len_samp # number of samples to read

The actual code is really simple. If you want to extract the whole file, you can remove start_samp, len_samp and the 'trim' part.

import subprocess # need the subprocess module 
import numpy as NP # I'm lazy and call numpy NP 

cmd = [SOX_EXEC, 
     filename,    # input filename 
     '-t','raw',   # output file type raw 
     '-e','signed-integer', # output encode as signed ints 
     '-L',     # output little endian 
     '-b',str(out_byps*8), # output bits per sample 
     '-',     # output to stdout 
     'trim',str(start_samp)+'s',str(len_samp)+'s'] # only extract requested part 

# frombuffer is the non-deprecated way to interpret raw bytes
data = NP.frombuffer(subprocess.check_output(cmd),'<i%d'%(out_byps)) 
data = data.reshape(len(data)//num_channels, num_channels) # make samples x channels
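The same steps can be wrapped in functions so that the 'trim' part becomes optional, as suggested above for reading a whole file. This is a sketch only: the function names are made up for illustration, and it assumes a `sox` executable is available at the path you pass in.

```python
import subprocess
import numpy as np

def build_sox_command(sox_exec, filename, bytes_per_sample,
                      start_samp=None, len_samp=None):
    """Build a sox command that writes raw little-endian signed ints to stdout."""
    cmd = [sox_exec, filename,
           "-t", "raw",
           "-e", "signed-integer",
           "-L",
           "-b", str(bytes_per_sample * 8),
           "-"]
    if start_samp is not None:
        # the 's' suffix tells sox the position is a sample count
        cmd += ["trim", "%ds" % start_samp]
        if len_samp is not None:
            cmd.append("%ds" % len_samp)
    return cmd

def raw_to_array(raw, bytes_per_sample, num_channels):
    """Interpret raw interleaved bytes as a (samples x channels) array."""
    data = np.frombuffer(raw, dtype="<i%d" % bytes_per_sample)
    return data.reshape(-1, num_channels)

def read_with_sox(sox_exec, filename, num_channels, bytes_per_sample=2,
                  start_samp=None, len_samp=None):
    """Run sox and return the decoded audio; requires sox to be installed."""
    cmd = build_sox_command(sox_exec, filename, bytes_per_sample,
                            start_samp, len_samp)
    return raw_to_array(subprocess.check_output(cmd),
                        bytes_per_sample, num_channels)
```

Calling `read_with_sox("sox", "long.wav", 2)` would read the entire file, while passing `start_samp` and `len_samp` extracts only the requested part.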

PS: Here is code to read things from the audio file header using sox, too...

import subprocess
import sys

# Parse the text printed by `sox --i` into individual header fields.
info = subprocess.check_output([SOX_EXEC, '--i', filename],
                               universal_newlines=True)
comments = ''
other = ''
output_unhandled = False  # set to True to echo unrecognised lines to stderr
reading_comments_flag = False
for l in info.splitlines():
    if not l.strip():
        continue
    if reading_comments_flag:
        # everything after the "Comments" line belongs to the comment block
        if comments:
            comments += '\n'
        comments += l
    elif l.startswith('Input File'):
        input_file = l.split(':', 1)[1].strip()[1:-1]  # drop surrounding quotes
    elif l.startswith('Channels'):
        num_channels = int(l.split(':', 1)[1].strip())
    elif l.startswith('Sample Rate'):
        sample_rate = int(l.split(':', 1)[1].strip())
    elif l.startswith('Precision'):
        bits_per_sample = int(l.split(':', 1)[1].strip()[0:-4])  # strip "-bit"
    elif l.startswith('Duration'):
        tmp = l.split(':', 1)[1].strip()
        tmp = tmp.split('=', 1)
        duration_time = tmp[0].strip()
        duration_samples = int(tmp[1].split(None, 1)[0])
    elif l.startswith('Sample Encoding'):
        encoding = l.split(':', 1)[1].strip()
    elif l.startswith('Comments'):
        comments = ''
        reading_comments_flag = True
    else:
        other = other + '\n' + l if other else l
        if output_unhandled:
            print("Unhandled:", l, file=sys.stderr)


2012-03-21 04:40:56 travc

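Interesting, though kind of kludgy and maybe not cross-platform? There's pysox (http://pypi.python.org/pypi/pysox) for interfacing directly with the libSoX library (http://sox.sourceforge.net/libsox.html). It looks like SoX supports a bunch of formats on its own (http://sox.sourceforge.net/Docs/Features) and can use several other libraries for more. I've had a lot of problems getting audiolab to work, and it doesn't support MP3 etc., so pysox might be worth a try. – endolith 2012-03-21 15:41:32

I'll look at pysox... thanks. Though the subprocess approach using sox isn't really pythonic or pretty, it is very powerful and relatively portable (since sox binaries/installers can be found for most systems). – travc 2012-04-21 08:56:27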

FFmpeg supports MP3 and works on Windows (http://zulko.github.io/blog/2013/10/04/read-and-write-audio-files-in-python-using-ffmpeg/).

Reading an MP3 file:

import subprocess as sp 

FFMPEG_BIN = "ffmpeg.exe" 

command = [ FFMPEG_BIN, 
     '-i', 'mySong.mp3', 
     '-f', 's16le', 
     '-acodec', 'pcm_s16le', 
     '-ar', '44100', # output will have 44100 Hz 
     '-ac', '2', # stereo (set to '1' for mono) 
     '-'] 
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

Convert the data into a numpy array:

import numpy 

# read 88200*4 bytes: 2 seconds of 16-bit stereo audio at 44100 Hz
raw_audio = pipe.stdout.read(88200*4) 

audio_array = numpy.frombuffer(raw_audio, dtype="int16") 
audio_array = audio_array.reshape((len(audio_array)//2,2))
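The read size of 88200*4 bytes corresponds to two seconds of audio in the format requested from FFmpeg: 44100 frames per second, 2 channels, 2 bytes per sample. A small sketch makes that arithmetic explicit and guards against a truncated final frame; the helper names are hypothetical and the defaults simply mirror the command shown earlier.

```python
import numpy as np

def chunk_size_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Number of bytes of raw PCM covering the given duration."""
    return int(seconds * sample_rate) * channels * bytes_per_sample

def pcm16_to_array(raw, channels=2):
    """Interpret raw s16le bytes as a (frames x channels) int16 array."""
    frame_bytes = 2 * channels
    # drop any incomplete frame at the end of the buffer
    usable = len(raw) - len(raw) % frame_bytes
    samples = np.frombuffer(raw[:usable], dtype="<i2")
    return samples.reshape(-1, channels)
```

With these, `pipe.stdout.read(chunk_size_bytes(2))` followed by `pcm16_to_array(raw_audio)` would reproduce the two steps above.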


2016-06-01 15:38:24

If you want to do this for MP3, here's what I'm using. It uses pydub and scipy.

Full setup (on a Mac; it may differ on other systems):

import tempfile 
import os 
import pydub 
import scipy 
import scipy.io.wavfile 


def read_mp3(file_path, as_float = False): 
    """ 
    Read an MP3 File into numpy data. 
    :param file_path: String path to a file 
    :param as_float: Cast data to float and normalize to [-1, 1] 
    :return: Tuple(rate, data), where 
     rate is an integer indicating samples/s 
     data is an ndarray(n_samples, 2)[int16] if as_float = False 
      otherwise ndarray(n_samples, 2)[float] in range [-1, 1] 
    """ 

    path, ext = os.path.splitext(file_path) 
    assert ext=='.mp3' 
    mp3 = pydub.AudioSegment.from_mp3(file_path)  # decode the MP3
    fd, path = tempfile.mkstemp()  # temporary WAV file for scipy to read
    os.close(fd)  # mkstemp returns an open descriptor; close it to avoid a leak
    mp3.export(path, format="wav") 
    rate, data = scipy.io.wavfile.read(path) 
    os.remove(path) 
    if as_float: 
        data = data/(2**15) 
    return rate, data

Credit to James Thompson's blog


2018-02-26 06:37:39 Peter
