2012-04-12 71 views
5

我有正確的CSV文件中的數據加載到數據集PyBrain一些工作代碼:PyBrain:使用numpy.loadtxt加載數據?

def old_get_dataset(): 

    reader = csv.reader(open('test.csv', 'rb')) 

    header = reader.next() 
    fields = dict(zip(header, range(len(header)))) 
    print header 

    # assume last field in csv is single target variable 
    # and all other fields are input variables 
    dataset = SupervisedDataSet(len(fields) - 1, 1) 

    for row in reader: 
     #print row[:-1] 
     #print row[-1] 
     dataset.addSample(row[:-1], row[-1]) 

    return dataset 

現在我試圖重寫這段代碼使用numpy的的loadtxt函數。我相信addSample可以使用numpy數組,而不必一次添加一行數據。

假設我的加載numpy數組是m×n維,如何將第一個m x(n-1)數據集作爲第一個參數,並將最後一列數據作爲第二個參數?這就是我想:

def get_dataset(): 

    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1) 

    # assume last field in csv is single target variable 
    # and all other fields are input variables 
    number_of_columns = array.shape[1] 
    dataset = SupervisedDataSet(number_of_columns - 1, 1) 

    #print array[0] 
    #print array[:,:-1] 
    #print array[:,-1] 
    dataset.addSample(array[:,:-1], array[:,-1]) 

    return dataset 

但我發現了以下錯誤:

Traceback (most recent call last): 
    File "C:\test.py", line 109, in <module> 
    (d, n, t) = main() 
    File "C:\test.py", line 87, in main 
    ds = get_dataset() 
    File "C:\test.py", line 45, in get_dataset 
    dataset.addSample(array[:,:-1], array[:,-1]) 
    File "C:\Python27\lib\site-packages\pybrain\datasets\supervised.py", 
     line 45, in addSample self.appendLinked(inp, target) 
    File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py", 
     line 215, in appendLinked self._appendUnlinked(l, args[i]) 
    File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py", 
     line 197, in _appendUnlinked self.data[label][self.endmarker[label], :] = row 
ValueError: output operand requires a reduction, but reduction is not enabled 

我該如何解決這個問題?

+0

我認爲這個問題可能與addSample()預計爲這兩個參數的二維數組,但是我傳遞一個1二維數組。我對如何製作二維目標數組有點困惑,因爲每個訓練示例只有一個目標變量。 – User 2012-04-13 00:28:19

回答

8

經過大量試驗,並重新讀取dataset documentation,以下運行沒有錯誤的:

def get_dataset(): 

    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1) 

    # assume last field in csv is single target variable 
    # and all other fields are input variables 
    number_of_columns = array.shape[1] 
    dataset = SupervisedDataSet(number_of_columns - 1, 1) 

    print array[0] 
    #print array[:,:-1] 
    #print array[:,-1] 
    #dataset.addSample(array[:,:-1], array[:,-1]) 
    #dataset.addSample(array[:,:-1], array[:,-2:-1]) 
    dataset.setField('input', array[:,:-1]) 
    dataset.setField('target', array[:,-1:]) 

    return dataset 

我必須仔細檢查它做正確的事情。

0

我寫了一個小功能來做到這一點

def load_csv(filename, cols, sep = ',', skip = 0): 
    from numpy import loadtxt 
    data = loadtxt(filename, delimiter = sep, usecols = cols, skiprows = skip) 
    return data