scikit-learn error when fitting data

I want to implement a learning algorithm that predicts whether an image has a target value of 1 or 0. First, I set up my target values like this...

import numpy

real = [1] * len(images)        # label 1 for every real image
fake = [0] * len(fake_images)   # label 0 for every fake image

total_target = real + fake
total_target = numpy.array(total_target)

>>> [1 1 1 ... 0 0 0 0]

Next, I convert the list of images to numpy arrays, so each image is stored as a numpy array...

training_set = [] 
for image in total_images: 
    im = image.convert("L") 
    dataset = numpy.asarray(im) 
    training_set.append(dataset) 
training_set = numpy.array(training_set)

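So training_set holds the images. The order of training_set matches the order of total_target, so the first image in training_set corresponds to the first value in total_target, which in the example above is 1.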

Next, I flatten the training set...

n_samples = len(training_set) 
data = training_set.reshape((n_samples, -1))
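
As a quick illustration of what this flattening step does, here is a minimal sketch. The zero-filled array is a made-up stand-in for the real images (4 hypothetical grayscale images of 3x5 pixels), so only the shapes are meaningful:

```python
import numpy as np

# Stand-in for training_set: 4 hypothetical grayscale images of 3x5 pixels.
training_set = np.zeros((4, 3, 5), dtype=np.uint8)

n_samples = len(training_set)
# -1 lets numpy infer the second dimension (3 * 5 = 15 pixels per image).
data = training_set.reshape((n_samples, -1))

print(training_set.shape)  # (4, 3, 5)
print(data.shape)          # (4, 15)
```

Each row of data is one image and each column is one pixel, which is the (n_samples, n_features) layout scikit-learn expects for its input.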

Now I pass it to the following...

from sklearn import svm

classifier = svm.SVC(gamma=0.001)
# Train on every sample except the last one
classifier.fit(data[:n_samples-1], total_target[:n_samples-1])

I leave out the last image and its corresponding value, because that is the value I want to predict...

expected = total_target[-1] 
predicted = classifier.predict(data[-1])

When I run all of this, I get the following error...

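DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)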

OK, so it looks like my total_target is in the wrong format, so I added the following...

total_target = numpy.array(total_target).reshape(-1, 1)

I run it, and now I get the following errors:

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y_ = column_or_1d(y, warn=True)

C:\Users\Eric\Anaconda2\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)

I tried using ravel() on total_target, but that just took me back to the previous error. I think my format is wrong; I am very new to numpy arrays.

2017-02-01 Bolboa

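"OK, so it looks like my total_target is in the wrong format" is wrong: scikit-learn is complaining that data[-1] is a flat vector rather than a 2D array. total_target should be a flat vector, so there is no need to change it. – cel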
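
The shape distinction in this comment can be checked with numpy alone. The following is a minimal sketch with made-up values (4 samples, 3 features); the variable names row and col are introduced here only for illustration:

```python
import numpy as np

data = np.arange(12).reshape(4, 3)      # hypothetical: 4 samples, 3 features
total_target = np.array([1, 1, 0, 0])   # one label per sample

# An integer index drops the sample axis, leaving a flat 1-D vector.
row = data[-1]
print(row.shape)                 # (3,)

# Two ways to keep (or restore) the sample axis for a single sample.
print(row.reshape(1, -1).shape)  # (1, 3)
print(data[-1:].shape)           # (1, 3)

# The targets should stay 1-D; reshape(-1, 1) turns them into a column vector.
print(total_target.shape)        # (4,)
col = total_target.reshape(-1, 1)
print(col.shape)                 # (4, 1)
print(col.ravel().shape)         # (4,)
```

This is why reshaping total_target only added a second warning, and why ravel() merely undid that reshape: the first warning was about data[-1], not about the targets.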

NumPy's atleast_2d gets the code working.

First, let's generate some mock data: 5 real and 5 fake 8-bit images of 800 rows by 1200 columns:

In [111]: import numpy as np 

In [112]: real, fake = 5, 5 

In [113]: rows, cols = 800, 1200 

In [114]: bits = 8 

In [115]: target = np.hstack([np.ones(real), np.zeros(fake)]) 

In [116]: np.random.seed(2017) 

In [117]: images = np.random.randint(2**bits, size=(real + fake, rows, cols)) 

In [118]: data = images.reshape(images.shape[0], -1) 

In [119]: data 
Out[119]: 
array([[ 59, 9, 198, ..., 189, 201, 38], 
     [150, 251, 145, ..., 95, 214, 175], 
     [156, 212, 220, ..., 179, 63, 48], 
     ..., 
     [ 25, 94, 108, ..., 159, 144, 216], 
     [179, 103, 217, ..., 92, 219, 34], 
     [198, 209, 177, ..., 6, 4, 144]]) 

In [120]: data.shape 
Out[120]: (10L, 960000L)

Then we train the classifier using all but the last image:

In [121]: from sklearn import svm 

In [122]: classifier = svm.SVC(gamma=0.001) 

In [123]: classifier.fit(data[:-1], target[:-1]) 
Out[123]: 
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, 
    decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf', 
    max_iter=-1, probability=False, random_state=None, shrinking=True, 
    tol=0.001, verbose=False)

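If you now try to classify the last image with classifier.predict(data[-1]), sklearn complains. To make sklearn happy, you just need to make sure the test data is two-dimensional, like this: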

In [124]: classifier.predict(np.atleast_2d(data[-1])) 
Out[124]: array([ 1.])
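
For completeness, here is a self-contained sketch of the same flow as a script. It assumes much smaller made-up images (8x8 rather than 800x1200) so it runs quickly, and it compares np.atleast_2d with two other common ways of getting a 2-D single-sample array; the names rng, last, a, b and c are introduced here for illustration only. As the warning text announces, scikit-learn 0.19 and later raise a ValueError for 1-D input rather than warning.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(2017)

# Made-up data: 10 tiny 8x8 "images" with 8-bit pixel values.
images = rng.randint(2**8, size=(10, 8, 8))
data = images.reshape(images.shape[0], -1)      # shape (10, 64)
target = np.hstack([np.ones(5), np.zeros(5)])   # 5 real, 5 fake

# Train on all but the last image.
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:-1], target[:-1])

last = data[-1]                                 # 1-D, shape (64,)

# Three equivalent ways to hand predict() a 2-D array holding one sample.
a = classifier.predict(np.atleast_2d(last))
b = classifier.predict(last.reshape(1, -1))
c = classifier.predict(data[-1:])

print(a.shape, b.shape, c.shape)
```

All three calls receive an array of shape (1, 64) and return one prediction each. Slicing with data[-1:] avoids the problem at the source because a slice, unlike an integer index, keeps the sample axis.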

2017-02-01 11:28:19 Tonechas
