2016-02-13 61 views

ValueError: Found arrays with inconsistent numbers of samples [ 6 1786]

Here is my code:

from sklearn.svm import SVC 
from sklearn.grid_search import GridSearchCV 
from sklearn.cross_validation import KFold 
from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn import datasets 
import numpy as np 

newsgroups = datasets.fetch_20newsgroups(
       subset='all', 
       categories=['alt.atheism', 'sci.space'] 
     ) 
X = newsgroups.data 
y = newsgroups.target 

TD_IF = TfidfVectorizer() 
y_scaled = TD_IF.fit_transform(newsgroups, y) 
grid = {'C': np.power(10.0, np.arange(-5, 6))} 
cv = KFold(y_scaled.size, n_folds=5, shuffle=True, random_state=241) 
clf = SVC(kernel='linear', random_state=241) 

gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv) 
gs.fit(X, y_scaled) 

I get an error and I don't understand why. Traceback:

Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/week_3/assignment_2.py", line 23, in <module>
    gs.fit(X, y_scaled) #TODO: check this line
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 525, in _fit
    X, y = indexable(X, y)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 201, in indexable
    check_consistent_length(*result)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [ 6 1786]

Can someone explain why this error occurs?

Answer


I think you have mixed up your X and y here. You want to turn X into a tf-idf matrix and then train on that matrix against y. See below:

from sklearn.svm import SVC 
from sklearn.grid_search import GridSearchCV 
from sklearn.cross_validation import KFold 
from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn import datasets 
import numpy as np 

newsgroups = datasets.fetch_20newsgroups(
       subset='all', 
       categories=['alt.atheism', 'sci.space'] 
     ) 
X = newsgroups.data 
y = newsgroups.target 

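# Vectorize the documents (X), not the labels: the tf-idf matrix is the feature matrix.
TD_IF = TfidfVectorizer()
X_scaled = TD_IF.fit_transform(X, y)
grid = {'C': np.power(10.0, np.arange(-1, 1))}
# KFold takes the number of samples; y holds one label per document.
cv = KFold(y.size, n_folds=5, shuffle=True, random_state=241)
clf = SVC(kernel='linear', random_state=241)

gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv)
# Features first, labels second.
gs.fit(X_scaled, y)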
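As for where the 6 in the error message likely comes from: the question passes the whole newsgroups object to fit_transform instead of newsgroups.data. That object is dict-like, and iterating over a dict yields its keys, so the vectorizer sees a handful of key names rather than 1786 documents. Below is a minimal sketch of that effect on a toy stand-in. The toy object, its contents and its six key names are assumptions for illustration; it also assumes a scikit-learn version that exposes sklearn.utils.Bunch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils import Bunch

# Hypothetical stand-in for the fetched dataset: a dict-like object with
# six keys (key names are an assumption and may differ between versions).
toy = Bunch(
    data=["rockets reach orbit", "a debate about belief", "the shuttle docks"],
    target=[1, 0, 1],
    target_names=["alt.atheism", "sci.space"],
    filenames=["a.txt", "b.txt", "c.txt"],
    DESCR="toy corpus",
    description="toy corpus",
)

# Iterating a dict-like object yields its keys, so each key name is
# treated as one "document".
wrong = TfidfVectorizer().fit_transform(toy)

# Passing the list of texts gives one row per document, as intended.
right = TfidfVectorizer().fit_transform(toy.data)

print(wrong.shape[0])  # 6: one row per key
print(right.shape[0])  # 3: one row per document
```

With the real data the same mix-up would give 6 rows on one side and 1786 documents on the other, which matches the two numbers in the ValueError.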
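The code in this thread uses the sklearn.grid_search and sklearn.cross_validation modules that appear in the traceback. In later scikit-learn releases the same classes live in sklearn.model_selection, and KFold takes n_splits instead of the sample count. The sketch below shows the same approach with those imports. It runs on a small made-up corpus (an assumption, so that it works without downloading the newsgroups data), so the scores it produces say nothing about the real task.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Made-up texts standing in for newsgroups.data and newsgroups.target.
X = [
    "the rocket reached orbit", "nasa launched a new probe",
    "the shuttle docked at the station", "a telescope observed the moon",
    "astronauts trained for the mission", "there is no proof of any god",
    "atheism rejects religious belief", "a debate about faith and religion",
    "the bible and its critics", "belief in a deity is not required",
]
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Vectorize the documents; the labels stay untouched.
X_tfidf = TfidfVectorizer().fit_transform(X)

grid = {"C": np.power(10.0, np.arange(-1, 1))}
# Newer KFold infers the sample count from the data passed to fit().
cv = KFold(n_splits=5, shuffle=True, random_state=241)
clf = SVC(kernel="linear", random_state=241)

gs = GridSearchCV(estimator=clf, param_grid=grid, scoring="accuracy", cv=cv)
gs.fit(X_tfidf, y)  # features first, labels second

print(gs.best_params_)
```

With the real dataset, X and y would again be newsgroups.data and newsgroups.target, and the rest stays the same.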

Thanks, that helped a lot! Such a silly mistake =)