2014-03-31 65 views

SKLearn cross-validation error - TypeError

I am trying to cross-validate the results of my KNN classifier. I used the code below, and it raises a TypeError.

For context, I have already imported the scikit-learn, NumPy and pandas libraries.

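from sklearn.cross_validation import cross_val_score, ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

# X and y are the features and labels defined earlier (not shown)
n_samples = len(y)
knn = KNeighborsClassifier(3)  # 3 nearest neighbours
# 10 random splits, each holding out 30% of the samples for testing
cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)

test_scores = cross_val_score(knn, X, y, cv=cv)
test_scores.mean()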

This returns:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-139-d8cc3ee0c29b> in <module>() 
    7 cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0) 
    8 
    9 test_scores = cross_val_score(knn, X, y, cv=cv) 
10 test_scores.mean() 

//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in  cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch) 
1150   delayed(_cross_val_score)(clone(estimator), X, y, scorer, train, test, 
1151         verbose, fit_params) 
1152   for train, test in cv) 
1153  return np.array(scores) 
1154 

//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable) 
515   try: 
516    for function, args, kwargs in iterable: 
517     self.dispatch(function, args, kwargs) 
518 
519    self.retrieve() 
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs) 
310   """ 
311   if self._pool is None: 
312    job = ImmediateApply(func, args, kwargs) 
313    index = len(self._jobs) 
314    if not _verbosity_filter(index, self.verbose): 
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs) 
134   # Don't delay the application, to avoid keeping the input 
135   # arguments in memory 
136   self.results = func(*args, **kwargs) 
137 
138  def get(self): 

//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _cross_val_score(estimator, X, y, scorer, train, test, verbose, fit_params) 
1056   y_test = None 
1057  else: 
1058   y_train = y[train] 
1059   y_test = y[test] 
1060  estimator.fit(X_train, y_train, **fit_params) 

TypeError: only integer arrays with one element can be converted to an index 

Please specify whether your y variable is derived from a pandas.DataFrame – eickenberg

Answer

1

This is an error related to pandas. scikit-learn expects NumPy arrays, sparse matrices, or objects that behave like them.

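The main problem with pandas DataFrames is that plain [...] indexing selects columns rather than rows. Row indexing in pandas is done through DataFrame.loc[...] (by label) or DataFrame.iloc[...] (by position). sklearn does not expect this behavior. The error most likely comes from line 1058 of cross_validation.py (y_train = y[train]), where the code fails to extract the training samples.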
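The indexing difference described above can be seen in a small sketch. The toy DataFrame and its column names are hypothetical, and the exact exception depends on the pandas version: recent releases should raise a KeyError here, whereas the setup in the question surfaced a TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0], "label": [0, 1, 0, 1]})
train = np.array([0, 2])  # positional row indices, like those a CV splitter yields

# Plain [] on a DataFrame looks up column labels, so integer positions
# are not found among the columns "feature" and "label".
try:
    df[train.tolist()]
    bracket_lookup_failed = False
except KeyError:
    bracket_lookup_failed = True

rows_by_position = df.iloc[train]   # explicit positional row selection
rows_from_array = df.values[train]  # on a NumPy array, [] selects rows

print(bracket_lookup_failed)
print(rows_by_position)
print(rows_from_array)
```

This is why code written for NumPy arrays, such as y[train], can misbehave when handed a pandas object.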

To fix this, if your y is a DataFrame column, try converting the column to an array:

y = y.values 

Otherwise, pandas-sklearn may be an option.
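For reference, here is a minimal sketch of the .values conversion applied before cross-validation. It assumes a later scikit-learn release in which the cross-validation helpers live in sklearn.model_selection (the sklearn.cross_validation module used in the question was removed in later versions), and it uses synthetic data with made-up column names in place of the asker's X and y.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data standing in for the asker's X and y.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.rand(100, 4), columns=["f0", "f1", "f2", "f3"])
data["label"] = (data["f0"] > 0.5).astype(int)

# Convert the pandas objects to plain NumPy arrays before cross-validating.
X = data[["f0", "f1", "f2", "f3"]].values
y = data["label"].values

knn = KNeighborsClassifier(3)
# In sklearn.model_selection, ShuffleSplit takes n_splits rather than
# the sample count and n_iter used by the older module.
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

test_scores = cross_val_score(knn, X, y, cv=cv)
print(test_scores.mean())
```

With NumPy arrays, y[train] inside the cross-validation loop selects rows by position, which is what the splitter's integer indices are meant to do.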