2017-07-30

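"Inconsistent numbers of samples" in scikit-learn

I am learning some basics of machine learning in Python (scikit-learn), and when I tried to implement the k-nearest neighbors algorithm I got this error: ValueError: Found input variables with inconsistent numbers of samples: [426, 143]. I don't know how to deal with it.
Here is my code: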

from sklearn.datasets import load_breast_cancer 
from sklearn.model_selection import train_test_split 
from sklearn.neighbors import KNeighborsClassifier 
cancer = load_breast_cancer() 
X_train, y_train, X_test, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0)
clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, y_train)

Answer

train_test_split returns X_train, X_test, y_train, y_test, in that order.

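You unpacked the returned tuple in the wrong order, so you are fitting on the training data and the test data rather than on the training data and the training labels. With your assignment, the name y_train actually holds the test features, which is why the error reports 426 and 143 samples.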

It should be:

X_train, X_test, y_train, y_test = train_test_split() 
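
For completeness, here is a sketch of what the full corrected script might look like, reusing the arguments from the question. The shape printout and the final score call are my additions for checking the result, not part of the original code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()

# train_test_split returns the splits array by array:
# (X_train, X_test, y_train, y_test), not (X_train, y_train, X_test, y_test).
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0)

# Features and labels of each split must have the same number of rows.
print(X_train.shape, y_train.shape)  # 426 rows each
print(X_test.shape, y_test.shape)    # 143 rows each

clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, y_train)

# Added for illustration: mean accuracy on the held-out test split.
test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)
```

Printing the shapes right after the split is a cheap way to catch this kind of mix-up before calling fit.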

It is that simple. I feel ashamed. Thank you :) – Hendrra