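I want to plot the effect of removing samples (rows). Some people call this a "learning curve". How can I pass a dataframe to scikit-learn for cross-validation?
So I thought I would use pandas to remove some rows. How to remove, randomly, rows from a dataframe but from each label?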
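However, when I try to run the cross-validation, I get the following error (even after using df.values
to convert the dataframe to an array):
So, what am I doing wrong?
Here is my code: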
import pandas as pd
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn import neighbors
from sklearn import cross_validation
df = pd.DataFrame(np.random.rand(12, 5))
label = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
df['label'] = label
df1 = pd.concat(g.sample(2) for idx, g in df.groupby('label'))
X = df1[[0, 1, 2, 3, 4]].values
y = df1.label.values
print(X)
print(y)
clf = neighbors.KNeighborsClassifier()
sss = StratifiedShuffleSplit(1, test_size=0.1)
scoresSSS = cross_validation.cross_val_score(clf, X, y, cv=sss)
print(scoresSSS)
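For comparison, here is a minimal sketch of a variant that should run on a recent scikit-learn. It rests on a few assumptions rather than a confirmed diagnosis: that the trouble comes from mixing a splitter from sklearn.model_selection with cross_val_score from the older sklearn.cross_validation module (which was later removed), that test_size=0.1 on 6 rows leaves fewer test samples than there are classes, and that the default n_neighbors=5 is larger than the resulting training set. The values 3 rows per label, test_size=3, n_neighbors=3, and the fixed seeds are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Same toy data as in the question, with a fixed seed for repeatability.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(12, 5))
df['label'] = np.repeat([1, 2, 3], 4)

# Keep 3 random rows per label (assumed value; the question keeps 2).
df1 = pd.concat([g.sample(3, random_state=0) for _, g in df.groupby('label')])

X = df1.drop(columns='label').to_numpy()
y = df1['label'].to_numpy()

# n_neighbors must not exceed the number of training rows (6 here).
clf = KNeighborsClassifier(n_neighbors=3)

# test_size=3 gives one test sample per class, the minimum for 3 classes.
sss = StratifiedShuffleSplit(n_splits=5, test_size=3, random_state=0)

# Splitter and cross_val_score both come from sklearn.model_selection.
scores = cross_val_score(clf, X, y, cv=sss)
print(scores)
```

With random features the scores themselves carry no meaning; the point is only that the splitter, the scorer, and the sample counts are compatible. To draw the learning curve, the same steps could be repeated for different numbers of rows kept per label.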