2017-02-17 166 views
Stratified GroupShuffleSplit in scikit-learn

I would like to ask whether it is possible to do a "Stratified GroupShuffleSplit" in scikit-learn, in other words a combination of GroupShuffleSplit and StratifiedShuffleSplit.

Here is a sample of the code I am using:

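from sklearn.model_selection import GridSearchCV, GroupShuffleSplit
from sklearn.svm import SVC

# Group-aware shuffle split: all samples of a group land on the same side.
cv = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(
    allr_sets_nor[:, :2], allr_labels, groups=allr_groups)
opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol),
    param_grid=param_grid, scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)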

Here I apply GroupShuffleSplit, but I would still like to add stratification according to allr_labels.

StratifiedShuffleSplit also has a groups parameter, if that is what you want. Just use StratifiedShuffleSplit with allr_labels and pass the groups to the fit() method of GridSearchCV.

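Unfortunately that does not work for me. I think this option has no effect, because the documentation says: "Always ignored, exists for compatibility."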
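The point made in this comment can be checked with a small experiment. The sketch below uses made-up toy data (the array shapes, group layout and random seeds are assumptions, not taken from the question) and compares the splits produced with and without the groups argument. Newer scikit-learn releases may emit a warning when groups is passed, so the call is wrapped to silence it.

```python
import warnings

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (assumed for illustration): 10 groups of 6 samples, two classes.
rng = np.random.RandomState(0)
X = rng.rand(60, 2)
y = np.repeat([0, 1], 30)
groups = np.repeat(np.arange(10), 6)

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.25, random_state=42)

# Splits computed without any group information.
plain = list(sss.split(X, y))

# Splits computed while passing groups; some versions warn that it is ignored.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    with_groups = list(sss.split(X, y, groups=groups))

# True if both runs produced identical train/test indices.
same = all(
    np.array_equal(a_tr, b_tr) and np.array_equal(a_te, b_te)
    for (a_tr, a_te), (b_tr, b_te) in zip(plain, with_groups)
)

# Number of groups that appear on both sides of each split.
leaked = [len(set(groups[tr]) & set(groups[te])) for tr, te in with_groups]
print(same, leaked)
```

If the two runs match, the groups argument played no part in the split, so a group can end up partly in the training set and partly in the test set.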

Answer

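I solved this problem by applying StratifiedShuffleSplit to the groups and then finding the indices of the training and test sets manually, since they are tied to the group indices (in my case each group contains 6 consecutive sets, from 6*index to 6*index+5).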

As follows:

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

# Stratified splitting over the groups only
# (all_groups and all_labels are assumed to hold one entry per group).
sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
    train_size=train_size, random_state=random_state).split(all_groups, all_labels)

# Map each selected group index g to its sample indices 6*g .. 6*g+5.
# A separate name (k) is used for the offset so it cannot clash with a loop counter.
cv = []
for train_index, test_index in sss:
    train_is = np.concatenate([train_index*6 + k for k in range(6)])
    test_is = np.concatenate([test_index*6 + k for k in range(6)])
    cv.append((train_is, test_is))
# cv is the final cross-validation iterable: a list of n_splits tuples,
# each holding two numpy arrays with the training and test indices.

opt = GridSearchCV(SVC(decision_function_shape=dfs, tol=tol), param_grid=param_grid,
    scoring=scoring, n_jobs=n_jobs, cv=cv, verbose=verbose)
opt.fit(allr_sets_nor[:, :2], allr_labels)
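The same idea can be written without relying on each group being exactly 6 consecutive samples. The following is only a sketch under stated assumptions: the helper name stratified_group_shuffle_split is hypothetical (it is not a scikit-learn function), every sample in a group is assumed to share a single label, and the toy data at the bottom is invented for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_group_shuffle_split(y, groups, n_splits=5, test_size=0.25,
                                   random_state=None):
    """Yield (train_idx, test_idx) sample indices.

    Hypothetical helper: stratifies at the group level and keeps every
    group entirely on one side. Assumes one label per group.
    """
    y = np.asarray(y)
    groups = np.asarray(groups)
    # Take the first sample of each group as that group's label.
    unique_groups, first_idx = np.unique(groups, return_index=True)
    group_labels = y[first_idx]

    sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size,
                                 random_state=random_state)
    # Features are irrelevant for the split itself, so pass a placeholder.
    placeholder = np.zeros((len(unique_groups), 1))
    for train_g, test_g in sss.split(placeholder, group_labels):
        # Expand the chosen groups back to sample-level indices.
        train_idx = np.flatnonzero(np.isin(groups, unique_groups[train_g]))
        test_idx = np.flatnonzero(np.isin(groups, unique_groups[test_g]))
        yield train_idx, test_idx


# Toy data (assumed): 12 groups of 6 samples, classes alternate per group.
groups = np.repeat(np.arange(12), 6)
y = np.repeat(np.array([0, 1] * 6), 6)

splits = list(stratified_group_shuffle_split(
    y, groups, n_splits=3, test_size=0.5, random_state=0))
```

The resulting list of index tuples can be passed as cv to GridSearchCV in the same way as the list built above. If a K-fold scheme is acceptable instead of a shuffle split, scikit-learn 1.0 and later also ship StratifiedGroupKFold in sklearn.model_selection, which may be worth trying first.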