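I have been experimenting with the random_state parameter of StratifiedKFold in sklearn, but it does not seem to be random. I expected random_state=5 to give me a different test set than random_state=4, but that does not appear to be the case. I have put together some rough reproducible code below. First I load my data: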
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
Then I set random_state=5 and store the test indices of the last fold:
skf=StratifiedKFold(n_splits=5,random_state=5)
for (train, test) in skf.split(X,y): full_test_1=test
full_test_1
array([ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149])
Then I repeat the same procedure with random_state=4:
skf=StratifiedKFold(n_splits=5,random_state=4)
for (train, test) in skf.split(X,y): full_test_2=test
full_test_2
array([ 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149])
I can then check whether the two are equal:
np.array_equal(full_test_1,full_test_2)
True
I don't think these two random states should return the same indices. Is there a flaw in my logic or in my code?
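For what it's worth, the scikit-learn documentation describes random_state on StratifiedKFold as taking effect only when shuffle=True, and as far as I know newer releases reject a random_state passed with the default shuffle=False. The toy splitter below is only a sketch of that behaviour in plain Python, not scikit-learn's actual implementation: stratified_folds and its seed argument are hypothetical names, and it assumes every class size is divisible by the number of folds.

```python
import random

def stratified_folds(labels, n_splits, shuffle=False, seed=None):
    """Toy stratified k-fold: return one sorted list of test indices per fold.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    Assumes each class size is divisible by n_splits.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        if shuffle:
            # The seed is consumed only here, so it matters only when shuffling.
            rng.shuffle(idx)
        size = len(idx) // n_splits
        for k in range(n_splits):
            # Give fold k the k-th contiguous chunk of this class's indices.
            folds[k].extend(idx[k * size:(k + 1) * size])
    return [sorted(fold) for fold in folds]

# Same class layout as the iris targets: 50 samples for each of 3 classes.
labels = [0] * 50 + [1] * 50 + [2] * 50

# Without shuffling, the seed is never used, so the folds are identical.
last_a = stratified_folds(labels, 5, seed=5)[-1]
last_b = stratified_folds(labels, 5, seed=4)[-1]
print(last_a == last_b)  # True

# With shuffling, the same seed reproduces the same fold,
# while a different seed is expected to give a different one.
shuf_a = stratified_folds(labels, 5, shuffle=True, seed=5)[-1]
shuf_a_again = stratified_folds(labels, 5, shuffle=True, seed=5)[-1]
shuf_b = stratified_folds(labels, 5, shuffle=True, seed=4)[-1]
print(shuf_a == shuf_a_again)  # True
print(shuf_a == shuf_b)
```

In the unshuffled case the last fold is indices 40-49, 90-99 and 140-149, the same array printed above for both seeds. If the same rule applies to the real class, passing shuffle=True alongside random_state should be what makes the two seeds produce different test sets.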