2017-10-15 193 views

GridSearchCV on an XGBoost model gives an error

I created an XGBoost classifier in Python. I am trying to run a grid search to find the best parameters, like this:

grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold) 
grid_result = grid_search.fit(X, Y) 

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_)) 

means = grid_result.cv_results_['mean_test_score'] 
stds = grid_result.cv_results_['std_test_score'] 
params = grid_result.cv_results_['params'] 

for mean, stdev, param in zip(means, stds, params): 
    print("%f (%f) with: %r" % (mean, stdev, param)) 

When I run the search, I get this error:

[Errno 28] No space left on device 

I am using a fairly large dataset, where X.shape = (38932, 1002) and Y.shape = (38932,).
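For a sense of scale, the in-memory size of X can be estimated from its shape. This is only a rough sketch: the question does not state the dtype, so dense float64 storage (8 bytes per value) is an assumption.

```python
# Shape reported in the question.
n_rows, n_cols = 38932, 1002

# Assumption: dense float64 storage, 8 bytes per value.
bytes_per_value = 8

size_bytes = n_rows * n_cols * bytes_per_value
size_mib = size_bytes / 2**20

print("X is about %.1f MiB" % size_mib)
```

Under that assumption X is roughly 300 MiB. With n_jobs=-1 the data may be shared with or copied to several worker processes, so the total footprint during the search can be a multiple of this figure.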

What is the problem, and how can I fix it?

Is it because the dataset is too large for my machine? If so, what can I do to run GridSearch on this dataset?

Please provide a sample and the shape of the data, or a link to the data. – sgDysregulation

I have edited the question to include a description of the dataset and added the shapes. –

This is similar to the problem you are experiencing: https://stackoverflow.com/a/6999259/1577947 – Jarad

Answers

1

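The error indicates that shared memory is running out. Adjusting the number of k-folds and/or the number of parallel jobs (i.e. n_jobs) will likely resolve the issue. Here is a working example using xgboost:
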
import xgboost as xgb 
from sklearn.model_selection import GridSearchCV 
from sklearn import datasets 

clf = xgb.XGBClassifier() 
parameters = { 
    'n_estimators': [100, 250, 500], 
    'max_depth': [6, 9, 12], 
    'subsample': [0.9, 1.0], 
    'colsample_bytree': [0.9, 1.0], 
} 
bsn = datasets.load_iris() 
X, Y = bsn.data, bsn.target 
grid = GridSearchCV(clf, 
        parameters, n_jobs=4, 
        scoring="neg_log_loss", 
        cv=3) 

grid.fit(X, Y) 
print("Best: %f using %s" % (grid.best_score_, grid.best_params_)) 

means = grid.cv_results_['mean_test_score'] 
stds = grid.cv_results_['std_test_score'] 
params = grid.cv_results_['params'] 

for mean, stdev, param in zip(means, stds, params): 
    print("%f (%f) with: %r" % (mean, stdev, param)) 

The output is:

Best: -0.121569 using {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 500, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 500, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 500, 'subsample': 1.0} 
-0.132745 (0.080433) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 100, 'subsample': 0.9} 
-0.127030 (0.077692) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.146143 (0.077623) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 250, 'subsample': 0.9} 
-0.140400 (0.074645) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 250, 'subsample': 1.0} 
-0.153624 (0.077594) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 500, 'subsample': 0.9} 
-0.143833 (0.073645) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 500, 'subsample': 1.0} 
-0.132745 (0.080433) with: {'colsample_bytree': 1.0, 'max_depth': 9, ... 
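
If shared memory is indeed the bottleneck, it may help to check how much space is free before starting the search. The sketch below assumes a Linux machine where `/dev/shm` is the shared-memory filesystem, and relies on joblib (which scikit-learn uses for `n_jobs` parallelism) honouring the `JOBLIB_TEMP_FOLDER` environment variable when it memory-maps large arrays. The helper names `free_bytes` and `pick_temp_folder` are hypothetical, and the float64 size estimate is an assumption about the asker's data.

```python
import os
import shutil
import tempfile


def free_bytes(path):
    """Return free space in bytes at path, or None if it is not a directory."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free


def pick_temp_folder(candidates, needed_bytes):
    """Return the first candidate directory with at least needed_bytes free."""
    for path in candidates:
        free = free_bytes(path)
        if free is not None and free >= needed_bytes:
            return path
    return None


# Assumption: one dense float64 copy of X with the shape from the question.
needed = 38932 * 1002 * 8

candidates = ["/dev/shm", tempfile.gettempdir()]
for path in candidates:
    print(path, free_bytes(path))

folder = pick_temp_folder(candidates, needed)
if folder is not None:
    # Must be set before the parallel search is started.
    os.environ["JOBLIB_TEMP_FOLDER"] = folder
```

If neither location has enough room, lowering `n_jobs` (for example to 1) avoids the parallel workers altogether, at the cost of a slower search.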

I have successfully run CVSearch on my machine before. I only face this problem with this dataset. –

I will try without using `kFold` and let you know how it goes. –

You may also want to turn on verbose output in the grid search (i.e. verbose=5) to see whether certain parameter values are causing the problem. – sgDysregulation
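
Before reading through verbose output, it can be useful to know how many fits the search will perform. A minimal sketch using the parameter grid and `cv=3` from the answer above; the variable names `n_candidates` and `total_fits` are illustrative only.

```python
from itertools import product

# Parameter grid copied from the answer's working example.
parameters = {
    'n_estimators': [100, 250, 500],
    'max_depth': [6, 9, 12],
    'subsample': [0.9, 1.0],
    'colsample_bytree': [0.9, 1.0],
}
cv = 3  # number of folds used in the answer

# Every combination of values is one candidate; each is fitted once per fold.
n_candidates = len(list(product(*parameters.values())))
total_fits = n_candidates * cv

print("%d candidates x %d folds = %d fits" % (n_candidates, cv, total_fits))
```

A smaller grid or fewer folds reduces the number of fits proportionally, which is a cheap way to test whether the search completes at all on the larger dataset.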