Why do I get different outputs from the same inputs with scikit-learn's GradientBoostingRegressor?

For example:

from sklearn import ensemble

params = {'n_estimators': 200, 'max_depth': 4, 'subsample': 1, 'learning_rate': 0.1}
boost = ensemble.GradientBoostingRegressor(**params)
ghostBoost = ensemble.GradientBoostingRegressor(**params)

... 

boost.fit(x, y) 
ghostBoost.fit(x, y) 

... 

predictionA = boost.predict(features) 
predictionB = ghostBoost.predict(features)

boost and ghostBoost are configured exactly the same, yet predictionA is not equal to predictionB. Why does this happen?

Source

2014-02-18 Shane

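Try fixing the random_state constructor parameter of the two models to the same value. The decision tree building procedure is randomized because the max_features candidates considered at each node are sampled randomly (~~with replacement~~ without replacement).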
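A minimal sketch of this fix, under stated assumptions: the data below is synthetic and made up for illustration (the question does not show its x, y, or features), and the names X, y, prediction_a and prediction_b are hypothetical. Only the random_state argument is added to the question's parameters, and n_estimators is reduced to keep the run short.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption): small integer-valued features and targets,
# so that several candidate splits are likely to score equally well.
rng = np.random.RandomState(0)
X = rng.randint(0, 3, size=(200, 5)).astype(float)
y = rng.randint(0, 4, size=200).astype(float)

params = {"n_estimators": 50, "max_depth": 4, "subsample": 1.0, "learning_rate": 0.1}

# Both models receive the same seed, so any random choices made while
# building the trees are repeated identically.
boost = GradientBoostingRegressor(random_state=1, **params).fit(X, y)
ghost_boost = GradientBoostingRegressor(random_state=1, **params).fit(X, y)

prediction_a = boost.predict(X)
prediction_b = ghost_boost.predict(X)

# Report whether the two prediction arrays are element-wise identical.
print(np.array_equal(prediction_a, prediction_b))
```

The value 1 is arbitrary; what matters is that both estimators are given the same integer seed.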

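Edit: the features are sampled without replacement. When max_features=None (the default), all features are evaluated, but the order in which they are evaluated changes. That can have an impact when max_depth is not None and the target variable has non-unique values, which lead to ties between the best features for a split.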
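The effect of evaluation order on tied splits can be illustrated with a toy example. This is not scikit-learn's splitter: sse, best_split_feature and predict_with_stump are hypothetical helpers written for this sketch, and the four-row dataset is invented so that both binary features give exactly the same split error.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_feature(rows, targets, feature_order):
    """Scan features in the given order and keep the first one with the lowest error.

    Features are binary, so the only candidate split is value 0 versus value 1.
    """
    best_feature, best_error = None, float("inf")
    for f in feature_order:
        left = [t for r, t in zip(rows, targets) if r[f] == 0]
        right = [t for r, t in zip(rows, targets) if r[f] == 1]
        error = sse(left) + sse(right)
        if error < best_error:  # strict comparison: a tie keeps the earlier feature
            best_feature, best_error = f, error
    return best_feature


def predict_with_stump(rows, targets, feature, point):
    """Predict with a depth-1 tree: the mean target of the matching side."""
    leaf = [t for r, t in zip(rows, targets) if r[feature] == point[feature]]
    return sum(leaf) / len(leaf)


# The target value 2.0 is repeated, and both features split the data equally well.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [1.0, 2.0, 2.0, 3.0]

chosen_a = best_split_feature(rows, targets, feature_order=[0, 1])
chosen_b = best_split_feature(rows, targets, feature_order=[1, 0])

point = (0, 1)
pred_a = predict_with_stump(rows, targets, chosen_a, point)
pred_b = predict_with_stump(rows, targets, chosen_b, point)
print(chosen_a, chosen_b, pred_a, pred_b)
```

Both orderings see the same data and the same split errors, yet they pick different features and so predict differently for the same point. Fixing the seed fixes the ordering, which removes this source of variation.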

Source

2014-02-18 07:52:38 ogrisel

Thank you very much! I forced 'random_state' = 1 and now all the results are consistent. Will this affect performance in any way? – Shane

No, it should not affect performance. – ogrisel

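I just noticed that when the order of the input samples changes, the results are different, although they should be the same. Could you also take a look at http://stackoverflow.com/questions/22170677/how-… ? – Shane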
