Specifying the tree_method parameter for XGBoost in Python

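I am working on a predictive model using XGBoost in Python (latest version on PyPI: 0.6), and I developed it by training on roughly half of my data. Now that I have my final model, I trained it on all of my data and got this message, which I had never seen before: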

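"Tree method is automatically selected to be 'approx' for faster speed. to use old behavior(exact greedy algorithm on single machine), set tree_method to 'exact'"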

As a reproducible example, the following code also produces the message on my machine:

import numpy as np 
import xgboost as xgb 

rows = 10**7 
cols = 20 
X = np.random.randint(0, 100, (rows, cols))  
y = np.random.randint(0,2, size=rows) 

clf = xgb.XGBClassifier(max_depth=5) 
clf.fit(X,y)

I have tried setting tree_method to 'exact' both when initializing my model and in the fit() step, but each attempt raises an error:

import xgboost as xgb 
clf = xgb.XGBClassifier(tree_method = 'exact') 
clf 
> __init__() got an unexpected keyword argument 'tree_method' 


my_pipeline.fit(X_train, Y_train, clf__tree_method='exact') 
> self._final_estimator.fit(Xt, y, **fit_params)
> TypeError: fit() got an unexpected keyword argument 'tree_method'

How can I specify tree_method='exact' with XGBoost in Python?

Source

2017-05-18 Max Power

Looking through the [Python docs](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.core), I can't find any parameter named 'tree_method'.

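According to the XGBoost parameter documentation, this is because the default value of tree_method is "auto". The "auto" setting depends on the data: for "small to medium" datasets it uses the "exact" method, and for "very large" datasets it uses "approx". When you started training on the whole dataset (instead of 50%), you must have crossed the training-size threshold at which the automatic value of tree_method changes. The documentation does not say how many observations it takes to reach that threshold, but it appears to lie somewhere between 5 and 10 million rows (since you have rows = 10**7).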

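I don't know whether the tree_method parameter is exposed in the XGBoost Python module (it sounds like it isn't, so perhaps file a bug report?), but tree_method is exposed in the R API.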

The documentation describes why you are seeing the warning message:

Source

2017-07-25 20:33:51
