2016-05-12

Question about xgboost xgb.dump tree coefficients: how can I use an xgboost R tree dump to compute or perform predictions?

In particular, I would like to know how the probability calculation would differ from the answer provided below if eta were 0.1 or 0.01.

I want to use the tree dump to make predictions.

My code is:

library(xgboost)

#Define train label and feature frames/matrix
y <- train_data$esc_ind
train_data <- as.matrix(train_data)
trainX <- as.matrix(train_data[, -1])

param <- list("objective" = "binary:logistic",
              "eval_metric" = "logloss",
              "eta" = 0.5,
              "max_depth" = 2,
              "colsample_bytree" = 0.8,
              "subsample" = 0.8, #0.75
              "alpha" = 1)

#Train XGBoost
bst <- xgboost(param = param, data = trainX, label = y, nrounds = 2)

#genFMap is a user-defined helper that writes the feature-map file
trainX1 <- data.frame(trainX)
mpg.fmap <- genFMap(trainX1, "xgboost.fmap")
xgb.save(bst, "xgboost.model")
xgb.dump(bst, "xgboost.model_6.txt", with.stats = TRUE, fmap = "xgboost.fmap")

The dumped trees look like this:

booster[0] 
0:[order.1<12.2496] yes=1,no=2,missing=2,gain=1359.61,cover=7215.25 
    1:[access.1<0.196687] yes=3,no=4,missing=4,gain=3.19685,cover=103.25 
     3:leaf=-0,cover=1 
     4:leaf=0.898305,cover=102.25 
    2:[team<6.46722] yes=5,no=6,missing=6,gain=753.317,cover=7112 
     5:leaf=0.893333,cover=55.25 
     6:leaf=-0.943396,cover=7056.75 
booster[1] 
0:[issu.1<6.4512] yes=1,no=2,missing=2,gain=794.308,cover=5836.81 
    1:[team<3.23361] yes=3,no=4,missing=4,gain=18.6294,cover=67.9586 
     3:leaf=0.609363,cover=21.4575 
     4:leaf=1.28181,cover=46.5012 
    2:[case<6.74709] yes=5,no=6,missing=6,gain=508.34,cover=5768.85 
     5:leaf=1.15253,cover=39.2126 
     6:leaf=-0.629773,cover=5729.64 

Will the xgboost coefficient for all leaf scores be 1 when an eta smaller than 1 is chosen?


Please check my answer at the link below - it may be useful - http://stackoverflow.com/questions/39858916/xgboost-how-to-get-probabilities-of-class-from-xgb-dump-multisoftprob-objecti/40632862#40632862 – Run2

Answer


Actually, this is something I had overlooked earlier.

Using the tree structure above, the probability can be found for every training example.
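As an illustrative sketch (Python, with booster[0]'s structure transcribed by hand from the dump above), scoring one example means following each node's yes/no/missing branches down to a leaf:

```python
import math

# booster[0] transcribed from the dump above: each internal node is
# (feature, threshold, yes_child, no_child, missing_child); leaves are floats.
tree0 = {
    0: ("order.1", 12.2496, 1, 2, 2),
    1: ("access.1", 0.196687, 3, 4, 4),
    2: ("team", 6.46722, 5, 6, 6),
    3: -0.0,
    4: 0.898305,
    5: 0.893333,
    6: -0.943396,
}

def score(tree, x):
    """Route example x (a dict of feature -> value) to its leaf value."""
    node = tree[0]
    while isinstance(node, tuple):
        feat, thr, yes, no, missing = node
        v = x.get(feat)  # None models a missing value
        if v is None:
            nxt = missing
        elif v < thr:
            nxt = yes
        else:
            nxt = no
        node = tree[nxt]
    return node

# Example: order.1 below the root threshold, access.1 above its threshold,
# so the example reaches leaf 4 with value 0.898305.
margin = score(tree0, {"order.1": 5.0, "access.1": 0.5})
prob = 1 / (1 + math.exp(-margin))  # binary:logistic final transform
```

With several boosters, the leaf values from each tree are summed before applying the logistic transform, as shown below.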

The parameter list was:

param <- list("objective" = "binary:logistic",
              "eval_metric" = "logloss",
              "eta" = 0.5,
              "max_depth" = 2,
              "colsample_bytree" = 0.8,
              "subsample" = 0.8,
              "alpha" = 1)

For example, for an example landing in booster[0], leaf 3, the probability will be exp(-0)/(1 + exp(-0)).

And for booster[0], leaf 3 plus booster[1], leaf 3, the probability will be exp(0 + 0.609363)/(1 + exp(0 + 0.609363)).

And so on, as more and more iterations are added.
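The two probabilities above can be checked with a few lines of arithmetic (Python used purely for illustration; the leaf values are read off the dump):

```python
import math

def sigmoid(margin):
    # binary:logistic final transform: probability from the summed leaf values
    return 1 / (1 + math.exp(-margin))

# booster[0] alone, example landing in leaf 3 (value -0):
p1 = sigmoid(-0.0)             # 0.5

# booster[0] leaf 3 plus booster[1] leaf 3 (-0 and 0.609363): margins add
p2 = sigmoid(-0.0 + 0.609363)  # ~0.648
```

Note that the dumped leaf values already include the eta shrinkage, which is why the sums match predict() without any extra factor; with a smaller eta such as 0.1 or 0.01 the dumped leaves would simply be smaller, and the calculation itself would be unchanged.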

I matched these values against R's predicted probabilities; they differ at around 10^(-7), which is probably due to floating-point truncation of the leaf quality scores.

This answer can provide a production-level solution when boosted trees trained in R are used for prediction in a different environment.

Any comments on this would be greatly appreciated.