2017-10-12

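Python XgBoost on imbalanced classes

I am currently working on a machine learning project that aims to predict a binary class (negative: 0, positive: 1). The dataset is imbalanced: positives make up only 0.1% of the rows.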

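I am running an xgboost model with Gini as my performance metric. The problem is that during the boosting iterations, it takes a large number of rounds before the score starts to improve.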
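The post doesn't show how the gini metric is computed. A common choice in Python is the normalized Gini coefficient (for binary labels it equals 2·AUC − 1), and the sketch below assumes that formulation; gini, gini_normalized and gini_xgb are illustrative names, not the poster's code. One property worth knowing: when all predictions are tied, this version falls back to the original row order, so it returns a fixed value that depends only on where the positives sit in the data.

```python
import numpy as np


def gini(actual, pred):
    """Unnormalized Gini in the common Kaggle-style formulation (assumed)."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    n = len(actual)
    # Sort by prediction, highest first; ties keep the original row order.
    order = np.lexsort((np.arange(n), -pred))
    sorted_actual = actual[order]
    # Sum of the cumulative share of positives captured down the ranking,
    # minus the value expected from a random ordering.
    gini_sum = sorted_actual.cumsum().sum() / sorted_actual.sum()
    gini_sum -= (n + 1) / 2.0
    return gini_sum / n


def gini_normalized(actual, pred):
    """Divide by the Gini of a perfect ranking so the result lies in [-1, 1]."""
    return gini(actual, pred) / gini(actual, actual)


def gini_xgb(preds, dtrain):
    """Hypothetical custom metric in the xgb.train(feval=...) style:
    receives predictions and the DMatrix, returns (name, value).
    Use maximize=True so early stopping treats larger values as better."""
    return "gini", gini_normalized(dtrain.get_label(), preds)


y = [0, 0, 1, 1]
print(gini_normalized(y, [0.1, 0.2, 0.8, 0.9]))  # perfect ranking: 1.0
print(gini_normalized(y, [0.5, 0.5, 0.5, 0.5]))  # all tied: -1.0, set by row order
```

Because the score only depends on the ranking, any monotonic transform of the predictions (margin or probability) gives the same value.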

Example:

[Fold 1/2] 
[0] train-gini:-0.048192 validation-gini:-0.042979 
Multiple eval metrics have been passed: 'validation-gini' will be used for early stopping. 

Will train until validation-gini hasn't improved in 200 rounds. 
[10] train-gini:-0.048192 validation-gini:-0.042979 
[20] train-gini:-0.048192 validation-gini:-0.042979 
[30] train-gini:-0.048192 validation-gini:-0.042979 
[40] train-gini:-0.048192 validation-gini:-0.042979 
[50] train-gini:-0.048192 validation-gini:-0.042979 
[60] train-gini:-0.048192 validation-gini:-0.042979 
[70] train-gini:-0.048192 validation-gini:-0.042979 
[80] train-gini:-0.048192 validation-gini:-0.042979 
[90] train-gini:0.197521 validation-gini:0.114222 
[100] train-gini:0.247692 validation-gini:0.150601 
[110] train-gini:0.2742 validation-gini:0.169023 
[120] train-gini:0.278983 validation-gini:0.168095 
[130] train-gini:0.316636 validation-gini:0.19118 
[140] train-gini:0.347296 validation-gini:0.191045 
[150] train-gini:0.368581 validation-gini:0.20094 
[160] train-gini:0.374773 validation-gini:0.20906 
[170] train-gini:0.398815 validation-gini:0.215193 
[180] train-gini:0.426088 validation-gini:0.220467 
[190] train-gini:0.439271 validation-gini:0.22249 
[200] train-gini:0.455897 validation-gini:0.226621 
[210] train-gini:0.469989 validation-gini:0.229512 
[220] train-gini:0.485784 validation-gini:0.233432 
[230] train-gini:0.496734 validation-gini:0.23747 
[240] train-gini:0.503718 validation-gini:0.241804 
[250] train-gini:0.51102 validation-gini:0.241841 
[260] train-gini:0.523444 validation-gini:0.244312 
[270] train-gini:0.530968 validation-gini:0.245467 
[280] train-gini:0.538703 validation-gini:0.247433 
[290] train-gini:0.546911 validation-gini:0.244196 
[300] train-gini:0.553623 validation-gini:0.244161 
[310] train-gini:0.561385 validation-gini:0.245099 
[320] train-gini:0.571532 validation-gini:0.244787 
[330] train-gini:0.578088 validation-gini:0.246146 
[340] train-gini:0.585054 validation-gini:0.245624 
[350] train-gini:0.591924 validation-gini:0.245463 
[360] train-gini:0.596331 validation-gini:0.247517 
[370] train-gini:0.600661 validation-gini:0.249465 
[380] train-gini:0.606264 validation-gini:0.249034 
[390] train-gini:0.611768 validation-gini:0.249182 
[400] train-gini:0.617176 validation-gini:0.248239 
[410] train-gini:0.621629 validation-gini:0.249248 
[420] train-gini:0.626766 validation-gini:0.24975 
[430] train-gini:0.631587 validation-gini:0.247824 
[440] train-gini:0.636737 validation-gini:0.246586 
[450] train-gini:0.641735 validation-gini:0.246552 
[460] train-gini:0.649765 validation-gini:0.246332 
[470] train-gini:0.654319 validation-gini:0.243546 
[480] train-gini:0.659301 validation-gini:0.241965 
[490] train-gini:0.665632 validation-gini:0.242562 
[500] train-gini:0.669333 validation-gini:0.241306 
[510] train-gini:0.673625 validation-gini:0.240314 
[520] train-gini:0.678935 validation-gini:0.239846 
[530] train-gini:0.683851 validation-gini:0.240029 
[540] train-gini:0.685694 validation-gini:0.240691 
[550] train-gini:0.689285 validation-gini:0.239974 
[560] train-gini:0.691698 validation-gini:0.239079 
[570] train-gini:0.694017 validation-gini:0.239407 
Stopping. Best iteration: 
[373] train-gini:0.60227 validation-gini:0.24996 

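As we can see, the train and validation scores only start improving after round 80. This happens again even when I change the seed of my fold split (only the round at which the score starts increasing changes).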
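The log shows early stopping with a patience of 200 rounds ("Will train until validation-gini hasn't improved in 200 rounds"). The simplified sketch below, not xgboost's own implementation, shows how that rule interacts with a flat start: as long as the plateau is shorter than the patience window, training survives it, but with a smaller early_stopping_rounds the run would end on the plateau. early_stop_round is a hypothetical helper, and the scores are synthetic, chosen only to mimic the shape of the log.

```python
def early_stop_round(scores, patience):
    """Simplified early stopping for a metric that should be maximized.

    Returns (best_round, stop_round): training stops once `patience`
    rounds have passed since the last new best score.
    """
    best_round, best_score = 0, scores[0]
    for i, score in enumerate(scores):
        if score > best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            return best_round, i
    # Ran out of rounds before the patience window expired.
    return best_round, len(scores) - 1


# Synthetic curve: 80 flat rounds, then the score starts to climb.
curve = [-0.043] * 80 + [0.11, 0.15, 0.17]
print(early_stop_round(curve, patience=200))  # (82, 82): the plateau is survived
print(early_stop_round(curve, patience=50))   # (0, 50): stops on the plateau
```

The same rule explains the end of the log: the best iteration is 373, and training stops roughly 200 rounds later.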

Has anyone encountered this kind of problem?

Cheers, ASTRUS

Answers


Not that I know of. But with only 0.1% positive values, you might want to try the xgboost parameter scale_pos_weight.

Maybe it will solve the problem. I would go with:

scale_pos_weight = 1000 
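The xgboost parameter documentation describes scale_pos_weight as controlling the balance of positive and negative weights, with sum(negative instances) / sum(positive instances) as a typical value to consider; at 0.1% positives that is about 999, close to the 1000 suggested here. The sketch computes that ratio on synthetic labels and then, as an illustration under the assumption that positive rows simply have their weight multiplied by scale_pos_weight in the logistic gradient and Hessian, shows that the single positive's gradient balances the 999 negatives at p = 0.5. The helper names are hypothetical.

```python
import numpy as np


def suggested_scale_pos_weight(y):
    """sum(negative instances) / sum(positive instances), the typical
    starting value the xgboost docs give for scale_pos_weight."""
    y = np.asarray(y)
    return float(np.sum(y == 0) / np.sum(y == 1))


def weighted_logistic_grad_hess(y, p, scale_pos_weight):
    """Log-loss gradient and Hessian w.r.t. the raw margin, with every
    positive row's weight multiplied by scale_pos_weight.
    Illustration of the idea only, not xgboost's source code."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    w = np.where(y == 1, scale_pos_weight, 1.0)
    grad = w * (p - y)
    hess = w * p * (1.0 - p)
    return grad, hess


# Synthetic labels with 0.1% positives: 1 positive in 1000 rows.
y = np.zeros(1000, dtype=int)
y[0] = 1

spw = suggested_scale_pos_weight(y)
print(spw)  # 999.0

# At p = 0.5 everywhere, the single up-weighted positive balances the negatives.
grad, hess = weighted_logistic_grad_hess(y, np.full(1000, 0.5), spw)
print(grad[0], grad[1:].sum())  # -499.5 499.5

# Assumed usage: pass the value through the params dict given to xgb.train.
params = {"objective": "binary:logistic", "scale_pos_weight": spw}
```

Without the weight, the 999 negatives would dominate the summed gradient, which is the imbalance scale_pos_weight is meant to offset.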

Have you tried changing your eval_metric to either logloss or error, as per the xgboost documentation?
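logloss and error are both built-in values of xgboost's eval_metric. To see what each one measures on data this skewed, here is a plain NumPy sketch of both; the helper names are hypothetical, and error uses the 0.5 threshold that the xgboost documentation describes for binary predictions. With 0.1% positives, a model that never predicts the positive class already reaches an error of 0.001, which is worth keeping in mind when reading that metric.

```python
import numpy as np


def logloss(y, p, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def error(y, p, threshold=0.5):
    """Binary error rate: predictions above the threshold count as positive."""
    y = np.asarray(y)
    pred_label = (np.asarray(p, dtype=float) > threshold).astype(int)
    return float(np.mean(pred_label != y))


# Synthetic labels with 0.1% positives: 1 positive in 1000 rows.
y = np.zeros(1000, dtype=int)
y[0] = 1

always_neg = np.zeros(1000)       # predicts "negative" for every row
base_rate = np.full(1000, 0.001)  # predicts the class prior everywhere
coin_flip = np.full(1000, 0.5)    # uninformative 50/50 prediction

print(error(y, always_neg))    # 0.001
print(logloss(y, base_rate))   # about 0.0079
print(logloss(y, coin_flip))   # about 0.693 (ln 2)
```

Unlike Gini, both metrics depend on the predicted probabilities themselves rather than only on their ranking.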