XGBoost and sparse matrices

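I'm trying to solve a classification problem with xgboost (in Python). My data is in a numpy matrix X (rows = observations, columns = features), and the labels are in a numpy array y. Because my data is very sparse, I'd like to run it on a sparse version of X, but I seem to be missing something, because an error occurs.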

Here is what I did:

# Library import 

import numpy as np 
import xgboost as xgb 
from xgboost.sklearn import XGBClassifier 
from scipy.sparse import csr_matrix 

# Converting to sparse data and running xgboost 

X_csr = csr_matrix(X) 
xgb1 = XGBClassifier() 
xgtrain = xgb.DMatrix(X_csr, label = y)  #to work with the xgb format 
xgtest = xgb.DMatrix(Xtest_csr) 
xgb1.fit(xgtrain, y, eval_metric='auc') 
dtrain_predictions = xgb1.predict(xgtest)

etc.

Now I get an error when trying to fit the classifier:

File ".../xgboost/python-package/xgboost/sklearn.py", line 432, in fit 
self._features_count = X.shape[1] 

AttributeError: 'DMatrix' object has no attribute 'shape'

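I've looked into where it might come from, and I believe it has to do with the sparse format I want to use. But what exactly it is, and how to fix it, I don't know.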

I'd welcome any help or comments! Thanks a lot.

2016-11-26 PLV

Does this work with 'X'? What does the 'xgb' documentation say about using sparse matrices? They usually aren't drop-in replacements. – hpaulj

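X_csr = csr_matrix(X) has many of the same attributes as X, including .shape. But it is not a subclass of ndarray, nor a drop-in replacement. Code has to be 'sparse aware'. sklearn qualifies; in fact it adds some fast sparse utility functions of its own.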
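A small illustration of that point using only NumPy and SciPy, on a made-up toy matrix: the CSR matrix reports the same .shape as the dense array, but it is a different class, not an ndarray subclass, so library code has to handle it explicitly.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

# Toy dense matrix with mostly zeros (made-up data for illustration)
X = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 2.5],
              [3.0, 0.0, 0.0, 0.0]])
X_csr = csr_matrix(X)

# Same shape attribute...
print(X.shape, X_csr.shape)                # (3, 4) (3, 4)
# ...but a different class: csr_matrix is not an ndarray subclass
print(isinstance(X_csr, np.ndarray))       # False
print(issparse(X_csr))                     # True
# Only the non-zero entries are stored
print(X_csr.nnz)                           # 3
# Converting back recovers the original dense array
print(np.array_equal(X_csr.toarray(), X))  # True
```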

But I don't know how xgb handles sparse matrices, or how it plays with sklearn.

Assuming the problem is with xgtrain, you need to look at its type and attributes. How does it compare with xgb.DMatrix(X, label = y)?

If you want help from someone who isn't an xgboost user, you have to provide more information about the objects in your code.

2016-11-26 20:45:16 hpaulj

I prefer using the native XGBoost training API (xgb.train) rather than the XGBoost sklearn wrapper. You can create a classifier as follows:

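params = { 
    # I'm assuming you are doing binary classification 
    'objective': 'binary:logistic',
    # xgb.train reads the evaluation metric from params (it has no metrics argument)
    'eval_metric': 'auc',
    # any other training params here 
    # full parameter list here https://github.com/dmlc/xgboost/blob/master/doc/parameter.md 
} 
# num_boost_round defaults to 10; evals reports the AUC on xgtrain each round
booster = xgb.train(params, xgtrain, evals=[(xgtrain, 'train')])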

This API also has built-in cross-validation, xgb.cv, which works much better with XGBoost.

https://xgboost.readthedocs.io/en/latest/get_started/index.html#python

Tons more examples here: https://github.com/dmlc/xgboost/tree/master/demo/guide-python

Hope this helps.

2017-05-31 14:22:15 volker238

You are using the xgboost scikit-learn API (http://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn), so you don't need to convert your data to a DMatrix to fit XGBClassifier(). Just remove the line

xgtrain = xgb.DMatrix(X_csr, label = y)

and it should work:

type(X_csr) #scipy.sparse.csr.csr_matrix 
type(y) #numpy.ndarray 
xgb1 = xgb.XGBClassifier() 
xgb1.fit(X_csr, y)

which outputs:

XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, 
    gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3, 
    min_child_weight=1, missing=None, n_estimators=100, nthread=-1, 
    objective='binary:logistic', reg_alpha=0, reg_lambda=1, 
    scale_pos_weight=1, seed=0, silent=True, subsample=1)

2017-10-30 09:05:23
