只是想知道是如何可能的未來情況後,差異訓練和測試功能:XGBoost轉換爲DMatrix
def fit(self, train, target):
xgtrain = xgb.DMatrix(train, label=target, missing=np.nan)
self.model = xgb.train(self.params, xgtrain, self.num_rounds)
我通過訓練集爲csr_matrix有5233列,並且轉換成DMatrix我後獲得了5322個功能。
後來預測一步,我得到了一個錯誤如上錯誤:(
def predict(self, test):
if not self.model:
return -1
xgtest = xgb.DMatrix(test)
return self.model.predict(xgtest)
Error: ... training data did not have the following fields: f5232
我怎樣才能保證正確的轉換我火車/測試的原因數據集DMatrix ?
有沒有機會在Python中使用類似於R的東西?
# get same columns for test/train sparse matrixes
col_order <- intersect(colnames(X_train_sparse), colnames(X_test_sparse))
X_train_sparse <- X_train_sparse[,col_order]
X_test_sparse <- X_test_sparse[,col_order]
我的方法是行不通的,不幸的是:
def _normalize_columns(self):
columns = (set(self.xgtest.feature_names) - set(self.xgtrain.feature_names)) | \
(set(self.xgtrain.feature_names) - set(self.xgtest.feature_names))
for item in columns:
if item in self.xgtest.feature_names:
self.xgtest.feature_names.remove(item)
else:
# seems, it's immutable structure and can not add any new item!!!
self.xgtest.feature_names.append(item)