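XGBoost input data problem
I have a time series with monthly granularity and 7 months of data, and I am trying to predict profitability for the 7th month by training on the first six months. I did an 80/20 split of the data. XGBoost gives an extremely low RMSE that I cannot get from any other algorithm, which makes me a bit suspicious. So I decided to check which features are the most important, but the feature list shows numbers instead of feature names. This makes me suspect that I am not feeding the data to the algorithm correctly. I apologize for the noob question, but I guess I am one. Any help would be appreciated.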
require(caTools)
require(Matrix)
require(data.table)
require(xgboost)
set.seed(111)
sample = sample.split(new_flat$SUBSCRIPTION_ID, SplitRatio = .80)
train = subset(new_flat, sample == TRUE)
train <- subset(train, select = -SUBSCRIPTION_ID) #Removing Subscription_id
test = subset(new_flat, sample == FALSE)
test <- subset(test, select = -SUBSCRIPTION_ID) #Removing Subscription_id
target=test$Total_MARGIN_7 #Value I want to predict in the test set
dtrain <- xgb.DMatrix(data = as.matrix(train), label = train[,7])# I think this is the problem here
dtest <- xgb.DMatrix(data = as.matrix(test), label = test[,7]) # I think this is the problem here
bst <- xgboost(data = dtrain, max_depth = 5, eta = 1, nrounds = 20,
objective = "reg:linear")
pred <- predict(bst, dtest)
mean(pred)
RMSE <- sqrt(mean((as.numeric(target) - pred)^2)) # Yes as.numeric is redundant here
RMSE
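A possible explanation for both symptoms, sketched here under stated assumptions: `as.matrix(train)` still contains the `Total_MARGIN_7` column, so the model would see the target as a feature, which could explain the very low RMSE. Some xgboost versions also report feature indices unless column names are passed to `xgb.importance`. The synthetic `new_flat` and every column name other than `Total_MARGIN_7` are made up for illustration, the split uses base R instead of `caTools`, and `"reg:squarederror"` is used because newer xgboost releases deprecate `"reg:linear"` (on an older release the original objective name may be needed).

```r
library(xgboost)

set.seed(111)

# Hypothetical stand-in for new_flat: six monthly margins plus the target.
n <- 500
new_flat <- data.frame(matrix(rnorm(n * 6), ncol = 6))
names(new_flat) <- paste0("Total_MARGIN_", 1:6)
new_flat$Total_MARGIN_7 <- rowMeans(new_flat[, 1:6]) + rnorm(n, sd = 0.1)

# 80/20 split by row.
idx   <- sample(n, size = 0.8 * n)
train <- new_flat[idx, ]
test  <- new_flat[-idx, ]

# Keep the target out of the feature matrix; pass it only as the label.
target_col   <- "Total_MARGIN_7"
feature_cols <- setdiff(names(train), target_col)
x_train <- as.matrix(train[, feature_cols])
x_test  <- as.matrix(test[, feature_cols])

dtrain <- xgb.DMatrix(data = x_train, label = train[[target_col]])
dtest  <- xgb.DMatrix(data = x_test,  label = test[[target_col]])

# Same depth, learning rate and rounds as in the question.
bst <- xgb.train(params = list(max_depth = 5, eta = 1,
                               objective = "reg:squarederror"),
                 data = dtrain, nrounds = 20)

pred <- predict(bst, dtest)
rmse <- sqrt(mean((test[[target_col]] - pred)^2))

# Passing feature_names makes the importance table show column names.
imp <- xgb.importance(feature_names = feature_cols, model = bst)
print(rmse)
print(imp)
```

If `new_flat` is a `data.table` rather than a `data.frame`, the column selection would need to be written as `train[, ..feature_cols]`. An RMSE that rises to a level comparable with the other algorithms after this change would suggest that the original result came from the target leaking into the features.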
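I'm not sure XGBoost is a good algorithm for time series. Can you show some sample data?
Are you getting feature numbers as the output, or something else?
Unfortunately I can't share the data. You may be right that xgboost is not the best choice for time series, but I just wanted to give it a try.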