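This may be a silly question, but when I use the H2O predict function in R, is there a way to tell it to keep one or more columns from the scoring data? Specifically, I want to keep my unique ID key. Right now I do it in a very inefficient way: I assign an index key to both the original dataset and the scores, then merge the scores back onto the scoring dataset. I would rather say "score this dataset and keep columns x, y, z, ...". Any suggestions on how to keep the ID key (or any other column) when scoring a new dataset?
Inefficient code: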
library(h2o)
library(dplyr)  #needed for inner_join below
#Use H2O predict function to score new data
NL2L_SCore_SetScored.hex = h2o.predict(object = best_gbm, newdata = NL2L_SCore_Set.hex)
#Convert scores hex to data frame from H2O
NL2L_SCore_SetScored.df<-as.data.frame(NL2L_SCore_SetScored.hex)
#add index to the scores so we can merge the two datasets
NL2L_SCore_SetScored.df$ID <- seq.int(nrow(NL2L_SCore_SetScored.df))
#Convert original scoring set to data frame from H2O
NL2L_SCore_Set.df<-as.data.frame(NL2L_SCore_Set.hex)
#add index to original scoring data so we can merge the two datasets
NL2L_SCore_Set.df$ID <- seq.int(nrow(NL2L_SCore_Set.df))
#Then merge by the newly created ID key so I have the scores on my scoring
#data set. Ideally I wouldn't have to create this key at all and could keep
#the original columns from the data set, which include the customer id key
Full_Scored_Set=inner_join(NL2L_SCore_Set.df,NL2L_SCore_SetScored.df, by="ID")
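One possible way to avoid the round trip through data frames, offered as a sketch rather than a confirmed answer: h2o.predict returns one prediction row per input row, in the same order, so the columns to keep can be bound back onto the predictions with h2o.cbind and no join key is needed. The helper name score_keep_cols and the column name customer_id are hypothetical and not from the code above; best_gbm and NL2L_SCore_Set.hex are the objects from the question.

```r
library(h2o)

# Hypothetical helper: score `newdata` with `model` and keep the columns
# named in `keep` next to the predictions. It relies on h2o.predict
# returning rows in the same order as `newdata`.
score_keep_cols <- function(model, newdata, keep) {
  stopifnot(is.character(keep), all(keep %in% colnames(newdata)))
  preds <- h2o.predict(object = model, newdata = newdata)
  # Column-bind inside the H2O cluster; nothing is pulled into R here.
  h2o.cbind(newdata[, keep], preds)
}

# Usage with the objects from the question ("customer_id" is an assumed
# column name; replace it with the real ID column):
# Full_Scored_Set.hex <- score_keep_cols(best_gbm, NL2L_SCore_Set.hex, "customer_id")
# Full_Scored_Set <- as.data.frame(Full_Scored_Set.hex)
```

Because the binding happens in H2O, only the final frame has to be converted with as.data.frame, and only if it is needed in R at all.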