2017-01-03

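How to tell an h2o deep learning grid to report AUC instead of residual deviance

I want to measure model performance using AUC or accuracy. My grid search reports residual deviance. How can I tell the h2o deep learning grid to use AUC instead of residual deviance, and present the results in a table like the one shown below?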

train <- read.table(text = "target birds wolfs  snakes 
           0  9   7 a 
           0  8   4 b 
           1  2   8 c 
           1  2   3 a 
           1  8   3 a 
           0  1   2 a 
           0  7   1 b 
           0  1   5 c 
           1  9   7 c 
           1  8   7 c 
           0  2   7 b 
           1  2   3 b 
           1  6   3 c 
           0  1   1 a 
           0  3   9 a 
           1  1   1 b ",header = TRUE) 
library(h2o)   # load the H2O R package
h2o.init()     # start or connect to a local H2O cluster
trainHex <- as.h2o(train)

g <- h2o.grid("deeplearning", 
       hyper_params = list(
        seed = c(123456789,12345678,1234567), 
        activation = c("Rectifier", "Tanh", "TanhWithDropout", "RectifierWithDropout", "Maxout", "MaxoutWithDropout") 
      ), 
       reproducible = TRUE, 
       x = 2:4, 
       y = 1, 
       training_frame = trainHex, 
       validation_frame = trainHex, 
       epochs = 50
      )
g 
model_ids <- g@summary_table
model_ids <- as.data.frame(model_ids)

The results table I get:

 Hyper-Parameter Search Summary: ordered by increasing residual_deviance 
      activation  seed             model_ids residual_deviance 
1    Maxout 12345678 Grid_DeepLearning_train_model_R_1483217086840_112_model_10 0.07243775676256235 
2    Maxout 1234567 Grid_DeepLearning_train_model_R_1483217086840_112_model_16 0.10060885040861599 
3  MaxoutWithDropout 123456789 Grid_DeepLearning_train_model_R_1483217086840_112_model_5 0.1706496158406441 
4    Maxout 123456789 Grid_DeepLearning_train_model_R_1483217086840_112_model_4 0.17243125875659948 
5     Tanh 123456789 Grid_DeepLearning_train_model_R_1483217086840_112_model_1 0.18326527198894926 
6     Tanh 12345678 Grid_DeepLearning_train_model_R_1483217086840_112_model_7 0.18763395264761593 
7     Tanh 1234567 Grid_DeepLearning_train_model_R_1483217086840_112_model_13 0.18791531211136187 
8  TanhWithDropout 123456789 Grid_DeepLearning_train_model_R_1483217086840_112_model_2 0.19808063817007837 
9  TanhWithDropout 12345678 Grid_DeepLearning_train_model_R_1483217086840_112_model_8 0.19815190962052193 
10  TanhWithDropout 1234567 Grid_DeepLearning_train_model_R_1483217086840_112_model_14 0.19832946889767458 
11   Rectifier 123456789 Grid_DeepLearning_train_model_R_1483217086840_112_model_0 0.20679125165086842 
12 MaxoutWithDropout 1234567 Grid_DeepLearning_train_model_R_1483217086840_112_model_17 0.21971759565380736 
13 RectifierWithDropout 123456789 Grid_DeepLearning_train_model_R_1483217086840_112_model_3 0.22337599298253263 
14 MaxoutWithDropout 12345678 Grid_DeepLearning_train_model_R_1483217086840_112_model_11 0.22440661112729862 
15 RectifierWithDropout 1234567 Grid_DeepLearning_train_model_R_1483217086840_112_model_15 0.2284671685474275 
16 RectifierWithDropout 12345678 Grid_DeepLearning_train_model_R_1483217086840_112_model_9 0.23163744415703522 
17   Rectifier 1234567 Grid_DeepLearning_train_model_R_1483217086840_112_model_12 0.2516917276707789 
18   Rectifier 12345678 Grid_DeepLearning_train_model_R_1483217086840_112_model_6 0.2642221616447725 

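By the way, setting 'validation_frame' to the 'training_frame' is the default behaviour, so there is no need to specify it. Note that by not using separate validation and test data sets, you are optimizing for the deep learning parameters that *over-fit* best. I am not even sure that what you learn about how the random seed affects the variation in results will carry over to unseen data. (Of course, it can still be an interesting experiment: for example, I have done this before to see how many hidden nodes/layers/epochs are needed to fit the data perfectly.)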

Answer


You can do this with h2o.getGrid(). Continuing from your example code:

g_rmse <- h2o.getGrid(g@grid_id, "rmse")
g_rmse # Print the grid, sorted by RMSE

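I chose root MSE (RMSE) there. AUC is not available for your sample data: AUC requires a binomial classification, and you are doing a regression.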
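To make the binomial requirement concrete, here is a minimal base-R sketch of what AUC measures. It is not how H2O computes it internally; the function name `auc_by_rank` and the scores are made up for illustration. It uses the rank-sum identity, which only makes sense when the labels fall into exactly two classes.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
# Hypothetical helper for illustration; labels must be binary 0/1.
auc_by_rank <- function(labels, scores) {
  stopifnot(all(labels %in% c(0, 1)))
  r <- rank(scores)                 # average ranks are used for ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # Fraction of (positive, negative) pairs where the positive scores higher
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)    # made-up predicted probabilities
auc_by_rank(labels, scores)         # 0.75: 3 of the 4 pairs are ordered correctly
```

With a numeric response, as in a regression, there are no two classes to pair up, which is why the grid offers deviance-style metrics instead.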

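The reason you are doing a regression is that your y contains only 0 and 1, so H2O has guessed that it is numeric. You need to use as.factor() on that column just after uploading it into H2O.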

train <- ... 
trainHex <- as.h2o(train) 
trainHex[,1] = as.factor(trainHex[,1]) #Add this 

g <- ... 

Then you can do this:

g_auc <- h2o.getGrid(g@grid_id, "auc", decreasing = TRUE)
g_auc

I have set decreasing=TRUE so that the best AUC is at the top.
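If you also want the winning model itself, and the table as a plain data frame as in the question, something along these lines may work. It is only a sketch: `best_by_auc` is a hypothetical helper name, and it assumes a running H2O cluster, a grid trained on a factor response, and a validation frame (without one, the `valid = TRUE` lookup will not return a value).

```r
# Sketch: fetch the top model of a grid ranked by AUC.
# best_by_auc is a hypothetical helper, not part of the h2o package.
best_by_auc <- function(grid_id) {
  # Re-sort the grid so that the highest AUC comes first
  sorted <- h2o::h2o.getGrid(grid_id, sort_by = "auc", decreasing = TRUE)
  # The first model id now belongs to the best model
  best <- h2o::h2o.getModel(sorted@model_ids[[1]])
  list(
    model = best,
    auc   = h2o::h2o.auc(best, valid = TRUE),     # AUC on the validation frame
    table = as.data.frame(sorted@summary_table)   # summary as a data frame
  )
}
```

Usage would be `res <- best_by_auc(g@grid_id)`, after which `res$table` holds the sorted summary and `res$auc` the best model's validation AUC.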


Thank you very much for the detailed answer, @Darren Cook. – mql4beginner