2017-09-07 53 views
1

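h2o.automl: NaN values in leaderboard

I am running the h2o.automl() example from: http://h2o-release.s3.amazonaws.com/h2o/master/3888/docs-website/h2o-docs/automl.html. Everything works fine except that the leaderboard contains NaN values. The predictions are fine too. Is this a bug, or am I doing something wrong?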

library(h2o) 

localH2O <- h2o.init(ip = "localhost", 
       port = 54321, 
       nthreads = -1, 
       min_mem_size = "20g") 

train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv") 
test <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv") 

y <- "response" 
x <- setdiff(names(train), y) 

train[,y] <- as.factor(train[,y]) 
test[,y] <- as.factor(test[,y]) 

aml <- h2o.automl(x = x, y = y, 
       training_frame = train, 
       leaderboard_frame = test, 
       max_runtime_secs = 30) 

lb <- aml@leaderboard 
lb 

            model_id auc logloss 
1 StackedEnsemble_0_AutoML_20170908_094736 NaN  NaN 
2 StackedEnsemble_0_AutoML_20170908_094407 NaN  NaN 
3 GBM_grid_0_AutoML_20170908_094736_model_1 NaN  NaN 
4 GBM_grid_0_AutoML_20170908_094407_model_0 NaN  NaN 
5 GBM_grid_0_AutoML_20170908_094407_model_1 NaN  NaN 
6 GBM_grid_0_AutoML_20170908_094736_model_0 NaN  NaN 

I checked, and the values look normal in H2O Flow at localhost:54321. I also get normal values using h2o.getFrame():

h2o.getFrame("leaderboard") 
            model_id  auc logloss 
1 StackedEnsemble_0_AutoML_20170908_094736 0,787145 0,554983 
2 StackedEnsemble_0_AutoML_20170908_094407 0,785154 0,556897 
3 GBM_grid_0_AutoML_20170908_094736_model_1 0,778587 0,563741 
4 GBM_grid_0_AutoML_20170908_094407_model_0 0,776755 0,564247 
5 GBM_grid_0_AutoML_20170908_094407_model_1 0,776640 0,564436 
6 GBM_grid_0_AutoML_20170908_094736_model_0 0,774611 0,566920 

I am using H2O v 3.15.0.4018

h2o.clusterInfo() 
R is connected to the H2O cluster: 
H2O cluster uptime:   2 hours 8 minutes 
H2O cluster version:  3.15.0.4018 
H2O cluster version age: 15 hours and 47 minutes 
H2O cluster name:   H2O_started_from_R_maju116_ozj558 
H2O cluster total nodes: 1 
H2O cluster total memory: 19.03 GB 
H2O cluster total cores: 8 
H2O cluster allowed cores: 8 
H2O cluster healthy:  TRUE 
H2O Connection ip:   localhost 
H2O Connection port:  54321 
H2O Connection proxy:  NA 
H2O Internal Security:  FALSE 
H2O API Extensions:   XGBoost, Algos, AutoML, Core V3, Core V4 
R Version:     R version 3.4.1 (2017-06-30) 

Session info:

R version 3.4.1 (2017-06-30) 
Platform: x86_64-pc-linux-gnu (64-bit) 
Running under: Ubuntu 16.04.2 LTS 

Matrix products: default 
BLAS: /usr/lib/openblas-base/libblas.so.3 
LAPACK: /usr/lib/libopenblasp-r0.2.18.so 

locale: 
[1] LC_CTYPE=pl_PL.UTF-8  LC_NUMERIC=C    
[3] LC_TIME=pl_PL.UTF-8  LC_COLLATE=pl_PL.UTF-8  
[5] LC_MONETARY=pl_PL.UTF-8 LC_MESSAGES=pl_PL.UTF-8 
[7] LC_PAPER=pl_PL.UTF-8  LC_NAME=C     
[9] LC_ADDRESS=C    LC_TELEPHONE=C    
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C  

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

other attached packages: 
[1] dplyr_0.7.2  purrr_0.2.3  readr_1.1.1  tidyr_0.7.1  
[5] tibble_1.3.4  ggplot2_2.2.1  tidyverse_1.1.1 h2oEnsemble_0.2.1 
[9] h2o_3.15.0.4018 

loaded via a namespace (and not attached): 
[1] Rcpp_0.12.12  cellranger_1.1.0 compiler_3.4.1 plyr_1.8.4  
[5] bindr_0.1  forcats_0.2.0 bitops_1.0-6  tools_3.4.1  
[9] lubridate_1.6.0 jsonlite_1.5  nlme_3.1-131  gtable_0.2.0  
[13] lattice_0.20-35 pkgconfig_2.0.1 rlang_0.1.2  psych_1.7.5  
[17] parallel_3.4.1 haven_1.1.0  bindrcpp_0.2  xml2_1.1.1  
[21] httr_1.3.1  stringr_1.2.0 hms_0.3   grid_3.4.1  
[25] glue_1.1.1  R6_2.2.2   readxl_1.0.0  foreign_0.8-69 
[29] modelr_0.1.1  reshape2_1.4.2 magrittr_1.5  scales_0.5.0  
[33] rvest_0.3.2  assertthat_0.2.0 mnormt_1.5-5  colorspace_1.3-2 
[37] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 RCurl_1.95-4.8 
[41] broom_0.4.2 
+0

I can't reproduce this. Can you add the output of `h2o.clusterInfo()` to your question? Thanks. –

+0

@BrandenMurray done – Maju116

+0

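I can't reproduce this. You are using a nightly build here; could you try again with the latest stable release? https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html You can also remove `library(h2oEnsemble)`, which is not needed. –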

Answers

3

Just a hunch, but try running R in the en_US locale.

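If that fixes it, I imagine what is happening is that `aml@leaderboard` / `h2o.getFrame("leaderboard")` is choking on the comma in the floating-point numbers, and that is where the NaN comes from. In other words, it would be a display bug rather than a data bug.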

(If that does not fix it, it would also be useful to know what happens when you run both H2O and R in the same pl_PL.UTF-8 locale.)
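
To illustrate the hunch, here is a minimal base-R sketch of how a decimal comma can become a missing value when a numeric parser expects a decimal point. It is only an illustration: the thread does not confirm what the H2O R client does internally, and the variable names are made up for this example. The value `"0,787145"` is taken from the `h2o.getFrame()` output in the question.

```r
# A value formatted with a decimal comma, as in the h2o.getFrame() output.
raw <- "0,787145"

# as.numeric() only understands a decimal point, so parsing yields NA.
# R also emits a "NAs introduced by coercion" warning, silenced here.
parsed_directly <- suppressWarnings(as.numeric(raw))

# Replacing the comma with a point before parsing recovers the number.
parsed_fixed <- as.numeric(sub(",", ".", raw, fixed = TRUE))

print(parsed_directly)
print(parsed_fixed)
```

If something similar happens while the leaderboard is being converted for display, the underlying metrics would still be correct, which is consistent with the normal values seen in H2O Flow.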

+1

Setting `Sys.setlocale("LC_MESSAGES", "en_GB.UTF-8")` and `Sys.setenv(LANG = "en_US.UTF-8")` worked! – Maju116
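
The fix reported in the comment above could be sketched as a small snippet to run before `h2o.init()`. It assumes a Unix-like system with the `en_GB.UTF-8` locale installed. Whether the H2O process picks up `LANG` from R's environment is an assumption, not something this thread confirms.

```r
# Inspect the current settings first (the output depends on the machine).
print(Sys.getenv("LANG"))
print(Sys.getlocale())

# Environment variable inherited by processes started from this R session.
Sys.setenv(LANG = "en_US.UTF-8")

# Language of R's own messages. Sys.setlocale() returns "" and warns
# if the requested locale is not available on this system.
result <- suppressWarnings(Sys.setlocale("LC_MESSAGES", "en_GB.UTF-8"))
if (identical(result, "")) {
  message("Locale en_GB.UTF-8 is not available; LC_MESSAGES left unchanged.")
}

# Decimal separator that R itself currently uses for number formatting.
print(Sys.localeconv()[["decimal_point"]])
```

Because the settings are changed before the cluster is started, it may be necessary to shut down an already running H2O instance and start it again for the change to have any effect.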