2017-09-15

How to create an anomaly detection model in H2O R

I am trying to run H2O's anomaly detection in R (h2o_3.14.0.2).

First, I tried it with my main deep learning model and got this error:

water.exceptions.H2OIllegalArgumentException 
[1] "water.exceptions.H2OIllegalArgumentException: Only for AutoEncoder Deep Learning model." 
... 

OK, my bad. I set autoencoder to TRUE:

h2o.deeplearning(y = response, training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE) 

and got a new error:

Error in .verify_dataxy(training_frame, x, y, autoencoder): `y` should not be specified for autoencoder=TRUE, remove `y` input 
Traceback: 

1. h2o.deeplearning(y = response, training_frame = training.frame, 
.  validation_frame = test.frame, autoencoder = TRUE) 
2. .verify_dataxy(training_frame, x, y, autoencoder) 
3. stop("`y` should not be specified for autoencoder=TRUE, remove `y` input") 

OK, so I should remove y:

h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE) 

But:

Error in is.numeric(y): argument "y" is missing, with no default 
Traceback: 

1. h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, 
.  autoencoder = TRUE) 
2. is.numeric(y) 

Hmm, the last two requirements look mutually exclusive. But OK, I'll try a different model:

anomaly.detection.model <- h2o.glrm(training_frame = training.frame, k = 10, seed = common.seed) 

h2o.anomaly(anomaly.detection.model, training.frame, per_feature = FALSE) 

and got another kind of error:

java.lang.AssertionError 
[1] "java.lang.AssertionError"                      
[2] " water.api.ModelMetricsHandler.predict(ModelMetricsHandler.java:439)" 
... 

The failing assertion is assert s.reconstruct_train;. I haven't dug into it yet. Maybe I'll have better luck with GBM or RF?

model = h2o.gbm(y = response, 
       training_frame = training.frame, 
       validation_frame = validation.frame, 
       max_hit_ratio_k = 10, 
       seed = common.seed, 
       stopping_rounds = 3, 
       stopping_tolerance = 1e-2) 

h2o.anomaly(model, training.frame, per_feature = FALSE) 

water.exceptions.H2OIllegalArgumentException 
[1] "water.exceptions.H2OIllegalArgumentException: Requires a Deep Learning, GLRM, DRF or GBM model." 

The same happens with RF.

So I have two questions:

  1. How do I detect anomalies?
  2. Are these bugs, or am I doing something wrong?

Answers


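Enabling autoencoder (setting it to TRUE) turns this into an unsupervised problem, much like clustering, so you do not need to set the response (y).

However, with autoencoder set to TRUE you still need to set x. The problem you saw above with autoencoder = TRUE is that you did not set the predictors (x). Once you set x, that error goes away.

Below is a quick anomaly detection test I ran in R with H2O 3.14.0.2 (see this blog post for details):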

    > library(h2o)
    > h2o.init() 
    Reading in config file: ./.h2oconfig 

    H2O is not running yet, starting it now... 

    Note: In case of errors look at the following log files: 
     /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.out 
     /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.err 

    java version "1.8.0_101" 
    Java(TM) SE Runtime Environment (build 1.8.0_101-b13) 
    Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode) 

    Starting H2O JVM and connecting: .. Connection successful! 

    R is connected to the H2O cluster: 
     H2O cluster uptime:   1 seconds 948 milliseconds 
     H2O cluster version:  3.14.0.2 
     H2O cluster version age: 24 days 
     H2O cluster name:   H2O_started_from_R_avkashchauhan_alj381 
     H2O cluster total nodes: 1 
     H2O cluster total memory: 3.56 GB 
     H2O cluster total cores: 8 
     H2O cluster allowed cores: 8 
     H2O cluster healthy:  TRUE 
     H2O Connection ip:   localhost 
     H2O Connection port:  54321 
     H2O Connection proxy:  NA 
     H2O Internal Security:  FALSE 
     H2O API Extensions:   XGBoost, Algos, AutoML, Core V3, Core V4 
     R Version:     R version 3.4.0 (2017-04-21) 

    > mtcar = h2o.importFile('https://raw.githubusercontent.com/woobe/H2O_London_Workshop/master/data/auto_design.csv') 
    |==================================================================================================================================| 100% 
    > mtcar$gear = as.factor(mtcar$gear) 
    > mtcar$carb = as.factor(mtcar$carb) 
    > mtcar$cyl = as.factor(mtcar$cyl) 
    > mtcar$vs = as.factor(mtcar$vs) 
    > mtcar$am = as.factor(mtcar$am) 
    > mtcar.dl = h2o.deeplearning(x = 2:12, training_frame = mtcar, autoencoder = TRUE, hidden = c(1,1,1), epochs = 100,seed=1) 
    |==================================================================================================================================| 100% 
    > errors <- h2o.anomaly(mtcar.dl, mtcar, per_feature = TRUE) 
    > print(errors) 
    reconstr_carb.1.SE reconstr_carb.2.SE reconstr_carb.3.SE reconstr_carb.4.SE reconstr_carb.6.SE reconstr_carb.8.SE 
    1     0     0     0     1     0     0 
    2     0     0     0     1     0     0 
    3     1     0     0     0     0     0 
    4     1     0     0     0     0     0 
    5     0     1     0     0     0     0 
    6     1     0     0     0     0     0 
    reconstr_carb.missing(NA).SE reconstr_cyl.4.SE reconstr_cyl.6.SE reconstr_cyl.8.SE reconstr_cyl.10.SE reconstr_cyl.missing(NA).SE 
    1       0     0     1     0     0       0 
    2       0     0     1     0     0       0 
    3       0     1     0     0     0       0 
    4       0     0     1     0     0       0 
    5       0     0     0     1     0       0 
    6       0     0     1     0     0       0 
    reconstr_gear.3.SE reconstr_gear.4.SE reconstr_gear.5.SE reconstr_gear.missing(NA).SE reconstr_vs.0.SE reconstr_vs.1.SE 
    1     0     1     0       0    1    0 
    2     0     1     0       0    1    0 
    3     0     1     0       0    0    1 
    4     1     0     0       0    0    1 
    5     1     0     0       0    1    0 
    6     1     0     0       0    0    1 
    reconstr_vs.missing(NA).SE reconstr_am.0.SE reconstr_am.1.SE reconstr_am.missing(NA).SE reconstr_mpg.SE reconstr_disp.SE reconstr_hp.SE 
    1       0    0    1       0 8.705556e-05  0.0196626269 0.0035177471 
    2       0    0    1       0 8.705556e-05  0.0196626269 0.0035177471 
    3       0    0    1       0 2.684331e-04  0.0411916382 0.0045768080 
    4       0    1    0       0 1.307597e-05  0.0004837585 0.0035177471 
    5       0    1    0       0 1.779785e-03  0.0102131519 0.0007516691 
    6       0    1    0       0 2.576469e-03  0.0038200199 0.0038147898 
    reconstr_drat.SE reconstr_wt.SE reconstr_qsec.SE 
    1  0.002147682 0.002080628  0.003914459 
    2  0.002147682 0.002054817  0.003843678 
    3  0.002153499 0.002111200  0.003646228 
    4  0.002244072 0.002020654  0.003545225 
    5  0.002235761 0.001998203  0.003843678 
    6  0.002282261 0.001996213  0.003451600 

    [32 rows x 28 columns] 
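
To get from reconstruction errors to actual anomalies, one common approach is to take the per-row error and flag the rows above a cut-off. Here is a minimal base R sketch. It assumes the per-row MSE has already been pulled into a plain numeric vector, for instance from the output of h2o.anomaly with per_feature = FALSE. The values in mse, the helper name flag_anomalies and the 0.8 quantile cut-off are all made up for illustration:

```r
# Hypothetical per-row reconstruction errors (NOT taken from the run above).
# In practice they might come from something like:
#   as.data.frame(h2o.anomaly(model, frame, per_feature = FALSE))[[1]]
mse <- c(0.010, 0.012, 0.009, 0.011, 0.250, 0.013, 0.008, 0.300, 0.010, 0.012)

# Return the indices of rows whose error exceeds the chosen quantile.
# The default cut-off (prob = 0.8) is an arbitrary assumption; tune it for your data.
flag_anomalies <- function(mse, prob = 0.8) {
  threshold <- quantile(mse, probs = prob, names = FALSE)
  which(mse > threshold)
}

anomalous_rows <- flag_anomalies(mse)
print(anomalous_rows)
```

A quantile cut-off always flags a fixed share of rows, so it is only a starting point. If you have a clean reference set, a threshold derived from its errors is probably a better choice.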

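You can also run GLRM on the same dataset, as shown below. You must set k, and you do not need to pass x to GLRM, but the dataset must not contain constant columns. That is why GLRM gets the same filtered columns (2:12) that I used for deep learning.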

> mtcar_glrm = mtcar[2:12] 
> mtcar.glrm = h2o.glrm(training_frame = mtcar_glrm,seed=1, k = 5) 

Thanks! The error messages should be more descriptive, though.


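I was trying to detect anomalies in time series data myself. To learn the concept I used this blog. The explanation in that blog worked well for me.

I would like to offer a visual picture of what happens when we detect an anomaly. In this example, a Deep Learning model is fitted to an ECG dataset. The data looks like this: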

Data we fit our Deep Learning Model

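After that, we feed in the test dataset (which contains an anomaly). It looks like this:

Data we test our Deep Learning Model on

The anomaly detection itself is possible because of the difference the "AI" sees between its reconstruction and the input, measured with the MSE (mean squared error) metric: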

This is what AI 'see' on Test dataset

The resulting MSE can look like this, for example:

MSE output
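
As a rough illustration of what that per-row MSE is, here is a small base R sketch. The original and reconstructed matrices are made-up numbers, not the ECG data; in a real run the reconstruction would come from the autoencoder.

```r
# Two hypothetical input rows with three features each
original <- matrix(c(1, 2, 3,
                     4, 5, 6), nrow = 2, byrow = TRUE)

# Hypothetical reconstruction: row 1 is reproduced exactly,
# row 2 is off by 3 in its last feature
reconstructed <- matrix(c(1, 2, 3,
                          4, 5, 9), nrow = 2, byrow = TRUE)

# Per-row mean squared error between input and reconstruction
row_mse <- rowMeans((original - reconstructed)^2)
print(row_mse)
```

A row the model reconstructs well gets an MSE near zero, while a row it has never seen anything like gets a noticeably larger value. That gap is what stands out in the plot.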