我正在嘗試使用h2o
針對數據集的不同部分針對2種算法(random forest
和gbm
)運行優化網格。我的代碼看起來像R H2O連接(內存)問題
for (...)
{
read data
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
# shutdown h2o
h2o.shutdown(prompt = FALSE)
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
h2o.shutdown(prompt = FALSE)
}
的問題是,如果我運行在一個for loop
去,我得到的錯誤
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, :
Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused
PS:我使用的是線
# shutdown h2o
h2o.shutdown(prompt = FALSE)
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
所以我「重置」h2o
,以便我沒有用完內存
我也讀了R H2O - Memory management但我不清楚它是如何工作的。
UPDATE
以下Matteusz評論後,我init
的for loop
外部和for loop
裏面我用h2o.removeAll()
。所以,現在我的代碼看起來像這樣
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores()-1)
for(...)
{
read data
gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
h2o.removeAll()
rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid", x = names(td.train.h2o)[!names(td.train.h2o)%like%segment_binary], y = segment_binary,
seed = 42, distribution = "bernoulli",
training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
hyper_params = hyper_params, search_criteria = search_criteria)
h2o.removeAll() }
看來工作,但現在我得到這個錯誤(?)在grid optimization
爲random forest
任何想法,這可能是?
所以我應該把'init'放在類似'while(h2o.clusterIsUp())'的東西里面? – quant
你應該首先在while循環內運行'h2o.clusterIsUp())'(最好在循環內使用'sleep'),然後在循環之後運行'h2o.init'。但是正如我所提到的那樣是浪費,你不需要每次啓動/停止節點。 –
請參閱更新 – quant