2016-04-27 102 views

H2O python rbind error

I have a data frame with 2000 rows. I am trying to slice that same data frame into two parts and then combine them.

t1 = test[:10, :] 
t2 = test[20:, :] 
temp = t1.rbind(t2) 
temp.show() 

Then I get this error:

--------------------------------------------------------------------------- 
EnvironmentError       Traceback (most recent call last) 
<ipython-input-37-8daeb3375743> in <module>() 
     2 t2 = test[20:, :] 
     3 temp = t1.rbind(t2) 
----> 4 temp.show() 
     5 print len(temp) 
     6 print len(test) 

/usr/local/lib/python2.7/dist-packages/h2o/frame.pyc in show(self, use_pandas) 
    383  print("This H2OFrame has been removed.") 
    384  return 
--> 385  if not self._ex._cache.is_valid(): self._frame()._ex._cache.fill() 
    386  if H2ODisplay._in_ipy(): 
    387  import IPython.display 

/usr/local/lib/python2.7/dist-packages/h2o/frame.pyc in _frame(self, fill_cache) 
    423 
    424 def _frame(self, fill_cache=False): 
--> 425  self._ex._eager_frame() 
    426  if fill_cache: 
    427  self._ex._cache.fill() 

/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in _eager_frame(self) 
    67  if not self._cache.is_empty(): return self 
    68  if self._cache._id is not None: return self # Data already computed under ID, but not cached locally 
---> 69  return self._eval_driver(True) 
    70 
    71 def _eager_scalar(self): # returns a scalar (or a list of scalars) 

/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in _eval_driver(self, top) 
    81 def _eval_driver(self, top): 
    82  exec_str = self._do_it(top) 
---> 83  res = ExprNode.rapids(exec_str) 
    84  if 'scalar' in res: 
    85  if isinstance(res['scalar'], list): self._cache._data = [float(x) for x in res['scalar']] 

/usr/local/lib/python2.7/dist-packages/h2o/expr.pyc in rapids(expr) 
    163  The JSON response (as a python dictionary) of the Rapids execution 
    164  """ 
--> 165  return H2OConnection.post_json("Rapids", ast=expr,session_id=H2OConnection.session_id(), _rest_version=99) 
    166 
    167 class ASTId: 

/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in post_json(url_suffix, file_upload_info, **kwargs) 
    515  if __H2OCONN__ is None: 
    516  raise ValueError("No h2o connection. Did you run `h2o.init()` ?") 
--> 517  return __H2OCONN__._rest_json(url_suffix, "POST", file_upload_info, **kwargs) 
    518 
    519 def _rest_json(self, url_suffix, method, file_upload_info, **kwargs): 

/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in _rest_json(self, url_suffix, method, file_upload_info, **kwargs) 
    518 
    519 def _rest_json(self, url_suffix, method, file_upload_info, **kwargs): 
--> 520  raw_txt = self._do_raw_rest(url_suffix, method, file_upload_info, **kwargs) 
    521  return self._process_tables(raw_txt.json()) 
    522 

/usr/local/lib/python2.7/dist-packages/h2o/connection.pyc in _do_raw_rest(self, url_suffix, method, file_upload_info, **kwargs) 
    592  raise EnvironmentError(("h2o-py got an unexpected HTTP status code:\n {} {} (method = {}; url = {}). \n"+ \ 
    593        "detailed error messages: {}") 
--> 594        .format(http_result.status_code,http_result.reason,method,url,detailed_error_msgs)) 
    595 
    596 

EnvironmentError: h2o-py got an unexpected HTTP status code: 
500 Server Error (method = POST; url = http://localhost:54321/99/Rapids). 
detailed error messages: [] 

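If I count the rows (len(temp)), it works fine. If I change the slice indices slightly, it also works fine. For example, with the following slices the data frame is displayed: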

t1 = test[:10, :] 
t2 = test[:5, :] 

Am I missing something here? Thanks.

Answers


It is unclear what happened without more information (the logs would possibly say why this didn't work).

What version are you using? I tried your code on the bleeding-edge build with the iris dataset, and it all worked as expected.

By the way, rbind is generally going to be expensive, especially since what you are semantically after is a subset:

test[list(range(10)) + list(range(20, test.nrow)), :]

This should also give you the subset you want (with the caveat that you build the complete list of row indices in Python and pass it to h2o over REST).
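To make the index-list idea concrete, here is a minimal sketch in plain Python. The helper name `rows_excluding` is hypothetical and not part of h2o; it only builds the list of row indices that would then be passed to the frame, and the 2000-row count is taken from the question.

```python
def rows_excluding(nrow, start, stop):
    """Return row indices 0..nrow-1 with rows start..stop-1 left out."""
    # list(...) keeps the concatenation valid on both Python 2 and Python 3.
    return list(range(start)) + list(range(stop, nrow))

# Keep the first 10 rows and everything from row 20 onward.
idx = rows_excluding(2000, 10, 20)

# Hypothetical use with the H2OFrame `test` from the question:
# subset = test[idx, :]
```

Because the whole list is built on the client, this is best suited to frames whose row count fits comfortably in Python memory.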


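Hi, thanks for your answer. The version is 3.8.1.4. It works with your suggestion. My original idea was to implement a k-fold function. I am new to this, and I wonder if you know how to do it efficiently. Thanks. – hamuchiwa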


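You don't need to implement your own k-fold function; H2O already performs cross-validation through the 'nfolds' parameter. Take a look at this notebook example: https://github.com/h2oai/h2o-3/blob/master/h2o-py/demos/H2O_tutorial_eeg_eyestate.ipynb –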


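I understand the nfolds argument. What I want to do is set lambda_search = True and nfolds = 3 at the same time in a GLM model, and it does not seem to let me do that. To avoid doing a manual grid search over lambda, I decided to implement a k-fold function. Does that sound like the right approach? Thanks a lot. – hamuchiwa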
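As a rough illustration of the manual k-fold idea raised in the comment above, the sketch below splits row indices into folds in plain Python and combines it with the index-list subsetting from the answer. The function names `kfold_indices` and `train_test_rows` are hypothetical, nothing here is part of the h2o API, and it assumes the frame is small enough that building index lists on the client is acceptable.

```python
import random

def kfold_indices(nrow, k, seed=0):
    """Split row indices 0..nrow-1 into k shuffled, near-equal folds."""
    idx = list(range(nrow))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [sorted(idx[i::k]) for i in range(k)]

def train_test_rows(folds, i):
    """Return (train_rows, test_rows) with fold i held out."""
    test_rows = folds[i]
    train_rows = sorted(r for j, f in enumerate(folds) if j != i for r in f)
    return train_rows, test_rows

folds = kfold_indices(2000, 3)
train_rows, test_rows = train_test_rows(folds, 0)

# Hypothetical use with the H2OFrame `test` from the question:
# train_frame = test[train_rows, :]
# valid_frame = test[test_rows, :]
```

Each fold is held out once while the remaining folds form the training rows, so no rbind is needed to reassemble the training set.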