When does pandas (pandas.pydata.org) throw a MemoryError on df.sortlevel(k)?

I have a fairly large dataset of shape (2678271, 52) with a 5-level hierarchical index; it consumes 6.5% of the machine's memory. When I call

df.sortlevel(k)

I get the following error:



MemoryError        Traceback (most recent call last) 
in() 
----> 1 df = df.sortlevel(4) 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in sortlevel(self, level, axis, ascending) 
    2978    raise Exception('can only sort by level with a hierarchical index') 
    2979 
-> 2980   new_axis, indexer = the_axis.sortlevel(level, ascending=ascending) 
    2981 
    2982   if self._data.is_mixed_dtype(): 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in sortlevel(self, level, ascending) 
    1856   indexer = _indexer_from_factorized((primary,) + tuple(labels), 
    1857           (primshp,) + tuple(shape), 
-> 1858           compress=False) 
    1859   if not ascending: 
    1860    indexer = indexer[::-1] 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _indexer_from_factorized(labels, shape, compress) 
    2124   max_group = np.prod(shape) 
    2125 
-> 2126  indexer, _ = lib.groupsort_indexer(comp_ids.astype(np.int64), max_group) 
    2127 
    2128  return indexer 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.1-py2.7-linux-x86_64.egg/pandas/lib.so in pandas.lib.groupsort_indexer (pandas/src/tseries.c:55052)() 

MemoryError:

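Are there hard-coded conditions that trigger this error? Or is it possible that the operation itself consumes all the remaining memory, even though the data only uses 6.5% of it (according to htop)?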
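One thing the traceback does show is that `max_group = np.prod(shape)` is passed to `lib.groupsort_indexer`, so the sort is sized by the product of the level sizes, not by the row count. The sketch below estimates that product for a MultiIndex. The helper name `cartesian_group_count`, the toy data, and the 8-bytes-per-group figure are all assumptions for illustration; the thread does not confirm how `groupsort_indexer` allocates memory in 0.9.1.

```python
import numpy as np
import pandas as pd


def cartesian_group_count(index):
    """Product of the MultiIndex level sizes, as an exact Python int.

    Hypothetical helper: mirrors the np.prod(shape) seen in the traceback,
    but uses Python ints so the product cannot overflow.
    """
    total = 1
    for level in index.levels:
        total *= len(level)
    return total


# Tiny stand-in for the real frame (made-up data, 3 levels of 4 values each).
index = pd.MultiIndex.from_arrays(
    [[0, 1, 2, 3], ["a", "b", "c", "d"], [10, 20, 30, 40]],
    names=["x", "y", "z"],
)
df = pd.DataFrame({"value": np.arange(4)}, index=index)

n_groups = cartesian_group_count(df.index)

# Assumption: one int64 counter per possible group.
est_bytes = n_groups * 8

print(len(df), n_groups, est_bytes)
```

Even in this toy case the group count (64) is far larger than the row count (4). With five levels of many distinct values each, the product can exceed available memory regardless of how little the frame itself occupies.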

2013-01-10 Arthur G

There were a lot of performance enhancements in 0.10. Can you try the latest version of pandas? http://pandas.pydata.org/pandas-docs/stable/whatsnew.html – Zelazny7

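There are still a few things in 0.10 that make it hard for me to switch, so I will have to wait for 0.10.1. But is there a specific change related to this issue that would explain the behavior? –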

An 'inplace' option was added to 'sortlevel', which may reduce memory usage: https://github.com/pydata/pandas/issues/1873 – Zelazny7

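Can you move this to GitHub? I need to look at the code, but there are a lot of edge cases with "hierarchical" indexes that I haven't really tested in depth. So this may be a legitimate bug.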

EDIT: This has been fixed in v0.10.1.
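For readers on a current pandas release: `DataFrame.sortlevel` was later deprecated and removed in favor of `sort_index(level=...)`. A minimal sketch of the equivalent call follows, using made-up data and level names; it only illustrates the API and says nothing about the memory behavior discussed in this thread.

```python
import pandas as pd

# Small hypothetical frame with a two-level index, deliberately unsorted.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 2), ("b", 1), ("a", 1)],
    names=["outer", "inner"],
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

# Sort by the "inner" level; remaining levels break ties by default.
sorted_df = df.sort_index(level="inner")

print(sorted_df)
```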

2013-01-10 21:27:27
