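Pandas - Accessing rows of a SparseDataFrame
I have a large SparseDataFrame (say 20k index x 10k columns) with very low density (about 0.1% of entries are set). I'm trying to access specific rows of the data frame, but I can't seem to do it. Accessing columns works fine, though. Here is a small example that illustrates the problem: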
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(15).reshape(5,3), index=list('abcde'))
df.loc['b',1] = np.nan # for good measure...
sparse = df.to_sparse()
sparse[1] # This is OK.
df.loc['b'] # This is also OK.
sparse.loc['b'] # This raises IndexError (traceback below).
Here's the traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/indexing.py", line 1020, in __getitem__
return self._getitem_axis(key, axis=0)
File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/indexing.py", line 1145, in _getitem_axis
return self._get_label(key, axis=axis)
File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/indexing.py", line 68, in _get_label
return self.obj._xs(label, axis=axis, copy=True)
File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/frame.py", line 2149, in xs
new_values, copy = self._data.fast_2d_xs(loc, copy=copy)
File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/internals.py", line 2714, in fast_2d_xs
result[i] = blk._try_coerce_result(blk.iget((j, loc)))
File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/core/internals.py", line 275, in iget
return self.values[i]
File "/Users/.../.virtualenvs/exp/lib/python2.7/site-packages/pandas/sparse/array.py", line 286, in __getitem__
data_slice = self.values[key]
IndexError: too many indices
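Note that it works fine on a "normal", dense DataFrame. However, given the large size, this is a major inconvenience, since I have to either:
- transpose the DataFrame (takes ages), or
- convert it to a dense DataFrame (eats too much memory).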
I'm relatively new to pandas, so maybe I'm missing something. Anyway, any help is appreciated!
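For readers on pandas 1.0 or later: `SparseDataFrame` and `DataFrame.to_sparse()` were removed in that release. The replacement is an ordinary DataFrame whose columns have a `SparseDtype`. The sketch below rebuilds the same toy repro under that API, assuming a recent pandas. In recent versions, row access with `.loc` appears to work on such frames. The frame is built as float from the start so that NaN can be stored without an integer-to-float upcast.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question, but float from the start so NaN fits directly.
df = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), index=list('abcde'))
df.loc['b', 1] = np.nan

# Modern replacement for df.to_sparse(): each column gets a SparseDtype whose
# fill value (the value that is not physically stored) is NaN.
sdf = df.astype(pd.SparseDtype(np.float64, np.nan))

col = sdf[1]        # column access, as in the question
row = sdf.loc['b']  # row access by label
print(row)
print(sdf.sparse.density)  # fraction of entries actually stored
```

With a NaN fill value, only the non-NaN entries are stored. For the real 20k x 10k frame, the fill value should match whatever value dominates the data (often 0 or NaN).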
I don't use sparse DFs or know enough about their limitations, but this works: `sparse.loc['b':'b']`, as does `sparse.ix['b':'b']`. I still don't see why the non-slice version fails. – EdChum
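@EdChum Interesting observation. The difference I see is that the slice returns a DataFrame rather than a Series, so perhaps the problem lies somewhere in that conversion. – lum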
This may simply be unimplemented: https://groups.google.com/forum/#!topic/pydata/YEdD8UrkV28. In fact, there is already a request for it: https://github.com/pydata/pandas/issues/4400 – EdChum
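The slice workaround from the comments can be wrapped so that it returns a Series. This sidesteps the DataFrame-vs-Series difference and densifies only the requested row rather than the whole frame, which was the memory concern in the question. This is a minimal sketch under stated assumptions:

- `get_row_via_slice` is a hypothetical helper name, not a pandas API.
- The index is assumed to be unique, since a duplicated label would return several rows.
- Because `SparseDataFrame` no longer exists, the code is written against the `SparseDtype` API.
- On the old `SparseDataFrame`, the densifying step would presumably be `.to_dense()` instead of `.sparse.to_dense()` (untested).

```python
import numpy as np
import pandas as pd


def get_row_via_slice(frame: pd.DataFrame, label) -> pd.Series:
    """Hypothetical helper: fetch one row by label via a label:label slice.

    Assumes a unique index. The slice yields a one-row DataFrame (the form
    reported as working in the comments); only that row is converted to
    dense storage, and .iloc[0] turns it into a Series named after the label.
    """
    one_row = frame.loc[label:label]           # 1 x ncols, still sparse
    return one_row.sparse.to_dense().iloc[0]   # densify just this row


df = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3), index=list('abcde'))
df.loc['b', 1] = np.nan
sdf = df.astype(pd.SparseDtype(np.float64, np.nan))

row = get_row_via_slice(sdf, 'b')
print(row)
```

Memory for the densified result scales with the number of columns (10k floats here), not with the full 20k x 10k frame.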