2014-03-27 110 views
1

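Pandas sparse DataFrame value_counts not working

I get a TypeError when calling the value_counts method on a pandas sparse DataFrame. The package versions I am using are listed below.

Any suggestions on how to make this work?

Thanks in advance. Also, please let me know if more information is needed.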

Python 2.7.6 |Anaconda 1.9.1 (x86_64)| (default, Jan 10 2014, 11:23:15) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 

>>> import pandas 
>>> print pandas.__version__ 
0.13.1 
>>> import numpy 
>>> print numpy.__version__ 
1.8.0 

>>> dense_df = pandas.DataFrame(numpy.zeros((10, 10)),
...                             columns=['x%d' % ix for ix in range(10)])
>>> dense_df['x5'] = [1.0, 0.0, 0.0, 1.0, 2.1, 3.0, 0.0, 0.0, 0.0, 0.0] 
>>> print dense_df['x5'].value_counts() 
0.0 6 
1.0 2 
3.0 1 
2.1 1 
dtype: int64 

>>> sparse_df = dense_df.to_sparse(fill_value=0) # Tried fill_value=0.0 also 
>>> print sparse_df.density 
0.04 

>>> print sparse_df['x5'].value_counts() 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "//anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 1156, in value_counts
    normalize=normalize, bins=bins)
  File "//anaconda/lib/python2.7/site-packages/pandas/core/algorithms.py", line 231, in value_counts
    values = com._ensure_object(values)
  File "generated.pyx", line 112, in pandas.algos.ensure_object (pandas/algos.c:38788)
  File "generated.pyx", line 117, in pandas.algos.ensure_object (pandas/algos.c:38695)
  File "//anaconda/lib/python2.7/site-packages/pandas/sparse/array.py", line 377, in astype
    raise TypeError('Can only support floating point data for now')
TypeError: Can only support floating point data for now
+0

Have you filed a bug at https://github.com/pydata/pandas/issues? – smci

+1

Just did. Thanks for the tip. – bdanalytics

Answer

2

This is not implemented at the moment; convert to dense first.

In [12]: sparse_df['x5'].to_dense().value_counts() 
Out[12]: 
0.0 6 
1.0 2 
3.0 1 
2.1 1 
dtype: int64 
+0

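Thanks for the quick response. I have a DataFrame with 7 million rows and approx. 90K columns, so converting to a dense frame/array for every operation is tedious and consumes a lot of memory. – bdanalytics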
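
To put rough numbers on the memory concern in this comment, here is a back-of-the-envelope estimate. It assumes float64 storage (8 bytes per value, as in the numpy.zeros example in the question) and reads "90K" as exactly 90,000 columns; both are assumptions, not figures given by the commenter.

```python
# Rough dense-memory estimate for the frame described in the comment.
BYTES_PER_VALUE = 8   # assumption: float64 values
n_rows = 7000000      # "7 million rows"
n_cols = 90000        # "approx. 90K columns", taken as exactly 90,000

one_column_bytes = n_rows * BYTES_PER_VALUE
full_frame_bytes = n_rows * n_cols * BYTES_PER_VALUE

print("one dense column: %.0f MB" % (one_column_bytes / 1e6))
print("full dense frame: %.2f TB" % (full_frame_bytes / 1e12))
```

Under these assumptions a single densified column costs about 56 MB, while the whole frame would need roughly 5 TB, so calling to_dense() on one column at a time is feasible even though densifying the entire frame is not.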

+0

You are welcome to submit a pull request to add this feature; sparse needs some TLC. – Jeff
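
As a possible way around the memory cost of densifying, the counts can in principle be derived from the sparse representation itself: count the explicitly stored values, then credit every remaining position to the fill value. The sketch below is plain Python and does not call pandas. The helper name sparse_value_counts is hypothetical, and it assumes you can obtain the stored values and the fill value from the sparse column (pandas sparse objects expose these as sp_values and fill_value, as far as I know; check this against your installed version). It also assumes the stored values contain no NaN.

```python
from collections import Counter


def sparse_value_counts(stored_values, fill_value, length):
    """Count the values of a sparse column without building the dense column.

    stored_values: the explicitly stored (non-fill) entries
    fill_value:    the value implied at every other position
    length:        total number of rows in the column
    """
    counts = Counter(stored_values)
    # Every position that is not stored explicitly holds the fill value.
    n_fill = length - len(stored_values)
    if n_fill:
        counts[fill_value] += n_fill
    return counts


# The x5 column from the question: 4 stored values in 10 rows, fill value 0.0.
counts = sparse_value_counts([1.0, 1.0, 2.1, 3.0], fill_value=0.0, length=10)
for value, n in counts.most_common():
    print("%s %d" % (value, n))
```

For the x5 column this gives the same counts as the dense result in the question (0.0 six times, 1.0 twice, 2.1 and 3.0 once each), and the work scales with the number of stored values rather than the number of rows.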