2013-01-13 118 views
0

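pandas groupby example error

I'm trying to replicate an example from Wes McKinney's book on pandas. The code is below (it assumes all the names data files are under a folder called names):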

# -*- coding: utf-8 -*- 
import numpy as np 
import pandas as pd 

years = range(1880, 2011) 
pieces = [] 
columns = ['name', 'sex', 'births'] 
for year in years: 
    path = 'names/yob%d.txt' % year 
    frame = pd.read_csv(path, names=columns) 
    frame['year'] = year 
    pieces.append(frame) 

names = pd.concat(pieces, ignore_index=True) 
names 

def get_tops(group):  
    return group.sort_index(by='births', ascending=False)[:1000] 

grouped = names.groupby(['year','sex']) 
grouped.apply(get_tops) 

I'm using pandas 0.10 with Python 2.7. The error I see is this:

Traceback (most recent call last): 
    File "names.py", line 21, in <module> 
    grouped.apply(get_tops) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 321, in apply 
    return self._python_apply_general(f) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 324, in _python_apply_general 
    keys, values, mutated = self.grouper.apply(f, self.obj, self.axis) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 585, in apply 
    values, mutated = splitter.fast_apply(f, group_keys) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/groupby.py", line 2127, in fast_apply 
    results, mutated = lib.apply_frame_axis0(sdata, f, names, starts, ends) 
    File "reduce.pyx", line 421, in pandas.lib.apply_frame_axis0 (pandas/lib.c:24934) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2028, in __setattr__ 
    self[name] = value 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2043, in __setitem__ 
    self._set_item(key, value) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2078, in _set_item 
    value = self._sanitize_column(key, value) 
    File "/usr/local/lib/python2.7/dist-packages/pandas-0.10.0-py2.7-linux-i686.egg/pandas/core/frame.py", line 2112, in _sanitize_column 
    raise AssertionError('Length of values does not match ' 
AssertionError: Length of values does not match length of index 

Any ideas?


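Sorry about this: I'm very annoyed with myself for introducing this bug in 0.10. It has been fixed in the git repo, and I'm going to add "test all the book code" to the pandas release process. –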

Answer

2

I think this is a bug introduced in 0.10, namely issue #2605, "AssertionError when using apply after GroupBy". It has since been fixed.

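You can either wait for the 0.10.1 release, which shouldn't be too long from now, or upgrade to the development version (either via git or simply by downloading the zip of master).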
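
For anyone reading this on a current pandas rather than 0.10, here is a minimal sketch of the same "top N rows per (year, sex) group" step. It assumes a release where `DataFrame.sort_values` is available (the `sort_index(by=...)` spelling from the question was later removed), and it uses `GroupBy.head` instead of `apply`, which is a swapped-in approach and not the book's code. The tiny `names` frame and the helper `top_n` are made up for illustration; they stand in for the concatenated `yob%d.txt` files.

```python
import pandas as pd

# Made-up stand-in for the concatenated names files (illustrative values only).
names = pd.DataFrame({
    "name":   ["Mary", "Anna", "Emma", "John", "William", "James"],
    "sex":    ["F", "F", "F", "M", "M", "M"],
    "births": [30, 20, 10, 60, 50, 40],
    "year":   [1880] * 6,
})


def top_n(frame, n):
    """Hypothetical helper: the n rows with the most births per (year, sex)."""
    # Sort once over the whole frame, largest birth counts first.
    ordered = frame.sort_values("births", ascending=False)
    # GroupBy.head keeps the first n rows of each group in that sorted order.
    return ordered.groupby(["year", "sex"]).head(n)


top2 = top_n(names, 2)
print(pd.__version__)  # check which pandas is actually installed
print(top2)
```

With the real data you would call `top_n(names, 1000)` on the frame built by `pd.concat(pieces, ignore_index=True)`.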


Is there a workaround for 0.10? In some cases I can get `apply` to work after `groupby`, and in others I can't. – smci


In fact, this still happens in 0.10.1, which is what I'm using. But the issue is marked as closed. Strange. – smci


...and fixed in 0.11 – smci