Python pandas: storing Series subclasses as DataFrame columns

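I want to create a DataFrame that contains many different Series subclasses I have defined. However, when they are assigned to the DataFrame, the subclass seems to be stripped from the Series.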

Here is a toy example to illustrate the problem:

>>> import pandas as pd 
>>> class SeriesSubclass(pd.Series): 
...  @property 
...  def _constructor(self): 
...   return SeriesSubclass 
...  def times_two(self): 
...   """Method I need in this subclass.""" 
...   return self * 2 
... 
>>> subclass = SeriesSubclass([7, 8, 9]) 
>>> type(subclass)     # fine 
<class '__main__.SeriesSubclass'> 
>>> subclass.times_two()    # fine 
0 14 
1 16 
2 18 
dtype: int64 
>>> 
>>> data = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list('ABC')) 
>>> data['D'] = subclass 
>>> type(data['D'])     # not good 
<class 'pandas.core.series.Series'> 
>>> data['D'].times_two()   # not good 
Traceback (most recent call last): 
    ... 
AttributeError: 'Series' object has no attribute 'times_two'

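I have seen that this issue may have been raised before in #1713, but I can't make out the actual solution. With such a large library it's hard to follow the various PRs, documentation versions, etc. Also, the subclassing mechanism, as far as I can tell, doesn't seem to be very well described (this seems to be it).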


2016-10-20 Jordan Mackie

I think you're out of luck, unless you also define your own subclass of pd.DataFrame. That would be a much more arduous task.
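For the special case where every column should come back as the same Series subclass, pandas documents a set of subclassing hooks (`_constructor`, `_constructor_sliced`, `_constructor_expanddim`) that may make this less work than it sounds. A minimal sketch, assuming a hypothetical `FrameSubclass` paired with the question's `SeriesSubclass`; it does not handle a different subclass per column:

```python
import pandas as pd


class SeriesSubclass(pd.Series):
    # Series -> Series operations (e.g. self * 2) build results of this class.
    @property
    def _constructor(self):
        return SeriesSubclass

    # Series -> DataFrame operations (e.g. to_frame) build the paired frame class.
    @property
    def _constructor_expanddim(self):
        return FrameSubclass

    def times_two(self):
        """Method I need in this subclass."""
        return self * 2


class FrameSubclass(pd.DataFrame):
    # DataFrame -> DataFrame operations (e.g. df[["A", "B"]]) keep this class.
    @property
    def _constructor(self):
        return FrameSubclass

    # DataFrame -> Series operations, including single-column lookups, use this class.
    @property
    def _constructor_sliced(self):
        return SeriesSubclass


data = FrameSubclass([[1, 2, 3], [4, 5, 6]], columns=list("ABC"))
data["D"] = SeriesSubclass([7, 8])
print(type(data["D"]).__name__)        # SeriesSubclass
print(data["D"].times_two().tolist())  # [14, 16]
```

The trade-off is that `_constructor_sliced` applies to every single-column lookup, so all columns share one subclass.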

Consider this example:

df = pd.DataFrame() 
s = pd.Series([1, 2, 3]) 
s.random_attribute = 'hello!' 
print(s.random_attribute) 

df['A'] = s 
print(df.A.random_attribute) 

hello! 
--------------------------------------------------------------------------- 
AttributeError       Traceback (most recent call last) 
<ipython-input-273-e0031d933193> in <module>() 
     5 
     6 df['A'] = s 
----> 7 print(df.A.random_attribute) 

//anaconda/envs/3.5/lib/python3.5/site-packages/pandas/core/generic.py in __getattr__(self, name) 
    2742    if name in self._info_axis: 
    2743     return self[name] 
-> 2744    return object.__getattribute__(self, name) 
    2745 
    2746  def __setattr__(self, name, value): 

AttributeError: 'Series' object has no attribute 'random_attribute'

df.A is not s. df.A is constructed from s, and the type of s is ignored.
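One low-tech workaround, not taken from either answer, is to re-wrap the column at the point of use, since constructing a Series from an existing Series keeps its index and name. A minimal sketch using the question's `SeriesSubclass`:

```python
import pandas as pd


class SeriesSubclass(pd.Series):
    @property
    def _constructor(self):
        return SeriesSubclass

    def times_two(self):
        """Method I need in this subclass."""
        return self * 2


data = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=list("ABC"))
data["D"] = SeriesSubclass([7, 8])

# A plain DataFrame hands the column back as a plain Series...
plain = data["D"]
# ...so wrap it again where the subclass methods are needed.
wrapped = SeriesSubclass(plain)
print(type(plain).__name__)          # Series
print(wrapped.times_two().tolist())  # [14, 16]
```

This keeps an ordinary `pd.DataFrame`, at the cost of remembering which subclass belongs to which column.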


2016-10-21 04:17:05 piRSquared

For the benefit of anyone with a similar need: I think the best approach is to define a subclass of DataFrame and intervene in its __getitem__ logic.

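My original question was based on the assumption that DataFrame is implemented as a container, which it essentially isn't. It's more dynamic than that, e.g. ...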

>>> from pandas import Series, DataFrame 
>>> s = Series([1, 2, 3, 4], name='x') 
>>> df = DataFrame(s) 
>>> s is df.x 
False

So, to get columns back as Series subclasses, you need to tinker with __getitem__.
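A minimal sketch of that `__getitem__` approach, not the linked package's actual code: `TypedFrame`, `OtherSubclass`, and the class-level `column_types` mapping are hypothetical names chosen for illustration. Single-column lookups whose label is in the mapping are re-wrapped in the mapped subclass; everything else passes through unchanged.

```python
import pandas as pd


class SeriesSubclass(pd.Series):
    @property
    def _constructor(self):
        return SeriesSubclass

    def times_two(self):
        return self * 2


class OtherSubclass(pd.Series):  # hypothetical second subclass
    @property
    def _constructor(self):
        return OtherSubclass

    def plus_one(self):
        return self + 1


class TypedFrame(pd.DataFrame):
    # Hypothetical mapping: column label -> Series subclass to hand back.
    column_types = {"D": SeriesSubclass, "E": OtherSubclass}

    # Keep the frame subclass through DataFrame -> DataFrame operations.
    @property
    def _constructor(self):
        return TypedFrame

    def __getitem__(self, key):
        result = super().__getitem__(key)
        # Only single-column lookups return a Series; lists, masks and slices
        # return a frame, and unhashable keys cannot be looked up in the dict.
        if isinstance(result, pd.Series) and pd.api.types.is_hashable(key):
            cls = self.column_types.get(key)
            if cls is not None:
                return cls(result)  # Series(Series) keeps index and name
        return result


data = TypedFrame([[1, 2, 3], [4, 5, 6]], columns=list("ABC"))
data["D"] = [7, 8]
data["E"] = [10, 20]
print(data["D"].times_two().tolist())  # [14, 16]
print(data["E"].plus_one().tolist())   # [11, 21]
print(type(data["A"]).__name__)        # Series
```

Attribute access such as `data.D` should also be covered, because pandas resolves column attributes through `self[name]`, as the `__getattr__` frame in the earlier traceback shows.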

I implemented this in my own package, which may serve as an example: https://github.com/jmackie4/activityio/blob/master/activityio/_util/types.py

I'd be keen to hear from anyone with a more elegant solution, though!


2016-11-02 20:56:07
