2017-01-18 49 views
2

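This is an extension of my earlier question, "python pandas rolling function with two arguments", to a grouped DataFrame.

How do I do the same thing per group? Assume that column 'C' below is used for grouping.

I am struggling to:

  1. Group by column 'C'
  2. Within each group, sort by 'A'
  3. Within each group, apply a rolling function that takes two arguments, such as kendalltau, with arguments 'A' and 'B'.

The expected result would be a DataFrame like the one below: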

expected result

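I have been trying the "pass an index" workaround described in the link above, but the complexity of this case is beyond my skills :-(. This is a toy example, though not far from what I am actually working with, so for simplicity I used randomly generated data.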

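import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats  # makes sp.stats.mstats available

rand = np.random.RandomState(1)
dff = pd.DataFrame({'A' : np.arange(20),
        'B' : rand.randint(100, 120, 20),
        'C' : rand.randint(0, 2, 20)})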

def my_tau_indx(indx): 
    x = dff.iloc[indx, 0] 
    y = dff.iloc[indx, 1] 
    tau = sp.stats.mstats.kendalltau(x, y)[0] 
    return tau 

# Attempt that fails: group by 'C', sort by 'A', then roll over windows of 5
dff['tau'] = dff.sort_values(['C', 'A']).groupby('C').rolling(window = 5).apply(my_tau_indx, args = ([dff.index.values]))

Every fix I make creates another bug...
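For context, here is a minimal sketch of why a two-argument function needs a workaround at all: rolling(...).apply(...) calls the function on windows of one column at a time, so the function never sees 'A' and 'B' together. The small frame and the `spy` helper below are illustrative only and are not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(6), 'B': np.arange(6) * 10})

seen = []  # every window handed to the function is recorded here


def spy(window):
    # With raw=True the window arrives as a float ndarray of one column
    seen.append(window.tolist())
    return 0.0


df.rolling(3).apply(spy, raw=True)

# Each recorded window holds values from a single column, never a mix
for window in seen:
    print(window)
```

Because each call only receives values from one column, the usual trick is to roll over row positions instead and look both columns up inside the function.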

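The problem above was solved by Nickil Maveli. The solution works with numpy 1.11.0, pandas 0.18.1, scipy 0.17.1 and conda 4.1.4. It produces some warnings, but it works.


On my other machine, with the latest and greatest numpy 1.12.0, pandas 0.19.2, scipy 0.18.1, conda version 3.10.0 and BLAS/LAPACK, it does not work and I get the traceback below. It appears to be version related, because when I upgraded the first machine it stopped working there too... in the name of science ;-)

As Nickil suggested, this was due to an incompatibility between numpy 1.11 and 1.12. Downgrading numpy helped. Because I have BLAS/LAPACK installed on Windows, I installed numpy 1.11.3+mkl from http://www.lfd.uci.edu/~gohlke/pythonlibs/.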

Traceback (most recent call last): 

File "<ipython-input-4-bbca2c0e986b>", line 16, in <module> 
t = grp.apply(func) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 651, in apply 
return self._python_apply_general(f) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 655, in _python_apply_general 
self.axis) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 1527, in apply 
res = f(group) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 647, in f 
return func(g, *args, **kwargs) 

File "<ipython-input-4-bbca2c0e986b>", line 15, in <lambda> 
func = lambda x: pd.Series(pd.rolling_apply(np.arange(len(x)), 5, my_tau_indx), x.index) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 584, in rolling_apply 
kwargs=kwargs) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 240, in ensure_compat 
result = getattr(r, name)(*args, **kwds) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 863, in apply 
return super(Rolling, self).apply(func, args=args, kwargs=kwargs) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 621, in apply 
center=False) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 560, in _apply 
result = calc(values) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 555, in calc 
return func(x, window, min_periods=self.min_periods) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 618, in f 
kwargs) 

File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51768) 

File "<ipython-input-4-bbca2c0e986b>", line 8, in my_tau_indx 
x = dff.iloc[indx, 0] 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1294, in __getitem__ 
return self._getitem_tuple(key) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1560, in _getitem_tuple 
retval = getattr(retval, self.name)._getitem_axis(key, axis=axis) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1614, in _getitem_axis 
return self._get_loc(key, axis=axis) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 96, in _get_loc 
return self.obj._ixs(key, axis=axis) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\frame.py", line 1908, in _ixs 
label = self.index[i] 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\range.py", line 510, in __getitem__ 
return super_getitem(key) 

File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\base.py", line 1275, in __getitem__ 
result = getitem(key) 

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices 
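The last frames of the traceback are consistent with the window positions reaching the indexer as floats rather than integers. NumPy 1.12 and later reject float indices, while earlier releases accepted them with a deprecation warning. A minimal sketch of that behaviour in plain NumPy follows. The array names are made up, and casting to int is one possible remedy, not the fix adopted in this thread (which was to downgrade numpy).

```python
import numpy as np

values = np.arange(10) * 10
window = np.array([0.0, 1.0, 2.0])   # row positions stored as floats

error_message = None
try:
    values[window]                   # float arrays are rejected as indices
except IndexError as exc:
    error_message = str(exc)

picked = values[window.astype(int)]  # casting to int makes the lookup valid

print(error_message)
print(picked)
```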

Final check:

enter image description here

+0

What is the expected output? –

+0

@Andrew L - Thanks, I wrongly assumed it could be inferred. I hope it is clearer now. – rpl

Answers

1

One way to achieve this is to iterate over the groups and use pd.rolling_apply on each of them:

import numpy as np
import pandas as pd
import scipy.stats as ss

def my_tau_indx(indx): 
    x = dff.iloc[indx, 0] 
    y = dff.iloc[indx, 1] 
    tau = ss.mstats.kendalltau(x, y)[0] 
    return tau 

grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False) 
func = lambda x: pd.Series(pd.rolling_apply(np.arange(len(x)), 5, my_tau_indx), x.index) 
t = grp.apply(func) 
dff.reindex(t.index).assign(tau=t) 

enter image description here
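On newer pandas releases, where pd.rolling_apply is no longer available, the same "roll over positions" idea can be sketched with Series.rolling(...).apply(..., raw=True). This is only a sketch under assumptions: the helper name `rolling_tau_by_position` is made up, scipy.stats.kendalltau is swapped in for the masked-array ss.mstats.kendalltau used above, and the column arrays are captured in a closure instead of being read from the global dff.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau


def rolling_tau_by_position(group, window=5):
    """Rolling Kendall tau of 'A' vs 'B' within one group, sorted by 'A'."""
    group = group.sort_values('A')
    a = group['A'].to_numpy()
    b = group['B'].to_numpy()

    def tau_at(positions):
        # raw=True passes a float ndarray, so cast before using it as an index
        idx = positions.astype(int)
        tau, _pvalue = kendalltau(a[idx], b[idx])
        return tau

    # Roll over row positions 0..n-1 of this group, not over the data itself
    positions = pd.Series(np.arange(len(group), dtype=float), index=group.index)
    return positions.rolling(window).apply(tau_at, raw=True)


rand = np.random.RandomState(1)
dff = pd.DataFrame({'A': np.arange(20),
                    'B': rand.randint(100, 120, 20),
                    'C': rand.randint(0, 2, 20)})

# One Series per group, stitched back together on the original index
tau = pd.concat([rolling_tau_by_position(g) for _, g in dff.groupby('C')])
result = dff.assign(tau=tau).sort_values(['C', 'A'])
print(result)
```

The positions are local to each group, so the lookup cannot accidentally pick rows from another group.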


Edit:

def my_tau_indx(indx): 
    x = dff.ix[indx, 0] 
    y = dff.ix[indx, 1] 
    tau = ss.mstats.kendalltau(x, y)[0] 
    return tau 

grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False) 
t = grp.rolling(5).apply(my_tau_indx).get('A') 

grp.head(dff.shape[0]).reindex(t.index).assign(tau=t) 

enter image description here

+1

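Thanks for posting the solution. Since 'pd.rolling_apply' is going to be deprecated, I wonder whether there is a way to achieve the same with 'rolling'? Also (this is more out of curiosity), could you provide a solution that does not rely on a function that works on a global variable? – rpl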

+1

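I don't think 'DF.rolling().apply()' is able to return scalar values from a custom function in its current form. The alternative would be to re-engineer it with a sliding-window list comprehension and then concatenate the various such computations row-wise, which seems like too much effort. It is best to stick with 'pd.rolling_apply()' for now and wait for an improved version to roll out in the future, or [**post an issue on GitHub addressing this**](https://github.com/pandas-dev/pandas/issues) –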
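The sliding-window alternative mentioned in the comment above is short enough to sketch. This is not the commenter's code: the helper names `rolling_pairwise` and `tau` are hypothetical, scipy.stats.kendalltau stands in for mstats.kendalltau, and the per-group results are joined with pd.concat. It uses no global state and no rolling machinery at all.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau


def tau(x, y):
    """Kendall tau statistic only, dropping the p-value."""
    statistic, _pvalue = kendalltau(x, y)
    return statistic


def rolling_pairwise(group, func, x_col, y_col, window):
    """Apply a two-argument func over trailing windows of one group."""
    group = group.sort_values(x_col)
    x = group[x_col].to_numpy()
    y = group[y_col].to_numpy()
    # Rows without a full window behind them get NaN
    head = [np.nan] * min(window - 1, len(group))
    tail = [func(x[end - window:end], y[end - window:end])
            for end in range(window, len(group) + 1)]
    return pd.Series(head + tail, index=group.index, dtype=float)


rand = np.random.RandomState(1)
dff = pd.DataFrame({'A': np.arange(20),
                    'B': rand.randint(100, 120, 20),
                    'C': rand.randint(0, 2, 20)})

pieces = [rolling_pairwise(g, tau, 'A', 'B', 5) for _, g in dff.groupby('C')]
out = dff.assign(tau=pd.concat(pieces)).sort_values(['C', 'A'])
print(out)
```

Any other function of two equally long arrays can be passed in place of `tau`.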

+0

Could you edit your question to include the full traceback, so that it is easier to debug? Did it work fine before? –
