2017-03-18 66 views
1

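Pandas: rolling count-if within a loop

In my DataFrame, I want to create a '5D_Peak' column as a rolling maximum, and then another column holding a rolling count of the historical values that are close to that peak. Is there an easy way to simplify or, ideally, vectorize the calculation?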

Here is my code, written in a plain but complicated way:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,4],[4,5,2],[3,5,8],[1,8,6],[5,2,8],[1,4,10],[3,5,9],[1,4,7],[1,4,6]], columns=list('ABC'))

# 5-row rolling maximum of column C
df['5D_Peak'] = df['C'].rolling(window=5, center=False).max()

# For each row i, count how many of the previous 5 values of C
# lie strictly within 2 of that row's 5D_Peak
for i in range(5, len(df)):
    val = 0
    for j in range(i - 5, i):
        if df.loc[i, '5D_Peak'] - 2 < df.loc[j, 'C'] < df.loc[i, '5D_Peak'] + 2:
            val += 1
    df.loc[i, '5D_Close_to_Peak_Count'] = val

This is my desired output:

   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     NaN
5  1  4  10     10.0                     0.0
6  3  5   9     10.0                     1.0
7  1  4   7     10.0                     2.0
8  1  4   6     10.0                     2.0
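One possible way to get this output without the nested loops is to compare the rolling peak against shifted copies of column C. The following is only a sketch under stated assumptions: the names `window`, `tolerance`, `peak`, and `count` are illustrative and not from the original code, and "close" is taken to mean strictly within 2 of the peak, as in the loop version.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 4], [4, 5, 2], [3, 5, 8], [1, 8, 6], [5, 2, 8],
     [1, 4, 10], [3, 5, 9], [1, 4, 7], [1, 4, 6]],
    columns=list('ABC'))

window = 5      # assumed look-back length, same as the loop version
tolerance = 2   # assumed meaning of "close": strictly within 2 of the peak

peak = df['C'].rolling(window=window).max()
df['5D_Peak'] = peak

# shift(k) aligns the value from k rows earlier with the current row,
# so each term flags whether that earlier value is close to the current peak.
count = sum(((df['C'].shift(k) - peak).abs() < tolerance).astype(int)
            for k in range(1, window + 1))

# Rows that do not have `window` earlier rows stay NaN, as in the loop version.
df['5D_Close_to_Peak_Count'] = count.where(df['C'].shift(window).notna())
print(df)
```

This does `window` vectorized passes over the column instead of one Python-level comparison per pair of rows, which tends to matter more as the DataFrame grows.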

Answers

1

I believe this is what you want. You can set the following two values:

# the window within which to search for "close-to-peak" values
lkp_rng = 5

# how close is close?
closeness_measure = 2

# count the values in the window that are within closeness_measure of the window's maximum
fc = lambda x: np.count_nonzero(x >= x.max() - closeness_measure)

# apply fc to the column you choose
df['5D_Close_to_Peak_Count'] = df['C'].rolling(window=lkp_rng, center=False).apply(fc)
df.head(10)
   A  B   C  5D_Peak  5D_Close_to_Peak_Count
0  1  2   4      NaN                     NaN
1  4  5   2      NaN                     NaN
2  3  5   8      NaN                     NaN
3  1  8   6      NaN                     NaN
4  5  2   8      8.0                     3.0
5  1  4  10     10.0                     3.0
6  3  5   9     10.0                     4.0
7  1  4   7     10.0                     3.0
8  1  4   6     10.0                     3.0

I am guessing at what you mean by "historical data".
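If "historical data" is meant to cover only the rows before the current one, with the two-sided test from the question, the rolling-apply idea could be adapted roughly as below. This is a sketch, not the answer's method: the helper name `close_to_peak` is hypothetical, the window is widened to `lkp_rng + 1` so that each call sees both the previous rows and the current one, and `raw=True` assumes pandas 0.23 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [4, 2, 8, 6, 8, 10, 9, 7, 6]})

lkp_rng = 5
closeness_measure = 2

def close_to_peak(x):
    # x is a NumPy array of lkp_rng + 1 values:
    # the previous lkp_rng rows followed by the current row.
    peak = x[1:].max()    # rolling max over the lkp_rng rows ending at the current row
    previous = x[:-1]     # the lkp_rng rows before the current row
    return np.count_nonzero(np.abs(previous - peak) < closeness_measure)

df['5D_Close_to_Peak_Count'] = (
    df['C'].rolling(window=lkp_rng + 1).apply(close_to_peak, raw=True))
print(df)
```

On the sample data this gives NaN for rows 0 to 4 and 0, 1, 2, 2 for rows 5 to 8, which is the output the question asks for.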

+0

Thanks. This also solves my problem. But I guess the vectorized approach JohnE suggested is faster? – thunderlion

+0

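If you are using an IPython notebook, just insert `%%prun` at the top of the cell that runs the code. It gives a long listing with a one-line summary at the top. For mine I got "826 function calls (820 primitive calls) in 0.003 seconds". If you insert `%%timeit` and run the cell, it reports "1000 loops, best of 3: 745 µs per loop". You can check the other code the same way. – user2738815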
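Outside a notebook, a similar comparison can be made with the standard `timeit` module. The sketch below is illustrative only: the function names `rolling_apply_count` and `shift_count` are hypothetical, the two functions do not count exactly the same thing, `raw=True` assumes pandas 0.23 or later, and the timings depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

c = pd.Series([4, 2, 8, 6, 8, 10, 9, 7, 6], dtype=float)

def rolling_apply_count():
    # rolling.apply with a Python callable, evaluated once per window
    fc = lambda x: np.count_nonzero(x >= x.max() - 2)
    return c.rolling(window=5).apply(fc, raw=True)

def shift_count():
    # comparison of the rolling peak against shifted copies of the column
    peak = c.rolling(window=5).max()
    return sum(((c.shift(k) - peak).abs() < 2).astype(int) for k in range(1, 6))

runs = 50
for fn in (rolling_apply_count, shift_count):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.0f} microseconds per call")
```

On a nine-row frame the difference is unlikely to be meaningful; a larger input gives a fairer picture.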

+0

@JohnE Haven't you tested the speed of your code? – user2738815