2017-07-08 94 views
1

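I want to add a new column Y to every row that tells me the percentage of the previous 10 records in which the value in column X was greater than 1. The value is calculated from the previous rows.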

stock price history 

    ticker  date adj_open ad_close  X(%) 
0 ABC  2017-10-06 12.10  13.11  8.0 
1 ABC  2017-12-05 11.11  11.87  5.0 
2 ABC  2017-12-04 12.08  11.40  -7.0 
3 ABC  2017-12-03 12.01  13.03  10.1 
4 ABC  2017-07-04 9.01  9.59  8.0 
5 ABC  2017-07-03 7.89  8.19  4.0 

Resultant transformed data set 

    ticker  date adj_open ad_close X(%)  Y(%)  
0 ABC  2017-10-06 12.10 13.11  8.0  80 
1 ABC  2017-12-05 11.11 11.87  5.0  75 
2 ABC  2017-12-04 12.08 11.40  -7.0  100 
3 ABC  2017-12-03 12.01 13.03  10.1  100 
4 ABC  2017-07-04 9.01  9.59  8.0  100 
5 ABC  2017-07-03 7.89  8.19  4.0  0 
+0

The solution here probably involves `DataFrame.rolling`: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html – cmaher

Answers

0

Try this. It is just a simple loop with try/except, based on your sample output; adjust it to fit your data.

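n = 5  # look-back size used in your example
df['boolean'] = df['X(%)'] > 1  # True where X(%) exceeds 1
A = []
for i in range(len(df)):
    window = df['boolean'].iloc[i + 1:i + n + 1]  # up to n rows below row i
    try:
        A.append(sum(window) / len(window))
    except ZeroDivisionError:  # the last row has no rows below it
        A.append(0)

df['Y(%)'] = A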


df 

    ticker  date adj_open ad_close X(%) boolean Y(%) 
    0 ABC 10/6/2017  12.10  13.11 8.0 True 0.80 
    1 ABC 12/5/2017  11.11  11.87 5.0 True 0.75 
    2 ABC 12/4/2017  12.08  11.40 -7.0 False 1.00 
    3 ABC 12/3/2017  12.01  13.03 10.1 True 1.00 
    4 ABC 7/4/2017  9.01  9.59 8.0 True 1.00 
    5 ABC 7/3/2017  7.89  8.19 4.0 True 0.00 
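
The loop above can probably be replaced by a vectorized version. The following is only a sketch under a few assumptions: the rows stay in the order shown, "previous" means the rows below the current one (as in the loop), and the result is expressed in percent rather than as a fraction. The names `flag` and `frac` are made up for illustration.

```python
import pandas as pd

# Sample X(%) column from the question, in the same row order.
df = pd.DataFrame({"X(%)": [8.0, 5.0, -7.0, 10.1, 8.0, 4.0]})

n = 5  # look-back size, as in the loop above
flag = (df["X(%)"] > 1).astype(float)  # 1.0 where X(%) > 1, else 0.0

# Reverse so that "rows below" become "rows before", exclude the current row
# with shift(1), average over up to n rows, then restore the original order.
frac = (
    flag.iloc[::-1]
    .shift(1)
    .rolling(window=n, min_periods=1)
    .mean()
    .iloc[::-1]
)

# The last row has nothing below it, so its NaN is replaced by 0.
df["Y(%)"] = (frac * 100).fillna(0)
print(df)
```

On the six sample rows this should give 80, 75, 100, 100, 100, 0, which is the Y(%) column requested in the question.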
0

You have:

df 
    ticker date  adj_open ad_close X(%) 
0 ABC  2017-10-06 12.10  13.11  8.0 
1 ABC  2017-12-05 11.11  11.87  5.0 
2 ABC  2017-12-04 12.08  11.40  -7.0 
3 ABC  2017-12-03 12.01  13.03  10.1 
4 ABC  2017-07-04 9.01  9.59  8.0 
5 ABC  2017-07-03 7.89  8.19  4.0 

Let's define a window and a function that computes the required quantity:

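import numpy as np

w = 2  # window size
def count_pcnt(x, window=w):
    # percentage of the values in the window that are greater than 1
    return (np.sum(x > 1) / window) * 100.0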

Finally, let's apply the function:

df["Y(%)"] = df["X(%)"].rolling(window=w).apply(count_pcnt) 
df 

    ticker date  adj_open ad_close X(%) Y(%) 
0 ABC  2017-10-06 12.10  13.11  8.0  NaN 
1 ABC  2017-12-05 11.11  11.87  5.0  100.0 
2 ABC  2017-12-04 12.08  11.40  -7.0  50.0 
3 ABC  2017-12-03 12.01  13.03  10.1  50.0 
4 ABC  2017-07-04 9.01   9.59  8.0  100.0 
5 ABC  2017-07-03 7.89   8.19  4.0  100.0 

You can change w to 10 once you have more data.

EDIT

If you want:

w=4 
df["Y(%)"] = df["X(%)"].rolling(window=w).apply(lambda x: count_pcnt(x, window = w)) 

df 
    ticker date  adj_open ad_close X(%) Y(%) 
0 ABC  2017-10-06 12.10  13.11  8.0  NaN 
1 ABC  2017-12-05 11.11  11.87  5.0  NaN 
2 ABC  2017-12-04 12.08  11.40  -7.0  NaN 
3 ABC  2017-12-03 12.01  13.03  10.1  75.0 
4 ABC  2017-07-04 9.01  9.59  8.0  75.0 
5 ABC  2017-07-03 7.89  8.19  4.0  75.0 

EDIT 2

w=4 # specify the desired window 
df["Y(%)"] = df["X(%)"].rolling(window=w).apply(lambda x: (np.sum(x>1)/x.shape[0])* 100.0) 

EDIT 3

w=4 
df["Y(%)"] = df["X(%)"].rolling(window=w, min_periods=0).apply(lambda x: (np.sum(x>1)/x.shape[0])* 100.0)
df 

ticker date adj_open ad_close X(%) Y(%) 
0 ABC 2017-10-06 12.10 13.11 8.0  100.000000 
1 ABC 2017-12-05 11.11 11.87 5.0  100.000000 
2 ABC 2017-12-04 12.08 11.40 -7.0 66.666667 
3 ABC 2017-12-03 12.01 13.03 10.1 75.000000 
4 ABC 2017-07-04 9.01 9.59 8.0  75.000000 
5 ABC 2017-07-03 7.89 8.19 4.0  75.000000 
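
As a possible alternative to `apply` with a lambda, the same percentage can be computed as the rolling mean of a 0/1 indicator. This is a sketch, not a drop-in replacement: it uses `min_periods=1` instead of `min_periods=0`, which should make no difference here because every window contains at least one row.

```python
import pandas as pd

# Sample X(%) values from the question.
x = pd.Series([8.0, 5.0, -7.0, 10.1, 8.0, 4.0], name="X(%)")

w = 4  # window size
# The mean of a 0/1 indicator over a window is the fraction of values > 1.
y = (x > 1).astype(float).rolling(window=w, min_periods=1).mean() * 100.0
print(y)
```

On the sample data this should match the Y(%) column printed above (100, 100, 66.67, 75, 75, 75).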
+0

I get an error: count_pcnt() takes exactly 1 argument (0 given) – user845405

+0

@user845405 It works perfectly here. I suggest you double-check that you copied the code correctly, including the column names........ –

+0

I used Edit 2 and now I get all NaN values? Even some of your results have NaN – user845405
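
A likely explanation for the NaN values, assuming the window is larger than the number of rows available: for a fixed-size window, `rolling` only produces a value once the window is full, because `min_periods` defaults to the window size. The sketch below contrasts the default with `min_periods=1` on the six sample rows; the names `strict` and `relaxed` are made up for illustration.

```python
import pandas as pd

# Sample X(%) values from the question (only six rows).
x = pd.Series([8.0, 5.0, -7.0, 10.1, 8.0, 4.0], name="X(%)")

w = 10  # window larger than the data

def pct_above_one(v):
    # v is a NumPy array because raw=True is passed below
    return (v > 1).mean() * 100.0

# Default min_periods equals the window size, so no window is ever "full".
strict = x.rolling(window=w).apply(pct_above_one, raw=True)

# With min_periods=1 a value is produced from the first row onwards.
relaxed = x.rolling(window=w, min_periods=1).apply(pct_above_one, raw=True)

print(strict)   # all NaN with six rows and w=10
print(relaxed)  # percentages computed over the rows seen so far
```

The same applies to the leading NaN rows in the earlier outputs: they are the rows where fewer than `w` observations were available.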