2017-03-20 84 views

Looping and multiplying by the previous value in pandas

I have a pandas DataFrame that looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'origin_dte': ['2009-08-01', '2009-08-01', '2009-08-01', '2009-08-01', '2009-09-01', '2009-09-01', '2009-09-01'],
                   'date': ['2009-08-01', '2009-08-02', '2009-08-03', '2009-08-04', '2009-09-01', '2009-09-02', '2009-09-03'],
                   'bal_pred': [10., 11., 12., 13., 21., 22., 23.],
                   'dbal_pred': [np.nan, .25, .3, .5, np.nan, .4, .45]})

   bal_pred        date  dbal_pred  origin_dte
0        10  2009-08-01        NaN  2009-08-01
1        11  2009-08-02       0.25  2009-08-01
2        12  2009-08-03       0.30  2009-08-01
3        13  2009-08-04       0.50  2009-08-01
4        21  2009-09-01        NaN  2009-09-01
5        22  2009-09-02       0.40  2009-09-01
6        23  2009-09-03       0.45  2009-09-01

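I want to loop through the frame and replace bal_pred with dbal_pred[i] * bal_pred[i-1] for every observation where dbal_pred is not NaN. For example, the second bal_pred value becomes 0.25*10=2.5. When origin_dte changes, dbal_pred is NaN again; the calculation should skip that NaN observation and carry on with the next bal_pred. The resulting df should look like the one below. I have a while loop that does this, but it takes a very long time on a large DataFrame. I would really appreciate a simpler/faster way to do this!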

   bal_pred        date  dbal_pred  origin_dte
0    10.000  2009-08-01        NaN  2009-08-01
1     2.500  2009-08-02       0.25  2009-08-01
2     0.750  2009-08-03       0.30  2009-08-01
3     0.375  2009-08-04       0.50  2009-08-01
4    21.000  2009-09-01        NaN  2009-09-01
5     8.400  2009-09-02       0.40  2009-09-01
6     3.780  2009-09-03       0.45  2009-09-01
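
For reference, the slow row-by-row version might look roughly like the sketch below. The question does not show the original while loop, so this is an assumed reconstruction, and `loop_update` is a hypothetical name. Note that each step multiplies by the already-updated previous value, which is why the third row is 0.30 * 2.5 = 0.75 and not 0.30 * 11.

```python
import numpy as np
import pandas as pd


def loop_update(df):
    """Row-by-row reference version (hypothetical helper, not from the original post)."""
    out = df.copy()
    bal = out['bal_pred'].to_numpy(dtype=float).copy()
    rate = out['dbal_pred'].to_numpy(dtype=float)
    for i in range(1, len(out)):
        # A NaN rate marks the start of a new origin_dte block: keep bal_pred as is.
        if not np.isnan(rate[i]):
            # Uses the previous balance after it has itself been updated.
            bal[i] = rate[i] * bal[i - 1]
    out['bal_pred'] = bal
    return out


df = pd.DataFrame({
    'bal_pred': [10., 11., 12., 13., 21., 22., 23.],
    'dbal_pred': [np.nan, .25, .3, .5, np.nan, .4, .45],
})
result = loop_update(df)
print(result)
```

A loop like this is a handy baseline for checking any vectorized rewrite on a small sample before running it on the full data.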

Answers

# fillna with 1 so we can cumprod 
c = df.dbal_pred.fillna(1).cumprod() 

# track where null 
n = df.dbal_pred.isnull() 

# take cumprod where null and forward fill 
d = c.where(n).ffill() 

# cumprods divided by cumprod where last null 
# gets us a grouped cumprod that starts over 
# at every null. 
# multiply this by `bal_pred` where null forward filled 
# and voila 
df.assign(bal_pred=c.div(d) * df.bal_pred.where(n).ffill()) 

   bal_pred        date  dbal_pred  origin_dte
0    10.000  2009-08-01        NaN  2009-08-01
1     2.500  2009-08-02       0.25  2009-08-01
2     0.750  2009-08-03       0.30  2009-08-01
3     0.375  2009-08-04       0.50  2009-08-01
4    21.000  2009-09-01        NaN  2009-09-01
5     8.400  2009-09-02       0.40  2009-09-01
6     3.780  2009-09-03       0.45  2009-09-01
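
To see why this works, it can help to lay the intermediate series side by side for the sample data. The sketch below rebuilds only the two numeric columns of the example; the column labels in `steps` are made up for illustration. One caveat, which is an observation and not part of the answer above: a zero in `dbal_pred` would make `c` zero from that row on, so the division would give 0/0 = NaN for rows in later blocks.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'bal_pred': [10., 11., 12., 13., 21., 22., 23.],
    'dbal_pred': [np.nan, .25, .3, .5, np.nan, .4, .45],
})

# Running product over the whole column, with NaN treated as 1.
c = df['dbal_pred'].fillna(1).cumprod()   # approx. 1, 0.25, 0.075, 0.0375, 0.0375, 0.015, 0.00675
n = df['dbal_pred'].isnull()

# Value of the running product at the most recent NaN row.
d = c.where(n).ffill()                    # approx. 1, 1, 1, 1, 0.0375, 0.0375, 0.0375

# Starting balance of each block, carried forward.
base = df['bal_pred'].where(n).ffill()    # 10, 10, 10, 10, 21, 21, 21

steps = pd.DataFrame({'c': c, 'd': d, 'c_over_d': c.div(d), 'base': base})
steps['bal_pred_new'] = steps['c_over_d'] * steps['base']
print(steps)
```

Dividing `c` by `d` cancels everything accumulated before the current block, leaving a cumulative product that restarts at each NaN.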

A different approach is to label each group of rows and then take the cumulative product within each group:

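# every NaN in dbal_pred starts a new group
group = df['dbal_pred'].isnull().cumsum()
# seed each group with its starting balance, without modifying df.dbal_pred
factors = df['dbal_pred'].fillna(df['bal_pred'])
# cumulative product that restarts at each group
df['bal_pred'] = factors.groupby(group).cumprod()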

Output

   bal_pred        date  dbal_pred  origin_dte
0    10.000  2009-08-01        NaN  2009-08-01
1     2.500  2009-08-02       0.25  2009-08-01
2     0.750  2009-08-03       0.30  2009-08-01
3     0.375  2009-08-04       0.50  2009-08-01
4    21.000  2009-09-01        NaN  2009-09-01
5     8.400  2009-09-02       0.40  2009-09-01
6     3.780  2009-09-03       0.45  2009-09-01
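
The grouping key is what makes the cumulative product restart, so it may be worth inspecting on the sample data. The sketch below prints the labels and also tries a variant that groups on `origin_dte` directly. That variant is an assumption, not something the answer proposes: it only matches when every `origin_dte` block begins with exactly one NaN in `dbal_pred`, as in the sample. The names `factors`, `new_bal`, and `new_bal_by_origin` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'origin_dte': ['2009-08-01'] * 4 + ['2009-09-01'] * 3,
    'bal_pred': [10., 11., 12., 13., 21., 22., 23.],
    'dbal_pred': [np.nan, .25, .3, .5, np.nan, .4, .45],
})

# Each NaN bumps the counter, so rows share a label until the next NaN.
group = df['dbal_pred'].isnull().cumsum()
print(group.tolist())  # [1, 1, 1, 1, 2, 2, 2]

# Replace each NaN with the block's starting balance, then multiply through.
factors = df['dbal_pred'].fillna(df['bal_pred'])
new_bal = factors.groupby(group).cumprod()

# Variant (assumption): one NaN at the start of every origin_dte block.
new_bal_by_origin = factors.groupby(df['origin_dte']).cumprod()

print(pd.DataFrame({'group': group, 'new_bal': new_bal,
                    'new_bal_by_origin': new_bal_by_origin}))
```

Keeping the filled values in a separate series leaves `dbal_pred` in the frame unchanged, which matches the output shown above.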

Thanks, and good catch on the cumulative product!