2017-03-03

I have a DataFrame in which I calculate new fields, and I need assistance reducing both the lines of code and the run time.

account   Total    Start Date   End Date   EMI
211829    107000   05/19/17     01/22/19   5350
320563    175000   08/04/17     10/30/18   12500
648336    246000   02/26/17     08/25/19   8482.7586206897
109996    175000   11/23/17     11/27/19   7291.6666666667
121213    317000   09/07/17     04/12/18   45285.7142857143

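I then create new fields such as Jan17, Feb17, Mar17 and so on, and fill each one with the EMI value when the date range covers that month, using the code below.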

# first day of each month
jant17 = pd.to_datetime('2017-01-01')
febt17 = pd.to_datetime('2017-02-01')
mart17 = pd.to_datetime('2017-03-01')

# last day of each month
jan17 = pd.to_datetime('2017-01-31')
feb17 = pd.to_datetime('2017-02-28')
mar17 = pd.to_datetime('2017-03-31')

# .ix has been removed from pandas; .loc does the same label-based assignment here
df.loc[(df['Start Date'] <= jan17) & (df['End Date'] >= jant17), 'Jan17'] = df['EMI']

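The drawback is that when I have to forecast out to 2019 or 2020, this becomes far too many lines of code to write, and any update means changing too many lines. To reduce the line count I tried an alternative using for loops, but the code started taking a very long time to execute.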

monthend = {'Jan17': pd.to_datetime('2017-01-31'),
            'Feb17': pd.to_datetime('2017-02-28'),
            'Mar17': pd.to_datetime('2017-03-31')}

monthbeg = {'Jant17': pd.to_datetime('2017-01-01'),
            'Febt17': pd.to_datetime('2017-02-01'),
            'Mart17': pd.to_datetime('2017-03-01')}

# colnames is not defined in the post; presumably the list of target column names
# note: three nested loops, so len(monthend) * len(monthbeg) * len(colnames) assignments
for mend in monthend.values():
    for mbeg in monthbeg.values():
        for coln in colnames:
            df.loc[(df['Start Date'] <= mend) & (df['End Date'] >= mbeg), coln] = df['EMI']

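This greatly reduced the number of lines of code, but it increased the execution time from 3-4 minutes to over an hour. Is there a better way to do this with fewer lines and less processing time?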

Answer


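I think you can create a helper df with the start and end dates and the column names, loop over its rows with apply, and create the new columns in the original df: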

# helper table: one row per month (on pandas >= 2.2 the month-end alias is 'ME' rather than 'M')
dates = pd.DataFrame({'start': pd.date_range('2017-01-01', freq='MS', periods=10),
                      'end': pd.date_range('2017-01-01', freq='M', periods=10)})
dates['names'] = dates.start.dt.strftime('%b%y')
print(dates)
         end      start  names
0 2017-01-31 2017-01-01  Jan17
1 2017-02-28 2017-02-01  Feb17
2 2017-03-31 2017-03-01  Mar17
3 2017-04-30 2017-04-01  Apr17
4 2017-05-31 2017-05-01  May17
5 2017-06-30 2017-06-01  Jun17
6 2017-07-31 2017-07-01  Jul17
7 2017-08-31 2017-08-01  Aug17
8 2017-09-30 2017-09-01  Sep17
9 2017-10-31 2017-10-01  Oct17
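Since the question mentions forecasting out to 2019 or 2020, the same helper table can be built from a start and end month instead of a fixed number of periods. This is only a sketch: the `first_month` and `last_month` values are hypothetical, and it uses `pd.period_range` rather than the two `date_range` calls above.

```python
import pandas as pd

# Hypothetical forecast horizon; adjust to your own range.
first_month = '2017-01'
last_month = '2020-12'

# One monthly period per forecast month.
months = pd.period_range(first_month, last_month, freq='M')

dates = pd.DataFrame({
    # first day of each month
    'start': months.to_timestamp(how='start'),
    # last day of each month, with the time part reset to midnight
    'end': months.to_timestamp(how='end').normalize(),
})
dates['names'] = dates['start'].dt.strftime('%b%y')

print(dates.head())
print(len(dates), 'months')
```

Changing the horizon then means editing one string rather than adding lines per month.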

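# if necessary, convert the date columns to datetimes
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

# for one helper row (one month), fill that month's column with EMI
# wherever the whole month lies inside the account's date range
def f(x):
    df.loc[(df['Start Date'] <= x.start) & (df['End Date'] >= x.end), x.names] = df['EMI']

dates.apply(f, axis=1)
print(df)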
   account   Total Start Date   End Date           EMI  Jan17  Feb17  \
0   211829  107000 2017-05-19 2019-01-22   5350.000000    NaN    NaN
1   320563  175000 2017-08-04 2018-10-30  12500.000000    NaN    NaN
2   648336  246000 2017-02-26 2019-08-25   8482.758621    NaN    NaN
3   109996  175000 2017-11-23 2019-11-27   7291.666667    NaN    NaN
4   121213  317000 2017-09-07 2018-04-12  45285.714286    NaN    NaN

         Mar17        Apr17        May17        Jun17        Jul17  \
0          NaN          NaN          NaN  5350.000000  5350.000000
1          NaN          NaN          NaN          NaN          NaN
2  8482.758621  8482.758621  8482.758621  8482.758621  8482.758621
3          NaN          NaN          NaN          NaN          NaN
4          NaN          NaN          NaN          NaN          NaN

         Aug17         Sep17         Oct17
0  5350.000000   5350.000000   5350.000000
1          NaN  12500.000000  12500.000000
2  8482.758621   8482.758621   8482.758621
3          NaN           NaN           NaN
4          NaN           NaN  45285.714286
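If the per-month loop is still too slow on a large frame, one possible alternative is to compare all rows against all months at once with NumPy broadcasting. This is a sketch, not part of the answer above: the sample frame is rebuilt from the question's table, and the names `mask`, `monthly` and `result` are illustrative. It uses the same rule as the answer (the month lies entirely inside the date range).

```python
import numpy as np
import pandas as pd

# Sample data taken from the question's table.
df = pd.DataFrame({
    'account': [211829, 320563, 648336, 109996, 121213],
    'Start Date': pd.to_datetime(['2017-05-19', '2017-08-04', '2017-02-26',
                                  '2017-11-23', '2017-09-07']),
    'End Date': pd.to_datetime(['2019-01-22', '2018-10-30', '2019-08-25',
                                '2019-11-27', '2018-04-12']),
    'EMI': [5350.0, 12500.0, 8482.7586206897, 7291.6666666667, 45285.7142857143],
})

# Month boundaries and column names for Jan17 .. Oct17.
months = pd.period_range('2017-01', '2017-10', freq='M')
month_start = months.to_timestamp(how='start')
month_end = months.to_timestamp(how='end').normalize()
names = month_start.strftime('%b%y')

# Column vectors of shape (n_rows, 1) broadcast against (n_months,) arrays.
start = df['Start Date'].values[:, None]
end = df['End Date'].values[:, None]
emi = df['EMI'].values[:, None]

# Same rule as the answer: the whole month lies inside the date range.
# For the question's overlap rule, use instead:
#   (start <= month_end.values) & (end >= month_start.values)
mask = (start <= month_start.values) & (end >= month_end.values)

# EMI where the mask holds, NaN elsewhere; one column per month.
monthly = pd.DataFrame(np.where(mask, emi, np.nan), index=df.index, columns=names)
result = pd.concat([df, monthly], axis=1)
print(result)
```

All month columns are built in a single array operation and attached with one `concat`, so the cost no longer grows with one `.loc` assignment per month.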

Thank you very much, it works perfectly. You are a genius.