2013-08-27 275 views

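I would like to compare the output of multiple model runs, computing these values:

    1. The difference between current-period revenue and previous-period revenue
    2. The difference between actual current-period revenue and forecast current-period revenue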

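    I have experimented with a MultiIndex and suspect the answer lies in that direction, with some creative use of transform(). However, I fear I have mangled the problem through haphazard application of various pivot/melt/groupby experiments. Perhaps you can help me figure out how to turn this: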

    import pandas as pd 
    
    ids = [1,2,3] * 5 
    year = ['2013', '2013', '2013', '2014', '2014', '2014', '2014', '2014', '2014', '2015', '2015', '2015', '2015', '2015', '2015'] 
    run = ['actual','actual','actual','forecast','forecast','forecast','actual','actual','actual','forecast','forecast','forecast','actual','actual','actual'] 
    
    revenue = [10,20,20,30,50,90,10,40,50,120,210,150,130,100,190] 
    
    change_from_previous_year = ['NA','NA','NA',20,30,70,0,20,30,90,160,60,120,60,140] 
    change_from_forecast = ['NA','NA','NA','NA','NA','NA',-20,-10,-40,'NA','NA','NA',10,-110,40] 
    
    d = {'ids':ids, 'year':year, 'run':run, 'revenue':revenue} 
    
    df = pd.DataFrame(data=d, columns=['ids','year','run','revenue']) 
    print(df)
    
        ids  year       run  revenue
    0     1  2013    actual       10
    1     2  2013    actual       20
    2     3  2013    actual       20
    3     1  2014  forecast       30
    4     2  2014  forecast       50
    5     3  2014  forecast       90
    6     1  2014    actual       10
    7     2  2014    actual       40
    8     3  2014    actual       50
    9     1  2015  forecast      120
    10    2  2015  forecast      210
    11    3  2015  forecast      150
    12    1  2015    actual      130
    13    2  2015    actual      100
    14    3  2015    actual      190
    

    ...into this:

        ids  year       run  revenue  chg_from_prev_year  chg_from_forecast
    0     1  2013    actual       10                  NA                 NA
    1     2  2013    actual       20                  NA                 NA
    2     3  2013    actual       20                  NA                 NA
    3     1  2014  forecast       30                  20                 NA
    4     2  2014  forecast       50                  30                 NA
    5     3  2014  forecast       90                  70                 NA
    6     1  2014    actual       10                   0                -20
    7     2  2014    actual       40                  20                -10
    8     3  2014    actual       50                  30                -40
    9     1  2015  forecast      120                  90                 NA
    10    2  2015  forecast      210                 160                 NA
    11    3  2015  forecast      150                  60                 NA
    12    1  2015    actual      130                 120                 10
    13    2  2015    actual      100                  60               -110
    14    3  2015    actual      190                 140                 40
    

    EDIT: I get pretty close with this:

    df['prev_year'] = df.groupby(['ids','run']).shift(1)['revenue'] 
    df['chg_from_prev_year'] = df['revenue'] - df['prev_year'] 
    
    df['curr_forecast'] = df.groupby(['ids','year']).shift(1)['revenue'] 
    df['chg_from_forecast'] = df['revenue'] - df['curr_forecast'] 
    

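    The only thing missing (as expected) is the comparison between the 2014 forecast and the 2013 actual. I could duplicate the 2013 run in the dataset, calculate chg_from_prev_year for the 2014 forecast, and then hide or delete the unneeded rows from the final DataFrame.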
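
The duplication workaround described here could look roughly like the sketch below. It is only an illustration: the names `seed`, `padded`, and `result` are made up, and it assumes the rows are already in year order within each `(ids, run)` group, as they are in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + ["forecast"] * 3 + ["actual"] * 3
           + ["forecast"] * 3 + ["actual"] * 3,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190],
})

# Assumption: the 2013 actuals stand in for the missing 2013 forecast.
seed = df[(df["year"] == "2013") & (df["run"] == "actual")].assign(run="forecast")

# Put the stand-in rows first so they precede the 2014 forecast in each group.
padded = pd.concat([seed, df], ignore_index=True)

# Within each (ids, run) group, subtract the previous row's revenue.
padded["chg_from_prev_year"] = padded.groupby(["ids", "run"])["revenue"].diff()

# Drop the stand-in rows again, keeping only the original 15 rows.
result = padded.iloc[len(seed):].reset_index(drop=True)
print(result)
```

With the sample data, the 2014 forecast rows are then compared against the 2013 actuals, while the 2013 rows themselves stay NaN.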

  • Answer


    First, to get the change from the previous year, do a shift within each group:

    In [11]: g = df.groupby(['ids', 'run']) 
    
    In [12]: df['chg_from_prev_year'] = g['revenue'].apply(lambda x: x - x.shift()) 
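
The same group-wise step can also be written with the built-in `diff()`, which is shorthand for `x - x.shift()`. This is a minimal self-contained sketch that rebuilds the sample frame from the question; it assumes rows are already ordered by year within each `(ids, run)` group.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + ["forecast"] * 3 + ["actual"] * 3
           + ["forecast"] * 3 + ["actual"] * 3,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190],
})

# diff() subtracts the previous row of the same (ids, run) group,
# so the first row of every group comes out as NaN.
df["chg_from_prev_year"] = df.groupby(["ids", "run"])["revenue"].diff()
print(df)
```

The result keeps the original row index, so it can be assigned straight back as a column.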
    

    The next part is more complicated; I think you need to do a pivot_table for it:

    In [13]: df1 = df.pivot_table('revenue', ['ids', 'year'], 'run') 
    
    In [14]: df1 
    Out[14]: 
    run       actual  forecast
    ids year
    1   2013      10       NaN
        2014      10        30
        2015     130       120
    2   2013      20       NaN
        2014      40        50
        2015     100       210
    3   2013      20       NaN
        2014      50        90
        2015     190       150
    
    In [15]: g1 = df1.groupby(level='ids', as_index=False) 
    
    In [16]: out_by = g1.apply(lambda x: x['actual'] - x['forecast']) 
    
    In [17]: out_by # hello levels bug, fixed in 0.13/master... yesterday :) 
    Out[17]: 
    ids  ids  year
    1    1    2013     NaN
              2014     -20
              2015      10
    2    2    2013     NaN
              2014     -10
              2015    -110
    3    3    2013     NaN
              2014     -40
              2015      40
    dtype: float64
    

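    This is the result you want, but not in the right format (see [31] below if you're not too fussed)... The following seems like a bit of a hack (to put it mildly), but here goes: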

    In [21]: df2 = df.set_index(['ids', 'year', 'run']) 
    
    In [22]: out_by.index = out_by.index.droplevel(0) 
    
    In [23]: out_by_df = pd.DataFrame(out_by, columns=['revenue']) 
    
    In [24]: out_by_df['run'] = 'forecast' 
    
    In [25]: df2['chg_from_forecast'] = out_by_df.set_index('run', append=True)['revenue'] 
    

    ...and we're done:

    In [26]: df2.reset_index() 
    Out[26]: 
        ids  year       run  revenue  chg_from_prev_year  chg_from_forecast
    0     1  2013    actual       10                 NaN                NaN
    1     2  2013    actual       20                 NaN                NaN
    2     3  2013    actual       20                 NaN                NaN
    3     1  2014  forecast       30                 NaN                -20
    4     2  2014  forecast       50                 NaN                -10
    5     3  2014  forecast       90                 NaN                -40
    6     1  2014    actual       10                   0                NaN
    7     2  2014    actual       40                  20                NaN
    8     3  2014    actual       50                  30                NaN
    9     1  2015  forecast      120                  90                 10
    10    2  2015  forecast      210                 160               -110
    11    3  2015  forecast      150                  60                 40
    12    1  2015    actual      130                 120                NaN
    13    2  2015    actual      100                  60                NaN
    14    3  2015    actual      190                 140                NaN
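
As a possible alternative to the index juggling, the actual-minus-forecast difference can be joined back onto the long frame with `merge`. This is a sketch, not the approach used in this answer: the names `wide`, `diff`, and `out` are illustrative, and it assumes the difference should sit on the `actual` rows, as in the question's desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + ["forecast"] * 3 + ["actual"] * 3
           + ["forecast"] * 3 + ["actual"] * 3,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190],
})

# Wide layout: one row per (ids, year), one column per run type.
wide = df.pivot_table(values="revenue", index=["ids", "year"], columns="run")

# Actual minus forecast; NaN where no forecast exists (2013).
diff = (wide["actual"] - wide["forecast"]).rename("chg_from_forecast").reset_index()

# Assumption: attach the difference to the 'actual' rows only.
diff["run"] = "actual"

# A left merge keeps every original row, in the original order.
out = df.merge(diff, on=["ids", "year", "run"], how="left")
print(out)
```

Forecast rows find no match in `diff` and therefore get NaN in the new column.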
    

    Note: I think the first 6 results of chg_from_prev_year should be NaN.

    However, I think you may be better off keeping it as a pivot:

    In [31]: df3 = df.pivot_table(['revenue', 'chg_from_prev_year'], ['ids', 'year'], 'run') 
    
    In [32]: df3['chg_from_forecast'] = g1.apply(lambda x: x['actual'] - x['forecast']).values 
    
    In [33]: df3 
    Out[33]: 
              revenue            chg_from_prev_year            chg_from_forecast
    run        actual  forecast              actual  forecast
    ids year
    1   2013       10       NaN                 NaN       NaN                NaN
        2014       10        30                   0       NaN                -20
        2015      130       120                 120        90                 10
    2   2013       20       NaN                 NaN       NaN                NaN
        2014       40        50                  20       NaN                -10
        2015      100       210                  60       160               -110
    3   2013       20       NaN                 NaN       NaN                NaN
        2014       50        90                  30       NaN                -40
        2015      190       150                 140        60                 40
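
For reference, the pivoted layout can also be built with keyword arguments, taking the column difference directly on the pivoted frame. This is a hedged sketch rather than a transcript of the session above: the name `wide` is illustrative, and column order in the output may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "ids": [1, 2, 3] * 5,
    "year": ["2013"] * 3 + ["2014"] * 6 + ["2015"] * 6,
    "run": ["actual"] * 3 + ["forecast"] * 3 + ["actual"] * 3
           + ["forecast"] * 3 + ["actual"] * 3,
    "revenue": [10, 20, 20, 30, 50, 90, 10, 40, 50, 120, 210, 150, 130, 100, 190],
})

# Year-over-year change within each (ids, run) group.
df["chg_from_prev_year"] = df.groupby(["ids", "run"])["revenue"].diff()

# One row per (ids, year); columns become (value name, run) pairs.
wide = df.pivot_table(
    values=["revenue", "chg_from_prev_year"],
    index=["ids", "year"],
    columns="run",
)

# Actual minus forecast, stored under a new top-level column label.
wide["chg_from_forecast"] = wide[("revenue", "actual")] - wide[("revenue", "forecast")]
print(wide)
```

Because each `(ids, year, run)` combination occurs once in the sample data, the default mean aggregation of `pivot_table` simply passes the values through.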
    

    For a moment after reading 'see [31] below' I thought, "Wow, Andy is going a bit overboard with footnotes for this answer." – TomAugspurger


    @TomAugspurger Only a little overboard... :) (I remember thinking it was a weird sentence!) –