2017-08-24 150 views
1

Python / pandas - calculating a ratio

I have a DataFrame like this:

bal: 

      year id unit period   Revenues Ativo Não-Circulante \ 
business_id                  
9564   2012 302 dsada anual  5964168.52   10976013.70 
9564   2011 303 dsada anual  5774707.15   10867868.13 
2361   2013 304 dsada anual  3652575.31   6608468.52 
2361   2012 305 dsada anual   321076.15   6027066.03 
2361   2011 306 dsada anual  3858137.49   9733126.02 
2369   2012 307 dsada anual   351373.66   9402830.89 
8104   2012 308 dsada anual  3503226.02   6267307.01 
... 

I want to create a column named "Growth". It would be:

(this year's Revenues / last year's Revenues) - 1

The DataFrame should look like this:

   year id unit period   Revenues    Growth \ 
business_id                  
9564   2012 302 dsada anual  5964168.52    0.0328 
9564   2011 303 dsada anual  5774707.15     NaN 
2361   2013 304 dsada anual  3652575.31     10.37 
2361   2012 305 dsada anual   321076.15     -0.91 
2361   2011 306 dsada anual  3858137.49     NaN 
2369   2012 307 dsada anual   351373.66     NaN 
8104   2012 308 dsada anual  3503226.02     NaN 
... 

How can I do this?

+1

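You need to create a year plus/minus one column, then join the revenues to themselves using the new year +/- 1 column and the ID, which gives you next/last year's revenue. The calculation after that should be trivial. – n8sty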

+0

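@n8sty That solution is about as obvious as you would think. Although it is not spelled out well in the question, the year-over-year revenue growth is computed per `business_id`. – Alexander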
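For reference, the self-join described in the comments could be sketched roughly as follows. This is only an illustration under assumptions: the sample values are made up, `business_id` is assumed to already be a regular column, and the names `prev`, `merged` and `Previous Revenues` are hypothetical.

```python
import pandas as pd

# Made-up sample data (hypothetical values); business_id is a regular column here
df = pd.DataFrame({
    "business_id": [1, 1, 2, 2, 2],
    "year": [2012, 2011, 2013, 2012, 2010],
    "Revenues": [110.0, 100.0, 300.0, 150.0, 120.0],
})

# Copy of the revenues, relabelled so each row carries the year it precedes
prev = df[["business_id", "year", "Revenues"]].copy()
prev["year"] = prev["year"] + 1
prev = prev.rename(columns={"Revenues": "Previous Revenues"})

# Left join keeps every original row; rows without a prior year get NaN
merged = df.merge(prev, on=["business_id", "year"], how="left")
merged["Growth"] = merged["Revenues"] / merged["Previous Revenues"] - 1
print(merged)
```

One property of joining on `year + 1` is that a gap in the years (business 2 has no 2011 row in this sample) yields NaN rather than a growth figure spanning two years.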

Answers

1

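I assume your DataFrame is named df. First reset the index so that business_id becomes a column, then sort the result on year. Now group the DataFrame on business_id and transform Revenues into its percentage change within each group. Finally, sort by the index to restore the original row order.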

df2 = df.reset_index().sort_values(['year']) 
df2 = (
    df2 
    .assign(Growth=df2.groupby(['business_id'])['Revenues'].transform(
     lambda group: group.pct_change())) 
    .sort_index() 
) 
>>> df2 
business_id year id unit period Revenues Ativo Não-Circulante Growth 
0 9564 2012 302 dsada anual 5964168.52 10976013.70   0.032809 
1 9564 2011 303 dsada anual 5774707.15 10867868.13    NaN 
2 2361 2013 304 dsada anual 3652575.31 6608468.52   10.376041 
3 2361 2012 305 dsada anual 321076.15 6027066.03   -0.916779 
4 2361 2011 306 dsada anual 3858137.49 9733126.02     NaN 
5 2369 2012 307 dsada anual 351373.66 9402830.89     NaN 
6 8104 2012 308 dsada anual 3503226.02 6267307.01     NaN 

I think you have an error in your expected output:

5964168.52/5774707.15 - 1 = 0.0328 # vs. 0.16 shown. 
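As a side note, recent pandas versions also expose `pct_change` directly on the grouped column, so the lambda can probably be dropped. A minimal sketch on made-up data (the values and the names `via_transform` and `via_method` are hypothetical), checking that both spellings agree:

```python
import pandas as pd

# Made-up sample data (hypothetical values)
df = pd.DataFrame({
    "business_id": [1, 1, 2, 2, 2],
    "year": [2012, 2011, 2013, 2012, 2011],
    "Revenues": [110.0, 100.0, 300.0, 150.0, 120.0],
})

df_sorted = df.sort_values("year")
grouped = df_sorted.groupby("business_id")["Revenues"]

# Same computation written two ways
via_transform = grouped.transform(lambda g: g.pct_change())
via_method = grouped.pct_change()

# Assignment aligns on the row labels, so the original row order is kept
df["Growth"] = via_method
print(df)
```

Both forms compare each row with the previous row of the same business, so they assume there are no missing years inside a group.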
+0

Great solution. Indeed, I miscalculated it in the question. I will edit and fix it. Thanks @Alexander – abutremutante

0

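You can loop over the groups produced by groupby (one group per business_id, since growth is computed per business), sort each group by year with sort_values, compute the growth, collect the values in a list, convert the list to a numpy.array, and add it to the DataFrame.

import numpy as np
import pandas as pd

# df is your DataFrame; move business_id from the index into a column
df = df.reset_index()
g = []    # growth values
idx = []  # row labels that the growth values belong to
for business_id, values in df.groupby('business_id'):
    values = values.sort_values('year')  # oldest year first
    rev = values['Revenues'].to_numpy()
    # the first year of each business has no previous revenue, hence NaN
    g.extend([np.nan] + list(rev[1:] / rev[:-1] - 1))
    idx.extend(values.index)
df['Growth'] = pd.Series(np.array(g), index=idx)  # assignment aligns on row labels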
0

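It looks like you need a groupby('business_id') followed by a shift to get the previous year's revenue. Save that off as a new column, then compute the ratio, like this: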

df.reset_index(inplace=True) # You might have to do this because it looks like your index is 'business_id' 

df['Previous Revenues'] = df.sort_values('year').groupby('business_id')['Revenues'].shift(1) 
df['Growth'] = df['Revenues']/df['Previous Revenues'] - 1 

If you prefer, you don't need to save the new column, but the line gets a bit messy:

df['Growth'] = df['Revenues']/df.sort_values('year').groupby('business_id')['Revenues'].shift(1) - 1
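The one-liner works even though the shifted values come from a sorted copy, because pandas aligns the division on row labels rather than on position. A small sketch on made-up data (the values and the name `prev` are hypothetical) illustrates this:

```python
import pandas as pd

# Made-up sample data (hypothetical values), rows deliberately not in year order
df = pd.DataFrame({
    "business_id": [1, 1, 2, 2, 2],
    "year": [2012, 2011, 2013, 2012, 2011],
    "Revenues": [110.0, 100.0, 300.0, 150.0, 120.0],
})

# Previous year's revenue per business, computed on a year-sorted copy
prev = df.sort_values("year").groupby("business_id")["Revenues"].shift(1)

# prev is in sorted order, but the division matches rows by index label
df["Growth"] = df["Revenues"] / prev - 1
print(df)
```

This is presumably why the index is reset first: with `business_id` left as the index, the labels repeat and alignment on repeated labels is ambiguous.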