2016-09-15 63 views

Creating summary rows

I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'State': {0: "AZ", 1: "AZ", 2: "AZ", 4: "AZ", 5: "AK", 6: "AK", 7: "AK", 8: "AK"},
                   'City': {0: "A", 1: "A", 2: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"},
                   'Area': {0: "North", 1: "South", 2: "North", 4: "South", 5: "North", 6: "South", 7: "North", 8: "South"},
                   'Restaurant': {0: "Rest1", 1: "Rest2", 2: "Rest3", 4: "Rest4", 5: "Rest5", 6: "Rest6", 7: "Rest7", 8: "Rest8"},
                   'Price': {0: 2343, 1: 23445, 2: 34536, 4: 7456, 5: 6584, 6: 64563, 7: 54745, 8: 436345}},
                  columns=['State', 'City', 'Area', 'Restaurant', 'Price'])

print(df) 
  State City   Area Restaurant  Price
0    AZ    A  North      Rest1   2343
1    AZ    A  South      Rest2  23445
2    AZ    B  North      Rest3  34536
...

I also have the following pivot table:

pivo=pd.pivot_table(df,values=["Price"], 
       columns=['State',"City", 'Area'], 
       margins=True, 
       aggfunc=[len, np.mean]) 
print(pivo) 
                         len        mean
      State City Area
Price AK    C    North     1    6584.000
                 South     1   64563.000
            D    North     1   54745.000
                 South     1  436345.000
      AZ    A    North     1    2343.000
                 South     1   23445.000
            B    North     1   34536.000
                 South     1    7456.000
      All                  8   78752.125

I would like to compute an "All" row that summarizes each state and each city, so that the result looks like this:

                         len        mean
      State City Area
Price AK    All            4    281118.5
            C    All       2     35573.5
                 North     1    6584.000
                 South     1   64563.000
            D    All       2      245545
                 North     1   54745.000
                 South     1  436345.000
      ...

I have been playing around with unstack/stack, but I have not produced anything close.

Thanks!

Edit: this is the closest I have gotten:

pivo=pd.pivot_table(df,values=["Price"], 
       index=['State'], 
       columns=["City", 'Area'], 
       margins=True, 
       aggfunc=[len, np.mean]) 

                    len        mean
                  Price       Price
State City Area
AK    All           4.0  140559.000
      C    North    1.0    6584.000
           South    1.0   64563.000
      D    North    1.0   54745.000
           South    1.0  436345.000
AZ    A    North    1.0    2343.000
           South    1.0   23445.000
      All           4.0   16945.000
      B    North    1.0   34536.000
           South    1.0    7456.000
All   A    North    1.0    2343.000
           South    1.0   23445.000
      All           8.0   78752.125
      B    North    1.0   34536.000
           South    1.0    7456.000
      C    North    1.0    6584.000
           South    1.0   64563.000
      D    North    1.0   54745.000
           South    1.0  436345.000

Answer


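Edit: I missed the fact that you also want margins for the states. I am leaving the original answer in place in case it is still useful. Scroll down for some hacky pandas.


Does this help?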

In [1]: import pandas as pd 

In [2]: import numpy as np 

In [3]: df = pd.DataFrame({'State': {0: "AZ", 1: "AZ", 2: "AZ", 4: "AZ", 5: "AK", 6: "AK", 7: "AK", 8: "AK"},
    ...:    'City': {0: "A", 1: "A", 2: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"},
    ...:    'Area': {0: "North", 1: "South", 2: "North", 4: "South", 5: "North", 6: "South", 7: "North", 8: "South"},
    ...:    'Restaurant': {0: "Rest1", 1: "Rest2", 2: "Rest3", 4: "Rest4", 5: "Rest5", 6: "Rest6", 7: "Rest7", 8: "Rest8"},
    ...:    'Price': {0: 2343, 1: 23445, 2: 34536, 4: 7456, 5: 6584, 6: 64563, 7: 54745, 8: 436345}},
    ...:    columns=['State', 'City', 'Area', 'Restaurant', 'Price'])

In [4]: pv = (df.pivot_table(index=['State', 'City'], 
    ...:     columns=['Area'], 
    ...:     values=['Price'], 
    ...:     margins=True, 
    ...:     aggfunc=[len, np.mean])) 

In [5]: pv 
Out[5]: 
             len              mean
           Price             Price
Area       North South  All    North     South         All
State City
AK    C      1.0   1.0  2.0   6584.0   64563.0   35573.500
      D      1.0   1.0  2.0  54745.0  436345.0  245545.000
AZ    A      1.0   1.0  2.0   2343.0   23445.0   12894.000
      B      1.0   1.0  2.0  34536.0    7456.0   20996.000
All          4.0   4.0  8.0  24552.0  132952.0   78752.125

In [6]: pv.stack() 
Out[6]: 
                    len        mean
                  Price       Price
State City Area
AK    C    All      2.0   35573.500
           North    1.0    6584.000
           South    1.0   64563.000
      D    All      2.0  245545.000
           North    1.0   54745.000
           South    1.0  436345.000
AZ    A    All      2.0   12894.000
           North    1.0    2343.000
           South    1.0   23445.000
      B    All      2.0   20996.000
           North    1.0   34536.000
           South    1.0    7456.000
All        All      8.0   78752.125
           North    4.0   24552.000
           South    4.0  132952.000

As a one-liner:

In [7]: pv = (df.pivot_table(index=['State', 'City'], 
    ...:     columns=['Area'], 
    ...:     values=['Price'], 
    ...:     margins=True, 
    ...:     aggfunc=[len, np.mean]) 
    ...:  .stack()) 

In [8]: pv 
Out[8]: 
                    len        mean
                  Price       Price
State City Area
AK    C    All      2.0   35573.500
           North    1.0    6584.000
           South    1.0   64563.000
      D    All      2.0  245545.000
           North    1.0   54745.000
           South    1.0  436345.000
AZ    A    All      2.0   12894.000
           North    1.0    2343.000
           South    1.0   23445.000
      B    All      2.0   20996.000
           North    1.0   34536.000
           South    1.0    7456.000
All        All      8.0   78752.125
           North    4.0   24552.000
           South    4.0  132952.000
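
The one-liner can also be written as a standalone script with string aggregation names, which pandas accepts for aggfunc as well. This is only a sketch: "count" stands in for len here (an assumption that holds because Price has no missing values), and the plain list-based DataFrame is a simplified copy of the data in the question.

```python
import pandas as pd

# Simplified copy of the question's data (Restaurant omitted, default index).
df = pd.DataFrame({
    "State": ["AZ", "AZ", "AZ", "AZ", "AK", "AK", "AK", "AK"],
    "City": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Area": ["North", "South"] * 4,
    "Price": [2343, 23445, 34536, 7456, 6584, 64563, 54745, 436345],
})

# Pivot with Area as columns so that margins=True adds an "All" column
# per (State, City), then move the Area level into the row index.
pv = (
    df.pivot_table(index=["State", "City"],
                   columns=["Area"],
                   values=["Price"],
                   margins=True,
                   aggfunc=["count", "mean"])
      .stack()
)
print(pv)
```

The city-level "All" rows come from the margin column of the pivot table, so this still does not include a per-state summary row.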

Adding the state-level margins is a bit fiddly, and it is not at all elegant. I would love to see an improvement on this.


In [9]: pv = (df.pivot_table(index=['State', 'City'], 
    ...:     columns=['Area'], 
    ...:     values=['Price'], 
    ...:     margins=True, 
    ...:     aggfunc=[len, np.mean])) 

In [10]: state_agg = (df[['Price', 'State']] 
    ...:    .pivot_table(index='State', aggfunc=[len, np.mean], margins=True) 
    ...:    .assign(City= 'state_margin').assign(Area="") 
    ...:    ) 
    ...: state_agg.loc['All', 'City'] = 'total' 
    ...: 

In [11]: state_agg 
Out[11]: 
            len        mean          City Area
          Price       Price
State
AK          4.0  140559.000  state_margin
AZ          4.0   16945.000  state_margin
All         8.0   78752.125         total

The iloc[0:-1] below drops the margin row from the first pivot table.

In [12]: results = (pd.concat([pv.iloc[0:-1].stack().reset_index(), 
    ...:   state_agg.reset_index() 
    ...:   ]) 
    ...: ).set_index(['State', 'City', 'Area']).sort_index() 

In [13]: results 
Out[13]: 
                             len        mean
                           Price       Price
State City         Area
AK    C            All       2.0   35573.500
                   North     1.0    6584.000
                   South     1.0   64563.000
      D            All       2.0  245545.000
                   North     1.0   54745.000
                   South     1.0  436345.000
      state_margin           4.0  140559.000
AZ    A            All       2.0   12894.000
                   North     1.0    2343.000
                   South     1.0   23445.000
      B            All       2.0   20996.000
                   North     1.0   34536.000
                   South     1.0    7456.000
      state_margin           4.0   16945.000
All   total                  8.0   78752.125
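
For comparison, a table of the same shape can be sketched with groupby instead of pivot_table: aggregate at each level separately, label the missing levels "All", and concatenate. This is not the approach used above, just one possible alternative. The helper name summarize and the "All" filler label are assumptions made for illustration, and the grand-total row is left out.

```python
import pandas as pd

# Simplified copy of the question's data (Restaurant omitted, default index).
df = pd.DataFrame({
    "State": ["AZ", "AZ", "AZ", "AZ", "AK", "AK", "AK", "AK"],
    "City": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Area": ["North", "South"] * 4,
    "Price": [2343, 23445, 34536, 7456, 6584, 64563, 54745, 436345],
})

LEVELS = ["State", "City", "Area"]


def summarize(keys):
    """Hypothetical helper: aggregate Price by `keys` and fill the
    levels that were not grouped on with the label "All"."""
    out = df.groupby(keys)["Price"].agg(["size", "mean"]).reset_index()
    for level in LEVELS:
        if level not in out.columns:
            out[level] = "All"
    return out


summary = (
    pd.concat([
        summarize(["State", "City", "Area"]),  # detail rows
        summarize(["State", "City"]),          # one "All" row per city
        summarize(["State"]),                  # one "All" row per state
    ])
    .set_index(LEVELS)
    .sort_index()
)
print(summary)
```

Because the index is sorted alphabetically, the "All" rows do not always come first within a group; for AZ the state row lands between cities A and B. Note also that the exact mean of the four AK prices is 562237 / 4 = 140559.25, so the 140559.000 shown in the outputs above may not match what this sketch prints.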

In [14]: idx = pd.IndexSlice 
    ...: results.loc[idx[:, 'state_margin'], :] 
    ...: 
Out[14]: 
                         len      mean
                       Price     Price
State City         Area
AK    state_margin       4.0  140559.0
AZ    state_margin       4.0   16945.0
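
The same selection can probably also be written with DataFrame.xs, which picks rows by a label on a named index level. The sketch below uses a small stand-in for results (only a len column, with values copied from the output above) so that it runs on its own; the variable names margins and kept are assumptions for illustration.

```python
import pandas as pd

# Small stand-in for `results`: same index levels, only a few rows.
index = pd.MultiIndex.from_tuples(
    [
        ("AK", "C", "All"),
        ("AK", "state_margin", ""),
        ("AZ", "A", "All"),
        ("AZ", "state_margin", ""),
    ],
    names=["State", "City", "Area"],
)
results = pd.DataFrame({"len": [2.0, 4.0, 2.0, 4.0]}, index=index)

# Select the state-level rows; by default xs drops the selected level.
margins = results.xs("state_margin", level="City")

# Keep the City level in the result, as the IndexSlice version does.
kept = results.xs("state_margin", level="City", drop_level=False)

print(margins)
print(kept)
```

Using drop_level=False keeps all three index levels, which is closer to what the pd.IndexSlice selection returns.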