2015-12-28 208 views
1

Python pandas mean and weighted average

I am new to Python pandas. Any help would be greatly appreciated.

This is my original data:

  Feed Close Sector Market_Cap 
Date 
2015-09-18 A 5.60 Property 50  
2015-09-21 A 5.60 Property 20  
2015-09-23 A 5.60 Property 30  
2015-09-18 ABC 0.67 Property 50  
2015-09-21 ABC 0.66 Property 80  
2015-09-18 DA 0.67 Mining 65  
2015-09-21 KK 1.66 Mining 80  

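What I want to obtain is:

1. Create a new column called Mean that holds the mean Market_Cap for each Feed.

2. Find the weighted average of each Feed within its Sector.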

This is what I want: 
     Feed Close Sector Market_Cap Mean Sector_WeightedAvg 
Date 
2015-09-18 A 5.60 Property 50   33.33  33.33/(33.33+65) 
2015-09-21 A 5.60 Property 20   33.33  33.33/(33.33+65) 
2015-09-23 A 5.60 Property 30   33.33  33.33/(33.33+65) 
2015-09-18 ABC 0.67 Property 50   65   65/(33.33+65) 
2015-09-21 ABC 0.66 Property 80   65   65/(33.33+65) 
2015-09-18 DA 0.67 Mining 65   65   65/(65+80) 
2015-09-21 KK 1.66 Mining 80   80   80/(65+80) 

This is my current code for the mean, which gives me NaN:

df3= pd.DataFrame(df3) 
df3['Mean'] = df3.groupby(by=['Sector'])['Market_Cap'].mean() 

     Feed Close Sector Market_Cap Mean 
Date 
2015-09-18 A 5.60 Property 50   NaN  
2015-09-21 A 5.60 Property 20   NaN  
2015-09-23 A 5.60 Property 30   NaN  
2015-09-18 ABC 0.67 Property 50   NaN    

And the code for the weighted average:

df2['WeightedAverage'] = df3['Market_Cap'].value / df3['Mean'].value 

I get this error:

AttributeError: 'Series' object has no attribute 'value'

+1

'This gives error' - what error? Can we get the traceback? – cel

+1

Your dataframe has no 'Value' column, but you reference it in your code. –

+0

Yes, I have reposted. It should be Market_Cap. I still get the same error – Dusty

Answers

1

IIUC you can use transform with mean.

The weighted average is the Mean column divided by the sum of the unique values of Mean, with df3 grouped by the Sector column.

print(df3) 
      Feed Close Sector Market_Cap 
Date           
2015-09-18 A 5.60 Property   50 
2015-09-21 A 5.60 Property   20 
2015-09-23 A 5.60 Property   30 
2015-09-18 ABC 0.67 Property   50 
2015-09-21 ABC 0.66 Property   80 
2015-09-18 DA 0.67 Mining   65 
2015-09-21 KK 1.66 Mining   80 

df3['Mean'] = df3.groupby(by=['Feed'])['Market_Cap'].transform('mean') 
df3['WeightedAverage'] = df3['Mean'] / df3.groupby(by=['Sector'])['Mean'].transform(lambda x: sum(x.unique())) 
print(df3) 
      Feed Close Sector Market_Cap  Mean WeightedAverage 
Date                  
2015-09-18 A 5.60 Property   50 33.333333   0.338983 
2015-09-21 A 5.60 Property   20 33.333333   0.338983 
2015-09-23 A 5.60 Property   30 33.333333   0.338983 
2015-09-18 ABC 0.67 Property   50 65.000000   0.661017 
2015-09-21 ABC 0.66 Property   80 65.000000   0.661017 
2015-09-18 DA 0.67 Mining   65 65.000000   0.448276 
2015-09-21 KK 1.66 Mining   80 80.000000   0.551724 
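
To see why the code in the question produced NaN and an AttributeError, here is a small sketch that rebuilds the sample frame (the names per_feed, per_row and the column Bad are only illustrative). An aggregated mean has one row per group and is indexed by the group key, so assigning it to a frame indexed by Date finds no matching labels. transform returns one value per original row instead. The array accessor on a Series is spelled values, not value.

```python
import pandas as pd

# Rebuild the sample data from the question, indexed by Date.
dates = pd.to_datetime([
    "2015-09-18", "2015-09-21", "2015-09-23", "2015-09-18",
    "2015-09-21", "2015-09-18", "2015-09-21",
])
df3 = pd.DataFrame(
    {
        "Feed": ["A", "A", "A", "ABC", "ABC", "DA", "KK"],
        "Close": [5.60, 5.60, 5.60, 0.67, 0.66, 0.67, 1.66],
        "Sector": ["Property"] * 5 + ["Mining"] * 2,
        "Market_Cap": [50, 20, 30, 50, 80, 65, 80],
    },
    index=dates.rename("Date"),
)

# Aggregation: one value per group, indexed by the Feed names.
per_feed = df3.groupby("Feed")["Market_Cap"].mean()

# Column assignment aligns on index labels. Feed names never match
# the Date index, so every row of the new column is NaN.
df3["Bad"] = per_feed

# transform broadcasts the group mean back onto the original rows.
per_row = df3.groupby("Feed")["Market_Cap"].transform("mean")

# A Series has no 'value' attribute; the array accessor is 'values'.
has_value_attr = hasattr(df3["Market_Cap"], "value")
as_array = df3["Market_Cap"].values
```

The aggregated result has 4 rows (one per Feed) while the transformed result has 7, matching the frame, which is why only the latter can be assigned directly.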
+0

But doesn't 'sum(x.unique())' assume that each mean is a unique value? What if there are multiple equal means for different sectors? –

+0

That is possible, but my approach works for this sample because no 'Feed' overlaps within a sector. The 'Mean' column depends on the 'Feed' column. – jezrael
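
The concern in the comments can be illustrated with a made-up frame in which two feeds of one sector happen to have the same mean. The data and the names by_unique, feed_means and sector_total are hypothetical and not from the thread. One possible way to avoid relying on uniqueness is to keep one row per (Sector, Feed) pair before summing, sketched here:

```python
import pandas as pd

# Hypothetical data: feeds X and Y are in the same sector and
# both have a mean Market_Cap of 50.
df = pd.DataFrame({
    "Feed": ["X", "X", "Y", "Z"],
    "Sector": ["Property", "Property", "Property", "Property"],
    "Market_Cap": [40, 60, 50, 100],
})

df["Mean"] = df.groupby("Feed")["Market_Cap"].transform("mean")

# Summing unique means counts the shared value once: 50 + 100 = 150.
by_unique = df.groupby("Sector")["Mean"].transform(lambda x: x.unique().sum())

# Keep one row per (Sector, Feed), then sum per sector: 50 + 50 + 100 = 200.
feed_means = (
    df.drop_duplicates(["Sector", "Feed"])
      .groupby("Sector")["Mean"]
      .sum()
)
sector_total = df["Sector"].map(feed_means)

df["WeightedAverage"] = df["Mean"] / sector_total
```

With the per-feed total, the weights of the distinct feeds in a sector add up to 1, which is not the case when equal means are collapsed by unique().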

0

Try a combination of transform('sum') and mean.

In [5]: df 
Out[5]: 
    Close Feed Market_Cap Sector 
0 5.60 A   50 Property 
1 5.60 A   20 Property 
2 5.60 A   30 Property 
3 0.67 ABC   50 Property 
4 0.66 ABC   80 Property 
5 0.67 DA   65 Mining 
6 1.66 KK   80 Mining 

In [6]: g = df.groupby(['Sector', 'Feed']) 

..

In [7]: c = g.Market_Cap.mean() 

In [8]: c 
Out[8]: 
Sector Feed 
Mining DA  65.000000 
      KK  80.000000 
Property A  33.333333 
      ABC  65.000000 
Name: Market_Cap, dtype: float64 

In [9]: d = c.groupby(level=0).transform('sum') 

In [10]: d 
Out[10]: 
Sector Feed 
Mining DA  145.000000 
      KK  145.000000 
Property A  98.333333 
      ABC  98.333333 
dtype: float64 

..

In [11]: df['Mean'] = df.apply(lambda x: c[x.Sector, x.Feed], axis=1) 

In [12]: df['Weighted_Avg'] = df.apply(lambda x: c[x.Sector, x.Feed]/d[x.Sector, x.Feed], axis=1) 

In [13]: df 
Out[13]: 
    Close Feed Market_Cap Sector  Mean Weighted_Avg 
0 5.60 A   50 Property 33.333333  0.338983 
1 5.60 A   20 Property 33.333333  0.338983 
2 5.60 A   30 Property 33.333333  0.338983 
3 0.67 ABC   50 Property 65.000000  0.661017 
4 0.66 ABC   80 Property 65.000000  0.661017 
5 0.67 DA   65 Mining 65.000000  0.448276 
6 1.66 KK   80 Mining 80.000000  0.551724
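
The row-wise apply calls above can probably be replaced by a single join, which avoids calling a Python lambda for every row. This is only a sketch under the assumption that the frame matches the sample shown; the names lookup and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Close": [5.60, 5.60, 5.60, 0.67, 0.66, 0.67, 1.66],
    "Feed": ["A", "A", "A", "ABC", "ABC", "DA", "KK"],
    "Market_Cap": [50, 20, 30, 50, 80, 65, 80],
    "Sector": ["Property"] * 5 + ["Mining"] * 2,
})

# Mean Market_Cap per (Sector, Feed), as in the steps above.
c = df.groupby(["Sector", "Feed"])["Market_Cap"].mean()

# Sum of those means within each sector, broadcast to every feed.
d = c.groupby(level=0).transform("sum")

# Small lookup table indexed by (Sector, Feed).
lookup = pd.DataFrame({"Mean": c, "Weighted_Avg": c / d})

# Attach both columns to the original rows by matching the two
# key columns against the lookup's index levels.
result = df.join(lookup, on=["Sector", "Feed"])
```

The result keeps the original row order and index, and the Mean and Weighted_Avg columns should match the output shown above.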