2016-04-14 242 views

Taking the mean by one column, then by another, in pandas

I have the following dataset:

data = {'VALVE_SCORE': {0: 34.1,1: 41.0,2: 49.7,3: 53.8,4: 35.8,5: 49.2,6: 38.6,7: 51.2,8: 44.8,9: 51.5,10: 41.9,11: 46.0,12: 41.9,13: 51.4,14: 35.0,15: 49.7,16: 41.5,17: 51.5,18: 45.2,19: 53.4,20: 38.1,21: 50.2,22: 25.4,23: 30.0,24: 28.1,25: 49.9,26: 27.5,27: 37.2,28: 27.7,29: 45.7,30: 27.2,31: 30.0,32: 27.9,33: 34.3,34: 29.5,35: 34.5,36: 28.0,37: 33.6,38: 26.8,39: 31.8}, 
    'DAY': {0: 6, 1: 6, 2: 6, 3: 6, 4: 13, 5: 13, 6: 13, 7: 13, 8: 20, 9: 20, 10: 20, 11: 20, 12: 27, 13: 27, 14: 27, 15: 27, 16: 3, 17: 3, 18: 3, 19: 3, 20: 10, 21: 10, 22: 10, 23: 10, 24: 17, 25: 17, 26: 17, 27: 17, 28: 24, 29: 24, 30: 24, 31: 24, 32: 3, 33: 3, 34: 3, 35: 3, 36: 10, 37: 10, 38: 10, 39: 10}, 
    'MONTH': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 2, 17: 2, 18: 2, 19: 2, 20: 2, 21: 2, 22: 2, 23: 2, 24: 2, 25: 2, 26: 2, 27: 2, 28: 2, 29: 2, 30: 2, 31: 2, 32: 3, 33: 3, 34: 3, 35: 3, 36: 3, 37: 3, 38: 3, 39: 3}} 


import pandas as pd

df = pd.DataFrame(data)

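First I want to take the mean by day, and then by month. However, grouping by day to compute the mean produces fractional months, because the same day number occurs in more than one month. I want to preserve the months before I do groupby('MONTH').mean().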

In [401]: df.groupby("DAY").mean()
Out[401]:
     VALVE_SCORE  MONTH
DAY
3        39.7250    2.5
6        44.6500    1.0
10       32.9875    2.5
13       43.7000    1.0
17       35.6750    2.0
20       46.0500    1.0
24       32.6500    2.0
27       44.5000    1.0

I would like the final result to be:

MONTH VALVE_SCORE 
1  value 
2  value 
3  value 

Would someone care to explain why the question got downvoted? – Rohit

Answers

0

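Considering the data you have, you want the daily average and then the monthly average. An Excel pivot table gives the same monthly averages as the pandas output further down (the original screenshot is not available).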

To do the same in pandas, grouping by month is enough to get the same result:

df.groupby(['MONTH']).mean()
        DAY  VALVE_SCORE
MONTH
1      16.5      44.7250
2      13.5      38.0375
3       6.5      30.8000

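Because the month and day values are numeric, pandas averages them as well. If the 'DAY' and 'MONTH' values were strings rather than numbers, you would get this result: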

       VALVE_SCORE
MONTH
1          44.7250
2          38.0375
3          30.8000

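So the monthly mean pandas computes here equals the mean of the daily means. That holds because every day in this data has the same number of readings (four); with unequal counts per day the two would differ.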
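To get the VALVE_SCORE-only table without turning DAY into strings, one option is to select the column before aggregating. This is a minimal sketch; df_demo and its values are made up for illustration and are not the question's data.

```python
import pandas as pd

# Hypothetical toy frame, only to illustrate the column selection
df_demo = pd.DataFrame({
    "MONTH": [1, 1, 2, 2],
    "DAY": [6, 13, 3, 10],
    "VALVE_SCORE": [40.0, 50.0, 30.0, 20.0],
})

# Selecting [["VALVE_SCORE"]] before .mean() keeps DAY out of the result
monthly = df_demo.groupby("MONTH")[["VALVE_SCORE"]].mean()
print(monthly)
```

The double brackets keep the result a DataFrame; single brackets would return a Series instead.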

0

Here is one possible solution. Please let me know if there is a more efficient way.

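df = pd.DataFrame(data)

months = list(df['MONTH'].unique())

# Average by day within each month separately, so that a day number
# shared by several months is never mixed across months
frames = []
for p in months:
    df_part = df[df['MONTH'] == p]
    df_part_avg = df_part.groupby("DAY", as_index=False).mean()
    df_part_avg = df_part_avg.drop('DAY', axis=1)
    frames.append(df_part_avg)

# Stack the per-month daily means, then average them by month
df_months = pd.concat(frames)
df_final = df_months.groupby("MONTH", as_index=False).mean()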

The result is:

In [430]: df_final
Out[430]:
   MONTH  VALVE_SCORE
0      1      44.7250
1      2      38.0375
2      3      30.8000
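On the question of a more compact approach: one possibility, sketched here under the assumption that the goal is the mean of daily means per month, is to group by both columns first and then average over the MONTH level. The df_demo frame below is hypothetical and deliberately has unequal row counts per day, so the mean of daily means differs from a plain monthly mean.

```python
import pandas as pd

# Hypothetical toy data: day 6 has two rows, the other days have one
df_demo = pd.DataFrame({
    "MONTH": [1, 1, 1, 2, 2],
    "DAY": [6, 6, 13, 3, 10],
    "VALVE_SCORE": [10.0, 20.0, 30.0, 40.0, 60.0],
})

# Step 1: mean per (MONTH, DAY) pair, so a day number that occurs in
# several months is never mixed across months
daily = df_demo.groupby(["MONTH", "DAY"])["VALVE_SCORE"].mean()

# Step 2: average the daily means within each month
monthly = daily.groupby(level="MONTH").mean().reset_index()
print(monthly)
```

For month 1 this gives 22.5 (the average of the daily means 15 and 30), whereas a plain groupby("MONTH").mean() on the same rows would give 20.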