I have a DataFrame that is the result of a groupby call on a pandas DataFrame, grouped by two index columns:
test=uniqueStudents.groupby(['index1','index2']).count()
test.head(10)
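I want to average the count output across index1 to get one overall mean per index1 value. The current and desired output are shown below.
Current/desired output:
Can someone help me with Python code to achieve this? Or is there another way to get this from the dataset?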
Use the level parameter of the groupby method, which accepts the name of an index level.
test.groupby(level='index1').mean()
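As a minimal sketch of what level='index1' does, the following builds a small two-level frame and averages over the inner level. The frame counts and its column n are made up for illustration and only stand in for the test frame from the question.

```python
import pandas as pd

# Hypothetical data with the same shape as a
# groupby(['index1', 'index2']).count() result.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["index1", "index2"]
)
counts = pd.DataFrame({"n": [10, 20, 6]}, index=idx)

# Group on the outer index level by name and average each column.
per_index1 = counts.groupby(level="index1").mean()
print(per_index1)
```

The result is indexed by index1 only, with one averaged value per column.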
Alternatively, you can reset the index and do a normal groupby using the by parameter.
test.reset_index().groupby('index1').mean()
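One thing to watch with this variant: after reset_index(), index2 becomes an ordinary column, so if it is numeric it gets averaged along with the counts. A small sketch on made-up data (the frame counts and its column n are illustrative assumptions, not the asker's data) that drops index2 first:

```python
import pandas as pd

# Hypothetical two-level frame standing in for the question's data.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["index1", "index2"]
)
counts = pd.DataFrame({"n": [10, 20, 6]}, index=idx)

# After reset_index(), index2 is a regular numeric column and is averaged too.
with_index2 = counts.reset_index().groupby("index1").mean()

# Dropping index2 first leaves only the count columns in the result.
without_index2 = (
    counts.reset_index().drop(columns="index2").groupby("index1").mean()
)

print(with_index2.columns.tolist())
print(without_index2.columns.tolist())
```

If the index2 average is not wanted, dropping the column (or using level='index1' directly) avoids it.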
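You need to group by the index1 level and aggregate with GroupBy.mean, then average across the columns with DataFrame.mean (both skip NaN by default):
import numpy as np
import pandas as pd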
test = pd.DataFrame({'column4': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column10': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column3': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column8': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}, 'column11': {('01-06-15', 278658): 22.0, ('01-06-15', 206905): 101.0, ('02-06-15', 225800): 308.0, ('02-06-15', 225596): 19.0, ('01-06-15', 152551): 64.0, ('01-06-15', 124337): 54.0, ('02-06-15', 235369): 7.0, ('01-06-15', 31883): 124.0, ('03-06-15', 124337): np.nan}, 'column5': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column7': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 3, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column2': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 
235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column1': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column6': {('01-06-15', 278658): 22, ('01-06-15', 206905): 101, ('02-06-15', 225800): 308, ('02-06-15', 225596): 19, ('01-06-15', 152551): 64, ('01-06-15', 124337): 54, ('02-06-15', 235369): 7, ('01-06-15', 31883): 124, ('03-06-15', 124337): 17}, 'column9': {('01-06-15', 278658): 17.0, ('01-06-15', 206905): 60.0, ('02-06-15', 225800): 280.0, ('02-06-15', 225596): 15.0, ('01-06-15', 152551): 55.0, ('01-06-15', 124337): 21.0, ('02-06-15', 235369): 3.0, ('01-06-15', 31883): 62.0, ('03-06-15', 124337): np.nan}})
test.index.names = ['index1','index2']
test = test[['column'+str(col) for col in range(1,12)]]
print (test)
                 column1  column2  column3  column4  column5  column6  \
index1   index2
01-06-15 31883       124      124      124      124      124      124
         124337       54       54       54       54       54       54
         152551       64       64       64       64       64       64
         206905      101      101      101      101      101      101
         278658       22       22       22       22       22       22
02-06-15 225596       19       19       19       19       19       19
         225800      308      308      308      308      308      308
         235369        7        7        7        7        7        7
03-06-15 124337       17       17       17       17       17       17

                 column7  column8  column9  column10  column11
index1   index2
01-06-15 31883       124     62.0     62.0      62.0     124.0
         124337       54     21.0     21.0      21.0      54.0
         152551       64     55.0     55.0      55.0      64.0
         206905      101     60.0     60.0      60.0     101.0
         278658       22     17.0     17.0      17.0      22.0
02-06-15 225596       19     15.0     15.0      15.0      19.0
         225800      308    280.0    280.0     280.0     308.0
         235369        3      3.0      3.0       3.0       7.0
03-06-15 124337       17      NaN      NaN       NaN       NaN
df = test.groupby(level='index1').mean().mean(axis=1).reset_index(name='val')
print (df)
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000
Another solution is to take the mean across the columns first, then groupby:
df = test.mean(axis=1).groupby(level='index1').mean().reset_index(name='val')
print (df)
     index1         val
0  01-06-15   64.818182
1  02-06-15  107.939394
2  03-06-15   17.000000
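The two orders agree on this data, but they are not interchangeable in general: both mean calls skip NaN, so when missing values are spread unevenly within a group, averaging rows first and averaging groups first can give different numbers. A small sketch with made-up values (the frame df_nan and its columns x and y are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical group "a" with one missing value in column y.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2)], names=["index1", "index2"]
)
df_nan = pd.DataFrame({"x": [1.0, 3.0], "y": [10.0, np.nan]}, index=idx)

# Group first: column means are x=2.0 and y=10.0, so the row mean is 6.0.
group_first = df_nan.groupby(level="index1").mean().mean(axis=1)

# Rows first: row means are 5.5 and 3.0, so the group mean is 4.25.
rows_first = df_nan.mean(axis=1).groupby(level="index1").mean()

print(group_first["a"], rows_first["a"])
```

Which order is appropriate depends on whether each column or each row should carry equal weight in the final average.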