看來你需要groupby
與參數level
:
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std()
樣品:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(5, size=(10,3)), columns=list('ABC'))
print (df)
A B C
0 0 0 3
1 0 2 4
2 2 2 2
3 2 1 0
4 0 4 3
5 4 2 0
6 3 1 2
7 3 4 4
8 1 3 4
9 4 3 3
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
grouped = df.groupby(['A','B'])['C'].sum().groupby(level=1).std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
解釋,當事方每:
#groupby by columns A, B, aggregate column C
#->output is Series with MultiIndex
grouped1 = df.groupby(['A','B'])['C'].sum()
print (grouped1)
A B
0 0 3
2 4
4 3
1 3 4
2 1 0
2 2
3 1 2
4 4
4 2 0
3 3
Name: C, dtype: int32
print (type(grouped1))
<class 'pandas.core.series.Series'>
print (grouped1.index)
MultiIndex(levels=[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4]],
labels=[[0, 0, 0, 1, 2, 2, 3, 3, 4, 4], [0, 2, 4, 3, 1, 2, 1, 4, 2, 3]],
names=['A', 'B'])
#groupby by level B of MultiIndex
#->output is Series with MultiIndex, so reset_index for df
grouped = grouped1.groupby(level='B').std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
#all together
grouped = df.groupby(['A','B'])['C'].sum().groupby(level='B').std().reset_index()
print (grouped)
B C
0 0 NaN
1 1 1.414214
2 2 2.000000
3 3 0.707107
4 4 0.707107
謝謝,這確實有效。任何解釋爲什麼? – weymouth
當然,給我一下。 – jezrael
請檢查我最後的編輯,如果不清楚,請告訴我,我嘗試解釋它。 – jezrael