import numpy as np
import pandas as pd
NaN = np.nan
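df = pd.DataFrame(
    {'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
     'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
     'd': [32, 12, 55, 98, 23, 11, 9, 91, 3]})
# Stack columns a and b into long format, carrying d along with each label
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])
# One row per label (x, y), one column per source column (a, b), median of d in each cell
result = pd.pivot_table(melted, values='d', index=['value'], columns=['variable'],
                        aggfunc=np.median)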
print(result)
yields
variable a b
value
x 10.0 17.5
y 55.0 50.5
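For comparison, here is a different technique (not the melt/pivot approach above) that reaches the same medians by grouping d on each label column separately and assembling the results column by column. This is a sketch; the names per_column and the column list ['a', 'b'] are just illustrative choices. It relies on groupby dropping NaN keys by default, which is also why the NaN labels never show up in the pivot table.

```python
import numpy as np
import pandas as pd

NaN = np.nan
df = pd.DataFrame({
    'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
    'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
    'd': [32, 12, 55, 98, 23, 11, 9, 91, 3],
})

# For each label column, take the median of d per label.
# groupby skips NaN keys by default, so only 'x' and 'y' remain.
per_column = pd.DataFrame(
    {col: df.groupby(col)['d'].median() for col in ['a', 'b']}
)
print(per_column)
```

The numbers match the pivot table; only the index and column names differ, because nothing here names them value and variable.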
Explanation:
Melting the DataFrame with melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])
yields
d variable value
0 32 a y
1 12 a NaN
2 55 a y
3 98 a NaN
4 23 a NaN
5 11 a x
6 9 a x
7 91 a y
8 3 a NaN
9 32 b NaN
10 12 b x
11 55 b NaN
12 98 b y
13 23 b x
14 11 b NaN
15 9 b NaN
16 91 b NaN
17 3 b y
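Half of the melted rows have NaN in the value column. A quick way to see which rows actually feed the pivot table is to drop those; the sketch below does that (labelled is a hypothetical name). In this particular data every original row has exactly one non-NaN label in a or b, so each d value survives exactly once.

```python
import numpy as np
import pandas as pd

NaN = np.nan
df = pd.DataFrame({
    'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
    'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
    'd': [32, 12, 55, 98, 23, 11, 9, 91, 3],
})
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

# Rows whose 'value' is NaN carry no label, so they cannot fall into
# any pivot cell; keep only the labelled rows.
labelled = melted.dropna(subset=['value'])
print(labelled)
```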
Now we can use pd.pivot_table to pivot and aggregate the d values:
result = pd.pivot_table(melted, values='d', index=['value'], columns=['variable'],
aggfunc=np.median)
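If it helps to see what the pivot is doing, a groupby on both keys followed by unstack gives the same table. The sketch below assumes that rough equivalence and checks it on this data; it uses the string 'median', which pandas accepts in place of np.median (newer pandas versions warn when numpy callables are passed for common reductions). The name same is illustrative.

```python
import numpy as np
import pandas as pd

NaN = np.nan
df = pd.DataFrame({
    'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
    'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
    'd': [32, 12, 55, 98, 23, 11, 9, 91, 3],
})
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

result = pd.pivot_table(melted, values='d', index=['value'],
                        columns=['variable'], aggfunc='median')

# Group by (label, source column), take the median of d, then move
# the 'variable' level from the row index into the columns.
same = melted.groupby(['value', 'variable'])['d'].median().unstack()
print(same)
```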
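Note that aggfunc can also take a list of functions, such as [np.sum, np.median, np.min, np.max, np.std] (or their string names, e.g. ['sum', 'median', 'min', 'max', 'std']), if you want to summarize the data in more than one way.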
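A minimal sketch of the list form, using string names so the top column level reads cleanly regardless of numpy version. The result has MultiIndex columns: the aggregation name on the outer level and a/b on the inner level. The name multi is illustrative.

```python
import numpy as np
import pandas as pd

NaN = np.nan
df = pd.DataFrame({
    'a': ['y', NaN, 'y', NaN, NaN, 'x', 'x', 'y', NaN],
    'b': [NaN, 'x', NaN, 'y', 'x', NaN, NaN, NaN, 'y'],
    'd': [32, 12, 55, 98, 23, 11, 9, 91, 3],
})
melted = pd.melt(df, id_vars=['d'], value_vars=['a', 'b'])

# Several aggregations at once; columns become (aggfunc, variable) pairs
multi = pd.pivot_table(melted, values='d', index=['value'],
                       columns=['variable'],
                       aggfunc=['sum', 'median', 'min', 'max', 'std'])
print(multi)

# Pick out one aggregation for one source column
print(multi[('sum', 'a')])
```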