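pd.melt can coalesce multiple columns into a single value column (plus a variable column). You can use it once to coalesce the num1 and num2 columns, and a second time to coalesce the phone1 and phone2 columns: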
import pandas as pd
df = pd.DataFrame({'phone1': [4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2': [4567890876, 4567890876, 9178889999, 2139990000],
                   'num1': [1, 2, 3, 3],
                   'num2': [5, 2, 3, 1]})
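# First melt: keep the phone columns fixed and stack num1/num2 into 'num'
melted = pd.melt(df, id_vars=['phone1', 'phone2'], var_name='numvar', value_name='num')
# Second melt: keep numvar/num fixed and stack phone1/phone2 into 'phone'
melted = pd.melt(melted, id_vars=['numvar', 'num'], value_name='phone')
# Keep only the two value columns and drop repeated (num, phone) pairs
melted = melted[['num', 'phone']]
melted = melted.drop_duplicates()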
print(melted)
yields
num phone
0 1 4567890876
1 2 4567890876
2 3 9178889999
3 3 3237800876
4 5 4567890876
7 1 3237800876
11 3 2139990000
15 1 2139990000
Explanation: Use id_vars to protect the phone1 and phone2 columns from being melted. Here is the result of melting only the num1 and num2 columns:
In [166]: melted = pd.melt(df, id_vars=['phone1', 'phone2'], var_name='numvar', value_name='num'); melted
Out[166]:
phone1 phone2 numvar num
0 4567890876 4567890876 num1 1
1 4567890876 4567890876 num1 2
2 9178889999 9178889999 num1 3
3 3237800876 2139990000 num1 3
4 4567890876 4567890876 num2 5
5 4567890876 4567890876 num2 2
6 9178889999 9178889999 num2 3
7 3237800876 2139990000 num2 1
Then apply pd.melt again to coalesce the phone1 and phone2 columns into one:
In [168]: pd.melt(melted, id_vars=['numvar', 'num'], value_name='phone')
Out[168]:
numvar num variable phone
0 num1 1 phone1 4567890876
1 num1 2 phone1 4567890876
2 num1 3 phone1 9178889999
3 num1 3 phone1 3237800876
4 num2 5 phone1 4567890876
5 num2 2 phone1 4567890876
6 num2 3 phone1 9178889999
7 num2 1 phone1 3237800876
8 num1 1 phone2 4567890876
9 num1 2 phone2 4567890876
10 num1 3 phone2 9178889999
11 num1 3 phone2 2139990000
12 num2 5 phone2 4567890876
13 num2 2 phone2 4567890876
14 num2 3 phone2 9178889999
15 num2 1 phone2 2139990000
Drop the duplicates, remove the numvar and variable columns, and you get the desired result (albeit in a different order).
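The steps above can also be bundled into one function. The following is only a sketch under some assumptions: the helper name melt_pairs and the column name phonevar are made up for illustration, value_vars is spelled out explicitly rather than inferred, and the final sort_values/reset_index is an optional extra that gives a predictable row order.

```python
import pandas as pd

df = pd.DataFrame({'phone1': [4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2': [4567890876, 4567890876, 9178889999, 2139990000],
                   'num1': [1, 2, 3, 3],
                   'num2': [5, 2, 3, 1]})


def melt_pairs(frame):
    """Hypothetical helper: return the unique (num, phone) pairs of `frame`."""
    # Stack num1/num2 into 'num' while keeping both phone columns fixed.
    step1 = pd.melt(frame, id_vars=['phone1', 'phone2'],
                    value_vars=['num1', 'num2'],
                    var_name='numvar', value_name='num')
    # Stack phone1/phone2 into 'phone' while keeping numvar/num fixed.
    step2 = pd.melt(step1, id_vars=['numvar', 'num'],
                    value_vars=['phone1', 'phone2'],
                    var_name='phonevar', value_name='phone')
    # Keep the value columns, drop repeated pairs, and (optionally) sort.
    return (step2[['num', 'phone']]
            .drop_duplicates()
            .sort_values(['num', 'phone'])
            .reset_index(drop=True))


result = melt_pairs(df)
print(result)
```

Because drop_duplicates works on whole (num, phone) rows, a phone number that is paired with more than one distinct num value still appears once per pairing.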
Why do '2139990000' and '3237800876' appear twice in the resulting DF? – MaxU