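Drop duplicates and add values in pandas
I have the DataFrame below. I want to drop the duplicates, but first append the values from column E of the duplicate rows to the record that is kept.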
import pandas as pd
import numpy as np
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
dfp = pd.DataFrame({'A' : [np.nan,np.nan,3,4,5,5,3,1,6,7],
                    'B' : [1,1,3,5,0,0,np.nan,9,0,0],
                    'C' : ['AA1233445','AA1233445', 'rmacy','Idaho Rx','Ab123455','TV192837','RX','Ohio Drugs','RX12345','USA Pharma'],
                    'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.nan],
                    'E' : ['Assign','Allign','Hello','Ugly','Appreciate','Undo','Testing','Unicycle','Pharma','Unicorn']})
print(dfp)
I grab all of the duplicates:
df2 = dfp.loc[(dfp['A'].duplicated(keep=False))].copy()
     A    B          C           D           E
0  NaN  1.0  AA1233445    123456.0      Assign
1  NaN  1.0  AA1233445    123456.0      Allign
2  3.0  3.0      rmacy   1234567.0       Hello
4  5.0  0.0   Ab123455     12345.0  Appreciate
5  5.0  0.0   TV192837     12345.0        Undo
6  3.0  NaN         RX  12345678.0     Testing
and I want my end result to be:
     A    B          C          D                E
0  NaN  1.0  AA1233445   123456.0    Assign Allign
2  3.0  3.0      rmacy  1234567.0    Hello Testing
4  5.0  0.0   Ab123455    12345.0  Appreciate Undo
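I know I need to use dfp.loc[(dfp['A'].duplicated(keep='last'))].copy() to grab the first occurrence, but I can't work out how to set column E so that it also contains the values from the other duplicates.
I think I need to try something like: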
df3 = dfp.loc[(dfp['A'].duplicated(keep='last'))].copy()
df3['E'] = df3['E'] + dfp.loc[(dfp['A'].duplicated(keep=False).copy()),'E']
but my output is:
     A    B          C          D                     E
0  NaN  1.0  AA1233445   123456.0          AssignAssign
2  3.0  3.0      rmacy  1234567.0            HelloHello
4  5.0  0.0   Ab123455    12345.0  AppreciateAppreciate
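I'm stumped. Am I overcomplicating this? How do I get the output I'm looking for, so that I can later drop every duplicate except the first while keeping the "saved" values in column E?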
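One possible approach, offered as a sketch and not necessarily the method the comments below refer to: group the duplicate rows by A, join the E strings within each group, then keep only the first row of each group. It assumes pandas 1.1 or later, since `dropna=False` is what keeps the NaN values of A together as a group. The names `dupes`, `joined_e` and `result` are made up for this illustration.

```python
import numpy as np
import pandas as pd

dfp = pd.DataFrame({
    'A': [np.nan, np.nan, 3, 4, 5, 5, 3, 1, 6, 7],
    'B': [1, 1, 3, 5, 0, 0, np.nan, 9, 0, 0],
    'C': ['AA1233445', 'AA1233445', 'rmacy', 'Idaho Rx', 'Ab123455',
          'TV192837', 'RX', 'Ohio Drugs', 'RX12345', 'USA Pharma'],
    'D': [123456, 123456, 1234567, 12345678, 12345, 12345,
          12345678, 123456789, 1234567, np.nan],
    'E': ['Assign', 'Allign', 'Hello', 'Ugly', 'Appreciate', 'Undo',
          'Testing', 'Unicycle', 'Pharma', 'Unicorn'],
})

# Every row whose 'A' value occurs more than once (NaN is matched with NaN).
dupes = dfp[dfp['A'].duplicated(keep=False)]

# For each 'A' group, join the 'E' strings with a space and broadcast the
# joined string back onto every row of that group.
# dropna=False (assumes pandas >= 1.1) keeps the NaN rows as a group.
joined_e = dupes.groupby('A', dropna=False)['E'].transform(' '.join)

# Replace 'E' with the joined text, then keep the first row of each group.
result = dupes.assign(E=joined_e).drop_duplicates(subset='A', keep='first')

print(result)
```

The attempt in the question doubles each string because adding two Series aligns them on their index labels, so row 0 is added to row 0 and not to row 1. Grouping on A sidesteps that by combining rows that share a value, not rows that share a label.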
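Oh my gosh... maybe I wasn't overcomplicating things after all. Haha, this looks intense. I'll have to look into this! Thanks, as always! – MattR
What an elegant solution. –
@ScottBoston Thanks – piRSquared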