How to drop duplicates based on a subset of two or more criteria in a pandas DataFrame

import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

It looks like this:

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

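I want to drop row 1 because it has the same bio and center as row 0. I want to keep row 2 because it has the same bio as row 0 but a different center.

Something like this won't work given the input structure of drop_duplicates, but it is what I am trying to do: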

df.drop_duplicates(subset = 'bio' & subset = 'center')

Any suggestions?

Edit: changed the df slightly so that the example fits the correct answer.

2017-08-04 logic8

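Your syntax is wrong. Pass the column labels as a single list:

df.drop_duplicates(subset=['bio', 'center'])

If the duplicate rows were identical in every column, you could simply call:

df.drop_duplicates()

With the df above, rows 0 and 1 differ in outcome, so only the subset form drops row 1. It returns the following: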

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f
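
For anyone who wants to reproduce this, here is a minimal self-contained sketch using the df from the question. It assumes duplicates should be judged on bio and center only, which is what the question asks for; the variable name deduped is just for illustration.

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Rows count as duplicates when both 'bio' and 'center' match.
# By default (keep='first') the first occurrence is kept, so row 1 is dropped.
deduped = df.drop_duplicates(subset=['bio', 'center'])

print(deduped.index.tolist())  # [0, 2, 3]
```

Note that drop_duplicates returns a new DataFrame and leaves df itself unchanged, so assign the result if you want to keep it.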

See the df.drop_duplicates documentation for the syntax details. subset should be a column label or a sequence of column labels.
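
To illustrate how subset is interpreted, here is a short sketch on the question's df. It compares a single label with a list of labels, and uses df.duplicated, which takes the same subset argument, to inspect which rows would be dropped. The names by_bio, by_bio_center and mask are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# A single label compares one column only: rows 1 and 2 repeat bio '1'.
by_bio = df.drop_duplicates(subset='bio')

# A list of labels compares the combination of those columns.
by_bio_center = df.drop_duplicates(subset=['bio', 'center'])

# duplicated() returns a boolean mask marking the rows that would be dropped.
mask = df.duplicated(subset=['bio', 'center'])

print(by_bio.index.tolist())         # [0, 3]
print(by_bio_center.index.tolist())  # [0, 2, 3]
print(mask.tolist())                 # [False, True, False, False]
```

Checking the mask first can be a cheap way to confirm that the subset matches your intent before actually dropping anything.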

2017-08-04 03:40:16

Good point. I overlooked the definition of "subset". Just wasted an hour on a simple problem :) – logic8
