Conditionally drop_duplicates on a Python DataFrame

I want to drop duplicate rows from a DataFrame depending on the type of the values in a column. For example, my DataFrame is:

A B 
3 4 
3 4 
3 5 
yes 8 
no 8 
yes 8

If df['A'] is a number, I want to drop_duplicates().

If df['A'] is a string, I want to keep the duplicates.

So the desired result would be:

A B 
3 4 
3 5 
yes 8 
no 8 
yes 8

Other than using a for loop, is there a Pythonic way to do this? Thanks!

Source

2015-10-20 datadatadata

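Create a new column C: where column A is numeric, assign a common value in C; otherwise assign a unique value in C.

After that, just call drop_duplicates as normal.

Note: there is a handy isnumeric() method for testing whether a cell looks like a number.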

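In [47]: 

import numpy as np

# Assumes column A holds strings (e.g. '3', 'yes'), as the .str accessor requires.
# Numeric rows share one key (1); string rows get a unique key (their index).
df['C'] = np.where(df.A.str.isnumeric(), 1, df.index)
print(df)
     A  B  C
0    3  4  1
1    3  4  1
2    3  5  1
3  yes  8  3
4   no  8  4
5  yes  8  5
In [48]: 

print(df.drop_duplicates()[['A', 'B']])  # reset index if needed
     A  B
0    3  4
2    3  5
3  yes  8
4   no  8
5  yes  8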
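
For anyone who wants to run the helper-column idea end to end, here is a minimal self-contained sketch. It assumes column A is stored as strings, as in the session above; the DataFrame construction and the names is_num, key and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example data from the question; column A is assumed to hold strings.
df = pd.DataFrame({
    "A": ["3", "3", "3", "yes", "no", "yes"],
    "B": [4, 4, 5, 8, 8, 8],
})

# Helper key: numeric rows share the key 1, string rows get their own
# index label, so only numeric rows can compare equal across all columns.
is_num = df["A"].str.isnumeric()
key = np.where(is_num, 1, df.index)

result = (
    df.assign(C=key)            # temporary helper column, named C as in the answer
      .drop_duplicates()        # compares A, B and C together
      .drop(columns="C")        # remove the helper column again
      .reset_index(drop=True)
)
print(result)
```

Using assign keeps the helper column out of the original df, so nothing has to be cleaned up afterwards.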

Source

2015-10-20 15:05:53

This solution is more verbose, but may be more flexible when the test gets more involved:

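import pandas as pd

def true_if_number(x):
    # True if x can be parsed as an integer, e.g. 3 or '3'
    try:
        int(x)
        return True
    except (ValueError, TypeError):
        return False

rows_numeric = df['A'].apply(true_if_number)

# Deduplicate whole rows (not just column A) among the numeric rows, so that
# (3, 4) and (3, 5) both survive, then add the string rows back unchanged.
# pd.concat is used because .append was removed in pandas 2.0.
pd.concat([df[rows_numeric].drop_duplicates(), df[~rows_numeric]])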
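
A related variation, not taken from either answer, avoids splitting and re-joining the frame by combining two boolean masks. The sketch below assumes that "numeric" means "parses with pd.to_numeric", which is a looser test than int(); the names is_num, is_dup and result are illustrative.

```python
import pandas as pd

# Example data from the question, here with real integers mixed with strings.
df = pd.DataFrame({
    "A": [3, 3, 3, "yes", "no", "yes"],
    "B": [4, 4, 5, 8, 8, 8],
})

# True where A parses as a number (unparseable values become NaN).
is_num = pd.to_numeric(df["A"], errors="coerce").notna()

# True for every repeat of an earlier row; the first occurrence is False.
is_dup = df.duplicated()

# Drop a row only when it is both numeric in A and a repeat.
result = df[~(is_num & is_dup)]
print(result)
```

On the example data this keeps the rows with index 0, 2, 3, 4 and 5, and it preserves the original row order even when numeric and string rows are interleaved.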

Source

2015-10-20 15:15:38 ojdo
