Pandas label duplicates

Given the following dataframe:

import pandas as pd 
d=pd.DataFrame({'label':[1,2,2,2,3,4,4], 
       'values':[3,5,7,2,5,8,3]}) 
d 
    label values 
0  1  3 
1  2  5 
2  2  7 
3  2  2 
4  3  5 
5  4  8 
6  4  3

I know how to count the occurrences of each unique label like this:

d['dup']=d.groupby('label')['label'].transform('count')

which results in:

label values dup 
0  1  3  1 
1  2  5  3 
2  2  7  3 
3  2  2  3 
4  3  5  1 
5  4  8  2 
6  4  3  2
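transform('count') works here because it returns one value per row (the size of that row's group), so the result lines up with the original index. A minimal sketch, assuming the same sample data; the value_counts().map variant is an alternative added for comparison, not part of the question:

```python
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})

# transform('count') broadcasts each group's size back onto every row of that group
dup_transform = d.groupby('label')['label'].transform('count')

# Alternative (assumption, not from the question): count each label once,
# then map the counts back onto every row by label
dup_map = d['label'].map(d['label'].value_counts())

print(dup_transform.tolist())  # [1, 3, 3, 3, 1, 2, 2]
print(dup_map.tolist())        # [1, 3, 3, 3, 1, 2, 2]
```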

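But what I would like is a column with the following values: 1 if there is only one row for a given label, 2 if that label has duplicates and the row in question is the first of them, and 0 if the row is a duplicate of that first (original) row. Like this: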

label values dup status 
0  1  3  1  1 
1  2  5  3  2 
2  2  7  3  0 
3  2  2  3  0 
4  3  5  1  1 
5  4  8  2  2 
6  4  3  2  0
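To pin down the rule before vectorizing it, here is a slow, row-by-row sketch of the three cases. The helper name status_reference is hypothetical and only meant for checking the faster pandas answers against the expected table:

```python
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})

def status_reference(labels):
    """Hypothetical helper: spell out the status rule one row at a time."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    seen = set()
    out = []
    for lab in labels:
        if counts[lab] == 1:
            out.append(1)   # the only row with this label
        elif lab not in seen:
            out.append(2)   # first row of a duplicated label
        else:
            out.append(0)   # a later duplicate of that label
        seen.add(lab)
    return out

d['status'] = status_reference(d['label'].tolist())
print(d['status'].tolist())  # [1, 2, 0, 0, 1, 2, 0]
```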

Thanks in advance!


2016-09-22 Dance Party2

I think you can use loc with conditions created by the function duplicated:

d['status'] = 2 
d.loc[d.dup == 1, 'status'] = 1 
d.loc[d.label.duplicated(), 'status'] = 0 
print (d) 

    label values dup status 
0  1  3 1  1 
1  2  5 3  2 
2  2  7 3  0 
3  2  2 3  0 
4  3  5 1  1 
5  4  8 2  2 
6  4  3 2  0
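The key piece is the mask returned by duplicated(): with its default keep='first' it is True for every occurrence of a label except the first. A self-contained sketch of the same loc approach, assuming the sample data from the question, with the mask printed so its role is visible:

```python
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})
d['dup'] = d.groupby('label')['label'].transform('count')

# True for every occurrence of a label except its first one
mask = d['label'].duplicated()
print(mask.tolist())  # [False, False, True, True, False, False, True]

# Start from the default 2, then overwrite the two special cases.
# They never overlap: a label that occurs once is never flagged by duplicated().
d['status'] = 2
d.loc[d['dup'] == 1, 'status'] = 1   # labels that occur exactly once
d.loc[mask, 'status'] = 0            # every later duplicate
print(d['status'].tolist())  # [1, 2, 0, 0, 1, 2, 0]
```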

Or a double (nested) numpy.where:

import numpy as np

d['status1'] = np.where(d.dup == 1, 1,
                        np.where(d.label.duplicated(), 0, 2))

print (d) 
    label values dup status status1 
0  1  3 1  1  1 
1  2  5 3  2  2 
2  2  7 3  0  0 
3  2  2 3  0  0 
4  3  5 1  1  1 
5  4  8 2  2  2 
6  4  3 2  0  0
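If the nesting grows beyond two levels, numpy.select is a common swap-in (not used in the answer above): the conditions are checked in order and the first match wins, which mirrors the nested np.where. A minimal sketch, assuming the same sample data; the column name status2 is hypothetical:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})
d['dup'] = d.groupby('label')['label'].transform('count')

# First matching condition wins; rows matching neither get the default
conditions = [d['dup'] == 1, d['label'].duplicated()]
choices = [1, 0]
d['status2'] = np.select(conditions, choices, default=2)
print(d['status2'].tolist())  # [1, 2, 0, 0, 1, 2, 0]
```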


2016-09-22 16:05:58 jezrael

I like the double 'where', you have my vote :) – IanS

@IanS - Thank you. ;) – jezrael

Perfect. Thanks! –

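Another option is to clip the count column at 2 and then subtract 2 times duplicated. Since duplicated uses keep='first' by default, every duplicate label except the first one is reduced to zero.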

# clip(upper=2) replaces Series.clip_upper, which was removed in pandas 1.0
d['status'] = d['dup'].clip(upper=2) - 2*d.duplicated(subset='label')

Output:

label values dup status 
0  1  3 1  1 
1  2  5 3  2 
2  2  7 3  0 
3  2  2 3  0 
4  3  5 1  1 
5  4  8 2  2 
6  4  3 2  0
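The arithmetic is easier to follow with the two intermediate Series printed separately. A sketch assuming the sample data from the question and a pandas version where clip(upper=...) is available; the names capped and penalty are illustrative only:

```python
import pandas as pd

d = pd.DataFrame({'label': [1, 2, 2, 2, 3, 4, 4],
                  'values': [3, 5, 7, 2, 5, 8, 3]})
d['dup'] = d.groupby('label')['label'].transform('count')

# Step 1: cap the group size at 2 -> 1 for unique labels, 2 for any duplicated label
capped = d['dup'].clip(upper=2)
print(capped.tolist())   # [1, 2, 2, 2, 1, 2, 2]

# Step 2: booleans act as 0/1 when multiplied, so only later duplicates lose 2
penalty = 2 * d.duplicated(subset='label')
print(penalty.tolist())  # [0, 0, 2, 2, 0, 0, 2]

d['status'] = capped - penalty
print(d['status'].tolist())  # [1, 2, 0, 0, 1, 2, 0]
```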


2016-09-22 16:41:04 root
