2017-07-09

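Remove duplicated values that occur more than once in a Pandas DataFrame

How can I get, in Python, the ImgFileNames whose HashCode occurs more than once? Note: keep only the first occurrence and delete the rest, even if the value appears in the middle, at the end, or anywhere else.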

I have a DataFrame like the one below:

ImgFileName          HashCodes
Img_0001 - Copy.tif  162a47470f021a60
Img_0001.tif         162a47470f021a60
Img_0002.tif         1b5b5b1aa638dac8
Img_0003.tif         adadadadadadadad
Img_0004.tif         adadadadadadadad
Img_0005 - Copy.tif  a5b8648c8c666670
Img_0005.tif         a5b8648c8c666670
Img_0006.tif         71b392da6a699392
Img_0007.tif         71b392da6a699392
Img_0008.tif         b1b1f2fa6bf97292
Img_0009.tif         86e82ae4c8b6c9c9
Img_0010 - Copy.tif  86e8aae4c8b6c9c9
Img_0010.tif         86e8aae4c8b6c9c9

And the output I want is as follows:

ImgFileName          HashCodes
Img_0001 - Copy.tif  162a47470f021a60
Img_0003.tif         adadadadadadadad
Img_0005 - Copy.tif  a5b8648c8c666670
Img_0006.tif         71b392da6a699392
Img_0009.tif         86e82ae4c8b6c9c9

See [pandas.DataFrame.drop_duplicates](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html) – tarashypka
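
To see what this suggestion does on the sample data, here is a minimal sketch. The variable names `df` and `deduped` are assumptions made for the example. Note that `drop_duplicates` keeps the first row of every hash, including hashes that occur only once, so it likely returns more rows than the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ImgFileName": [
        "Img_0001 - Copy.tif", "Img_0001.tif", "Img_0002.tif",
        "Img_0003.tif", "Img_0004.tif", "Img_0005 - Copy.tif",
        "Img_0005.tif", "Img_0006.tif", "Img_0007.tif",
        "Img_0008.tif", "Img_0009.tif", "Img_0010 - Copy.tif",
        "Img_0010.tif",
    ],
    "HashCodes": [
        "162a47470f021a60", "162a47470f021a60", "1b5b5b1aa638dac8",
        "adadadadadadadad", "adadadadadadadad", "a5b8648c8c666670",
        "a5b8648c8c666670", "71b392da6a699392", "71b392da6a699392",
        "b1b1f2fa6bf97292", "86e82ae4c8b6c9c9", "86e8aae4c8b6c9c9",
        "86e8aae4c8b6c9c9",
    ],
})

# Keep the first row for each distinct hash; unique hashes survive as well.
deduped = df.drop_duplicates(subset="HashCodes", keep="first")
print(deduped)
```

The rows for Img_0002.tif, Img_0008.tif and Img_0009.tif remain in `deduped` because their hashes are unique, so an extra filter on `duplicated(..., keep=False)` would still be needed to restrict the result to repeated hashes.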

Answers


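You need boolean indexing with duplicated. The first mask (keep=False) selects every row whose HashCodes value occurs more than once; the second mask (default keep='first') then drops the first occurrence of each value, leaving only the later ones: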

df = df[df.duplicated('HashCodes', keep=False) & df.duplicated('HashCodes')]
print(df)
     ImgFileName         HashCodes
1   Img_0001.tif  162a47470f021a60
4   Img_0004.tif  adadadadadadadad
6   Img_0005.tif  a5b8648c8c666670
8   Img_0007.tif  71b392da6a699392
12  Img_0010.tif  86e8aae4c8b6c9c9
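
As a side note, a row flagged by `duplicated` with the default `keep='first'` always belongs to a repeated group, so the `keep=False` mask does not appear to change the result here. The sketch below checks this on the question's hash column alone; the names `hashes`, `any_dup`, `later` and `combined` are assumptions made for the example.

```python
import pandas as pd

# HashCodes column from the question, in the original row order.
hashes = pd.Series([
    "162a47470f021a60", "162a47470f021a60", "1b5b5b1aa638dac8",
    "adadadadadadadad", "adadadadadadadad", "a5b8648c8c666670",
    "a5b8648c8c666670", "71b392da6a699392", "71b392da6a699392",
    "b1b1f2fa6bf97292", "86e82ae4c8b6c9c9", "86e8aae4c8b6c9c9",
    "86e8aae4c8b6c9c9",
], name="HashCodes")

any_dup = hashes.duplicated(keep=False)  # every member of a repeated group
later = hashes.duplicated()              # all but the first of each group
combined = any_dup & later               # the mask used in the answer

print(combined.equals(later))            # do the two masks agree?
print(list(hashes.index[combined]))      # positions of the selected rows
```

If the two masks agree, `df[df.duplicated('HashCodes')]` would be a shorter way to write the same selection.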

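Or, to keep the earlier occurrences instead, pass keep='last' to the second call so that only the last occurrence of each value is dropped: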

df = df[df.duplicated('HashCodes', keep=False) & df.duplicated('HashCodes', keep='last')]
print(df)
            ImgFileName         HashCodes
0   Img_0001 - Copy.tif  162a47470f021a60
3          Img_0003.tif  adadadadadadadad
5   Img_0005 - Copy.tif  a5b8648c8c666670
7          Img_0006.tif  71b392da6a699392
11  Img_0010 - Copy.tif  86e8aae4c8b6c9c9
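
One caveat: the sample data only contains pairs. If a hash could occur three or more times, the `keep='last'` mask would return every row except the last of each group, not just the first. The sketch below illustrates this with hypothetical data (the file names and hashes are made up) and shows one possible variant that keeps only the first row of each repeated hash.

```python
import pandas as pd

# Hypothetical data: "aaaa" occurs three times, "bbbb" twice, "cccc" once.
df = pd.DataFrame({
    "ImgFileName": ["a1.tif", "a2.tif", "b1.tif", "c1.tif", "a3.tif", "b2.tif"],
    "HashCodes": ["aaaa", "aaaa", "bbbb", "cccc", "aaaa", "bbbb"],
})

# Rows whose hash occurs more than once anywhere in the frame.
in_dup_group = df.duplicated("HashCodes", keep=False)

# Mask from the answer: every row except the last one of each hash.
answer_mask = in_dup_group & df.duplicated("HashCodes", keep="last")

# Variant: restrict to repeated hashes, then keep only the first row of each.
first_only = df[in_dup_group].drop_duplicates("HashCodes", keep="first")

print(df[answer_mask])  # includes both a1.tif and a2.tif
print(first_only)       # one row per repeated hash
```

With pairs only, as in the question, both approaches select the same rows.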

Thank you very much, jezrael. –


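Glad I could help! If my answer was helpful, don't forget to [accept](http://meta.stackexchange.com/a/5235/295067) it: click the check mark ('✓') next to the answer to toggle it from greyed out to filled in. Thanks. – jezrael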
