2015-09-03

pandas drop_duplicates problem

I want to remove duplicate entries based on one specific column, labelled 'sent_name'.

print(new_df) 

            sent_name \ 
0   Abbey Road Station, London, UK 
1   Abbey Wood Station, London, UK 
2     Acton Station, London, UK 
3   Acton Central Station, London, UK 


               Name  Lat  Lng \ 
0       Abbey Road, London E15, UK 51.531930 0.003760 
1       Abbey Wood, London SE2, UK 51.491060 0.121420 
2  Station Parade, West Acton London Underground ... 51.518055 -0.281053 
3       Acton Central, London W3, UK 51.508720 -0.262950 

                type 
0  [u'transit_station', u'point_of_interest', u'e... 
1  [u'transit_station', u'point_of_interest', u'e... 
2  [u'train_station', u'transit_station', u'point... 
3  [u'transit_station', u'point_of_interest', u'e... 

I tried:

new_df.drop_duplicates(["sent_name"]) 

new_df.drop_duplicates(subset="sent_name") 

On inspection, neither of these removes all the duplicates.

For example:

1038   Woodford Station, London, UK 
1040   Woodford Station, London, UK 
1041   Woodford Station, London, UK 
1043   Woodford Station, London, UK 
1044   Woodford Station, London, UK 
1038 South Woodford London Underground Station, Geo... 51.591789 0.027315 
1040 Woodford, Woodford, Woodford Green, Greater Lo... 51.606900 0.034000 
1041      South Woodford, London E18, UK 51.591910 0.027360 
1043   South Woodford (Stop C), London E18, UK 51.591312 0.029013 
1044   South Woodford (Stop D), London E18, UK 51.592010 0.027658 
1038 [u'train_station', u'transit_station', u'point... 
1040 [u'transit_station', u'point_of_interest', u'e... 
1041 [u'transit_station', u'point_of_interest', u'e... 
1043 [u'transit_station', u'point_of_interest', u'e... 
1044 [u'transit_station', u'point_of_interest', u'e... 
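A minimal reproduction of the symptom, assuming pandas is installed. It rebuilds only the `sent_name` and `Lat` values of the Woodford rows above (the other columns are left out for brevity) and shows that the unassigned call leaves the frame unchanged:

```python
import pandas as pd

# The duplicated Woodford rows from the question, with their original index labels.
new_df = pd.DataFrame(
    {
        "sent_name": ["Woodford Station, London, UK"] * 5,
        "Lat": [51.591789, 51.606900, 51.591910, 51.591312, 51.592010],
    },
    index=[1038, 1040, 1041, 1043, 1044],
)

# Calling the method without assigning the result: the de-duplicated
# copy it returns is discarded, so new_df still has all five rows.
new_df.drop_duplicates(subset="sent_name")
print(len(new_df))  # 5

# Assigning the result back keeps only the first occurrence
# (keep="first" is the default).
new_df = new_df.drop_duplicates(subset="sent_name")
print(new_df.index.tolist())  # [1038]
```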

Are you assigning the result back? `new_df = new_df.drop_duplicates(["sent_name"])`. By default it returns a modified copy of the df unless you pass `inplace=True`; see the [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates) – EdChum


I owe you one! – LearningSlowly

Answer


You need to assign the result of drop_duplicates, because the default is inplace=False and almost all pandas operations return a copy.
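The same copy-by-default pattern applies to other DataFrame methods. A small sketch using `sort_values` as an illustrative example (the toy data is made up for the demonstration):

```python
import pandas as pd

df = pd.DataFrame({"sent_name": ["b", "a", "a"]})

# sort_values also defaults to inplace=False: it returns a new
# DataFrame and leaves df in its original row order.
sorted_df = df.sort_values("sent_name")

print(df["sent_name"].tolist())         # ['b', 'a', 'a']
print(sorted_df["sent_name"].tolist())  # ['a', 'a', 'b']
```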

So either of these:

new_df = new_df.drop_duplicates(["sent_name"]) 

new_df.drop_duplicates(["sent_name"], inplace=True) 

will work.
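A short sketch comparing the two forms on a toy frame (the data is illustrative, not from the question). Both leave the same de-duplicated result, but note that with `inplace=True` the method returns `None`, so writing `new_df = new_df.drop_duplicates(["sent_name"], inplace=True)` would replace `new_df` with `None`:

```python
import pandas as pd

data = {"sent_name": ["Woodford Station, London, UK"] * 3 + ["Acton Station, London, UK"]}

# Option 1: rebind the name to the returned copy.
a = pd.DataFrame(data)
a = a.drop_duplicates(["sent_name"])

# Option 2: modify the existing object in place; the return value is None,
# so it must not be assigned back to the DataFrame variable.
b = pd.DataFrame(data)
ret = b.drop_duplicates(["sent_name"], inplace=True)

print(ret)          # None
print(a.equals(b))  # True
```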