How do I remove duplicates from this?

I write values like these to a file. How can I remove the duplicates?

(u'fresh', 4.3557350075853982) 
(u'fresh', 6.6629604801359461) 
(u'focus', 6.4398169288217364) 


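import codecs

p = zip(vectorizer.get_feature_names(), idf)  # ('object', score)
with codecs.open("scores.txt", "a") as t:
    for x in p:
        if not x:  # attempted "already written?" check; a non-empty tuple is truthy, so this never fires
            print>>t, x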

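What I'm trying to do is remove duplicates from what gets written to the file. So I try to check whether x already exists and, if it does, not write it again. How can I do that, given that p is built with zip?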


2016-03-15 minks

Try this:

import codecs

# Sample of the values being written:
# (u'fresh', 4.3557350075853982)
# (u'fresh', 6.6629604801359461)
# (u'focus', 6.4398169288217364)

p = zip(vectorizer.get_feature_names(), idf)  # ('object', score)
outputted = set()  # objects already written to the file
with codecs.open("scores.txt", "a") as t:
    for x in p:
        if x[0] in outputted:
            continue  # skip objects that were already written
        print>>t, x
        outputted.add(x[0])

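This assumes you only want to remove duplicates of the "object". If you consider a duplicate to be an (object, score) pair where both parts are exactly the same, replace x[0] with x.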
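The choice between `x[0]` and `x` decides what counts as a duplicate. A small sketch, using a hypothetical helper `unique_by` that takes a key function, shows both variants on the sample values from the question: keying on `x[0]` keeps only the first 'fresh' pair, while keying on the whole tuple keeps both 'fresh' pairs because their scores differ.

```python
def unique_by(pairs, key):
    """Yield items from pairs, skipping any whose key was already seen (hypothetical helper)."""
    seen = set()
    for item in pairs:
        k = key(item)
        if k in seen:
            continue
        seen.add(k)
        yield item

pairs = [
    ("fresh", 4.3557350075853982),
    ("fresh", 6.6629604801359461),
    ("focus", 6.4398169288217364),
]

by_object = list(unique_by(pairs, key=lambda x: x[0]))  # dedupe on the object only
by_pair = list(unique_by(pairs, key=lambda x: x))       # dedupe on the full (object, score) pair
```

Keying on `x` only drops exact repeats; with floating-point scores, two pairs for the same word rarely match exactly, so both sample 'fresh' lines would survive that variant.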

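Essentially, the answer's loop keeps track of the objects it has already written, and it will not write another object/score pair for an object that has already been output.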
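On Python 3, where `print>>t, x` is a syntax error, the same loop can be sketched as follows. This is a minimal sketch under assumptions: a hard-coded list of the question's sample values stands in for `zip(vectorizer.get_feature_names(), idf)`, and an `io.StringIO` buffer stands in for `scores.txt` so the example runs on its own.

```python
import io

# Hypothetical stand-in for zip(vectorizer.get_feature_names(), idf),
# using the sample values from the question.
p = [
    ("fresh", 4.3557350075853982),
    ("fresh", 6.6629604801359461),
    ("focus", 6.4398169288217364),
]

outputted = set()   # objects already written
t = io.StringIO()   # stands in for codecs.open("scores.txt", "a")
for x in p:
    if x[0] in outputted:
        continue    # skip any later pair for an object we already wrote
    print(x, file=t)
    outputted.add(x[0])

print(t.getvalue())
```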


2016-03-15 15:56:23

Hi, I tried both approaches, and it still doesn't seem to remove the duplicate 'fresh'. – minks
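One possible cause, offered as an assumption rather than a diagnosis: `codecs.open("scores.txt", "a")` opens the file in append mode, while `outputted` starts empty on every run, so the answer only removes duplicates within a single run and lines from earlier runs stay in the file. If `p` comes from a fitted scikit-learn vectorizer, `get_feature_names()` lists each term once, which makes repeated runs a likely suspect. A minimal sketch of one workaround seeds the set from the existing file before appending. It assumes every line is the repr of an `(object, score)` tuple, parses it back with `ast.literal_eval`, and uses hypothetical helper names `load_written_objects` and `append_unique`.

```python
import ast
import os
import tempfile

def load_written_objects(path):
    """Return the set of objects already present in path (hypothetical helper)."""
    seen = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    # Assumption: each line is the repr of an (object, score) tuple.
                    obj, _score = ast.literal_eval(line)
                    seen.add(obj)
    return seen

def append_unique(path, pairs):
    """Append pairs whose object is neither in the file nor earlier in pairs."""
    outputted = load_written_objects(path)
    with open(path, "a", encoding="utf-8") as t:
        for x in pairs:
            if x[0] in outputted:
                continue
            print(x, file=t)
            outputted.add(x[0])

# Simulate two separate runs appending to the same file.
path = os.path.join(tempfile.mkdtemp(), "scores.txt")
append_unique(path, [("fresh", 4.3557350075853982), ("focus", 6.4398169288217364)])
append_unique(path, [("fresh", 6.6629604801359461)])

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Since Python 3 accepts the `u'...'` prefix, `ast.literal_eval` can also read lines such as `(u'fresh', 4.3557350075853982)` written by the original Python 2 code.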
