2013-07-11

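Filtering out empty entries in a nested list

So I have a list of lists L, and I'm trying to filter out the duplicates. I realize this isn't the best way to do it, but it's the approach that was specifically asked for. I don't end up with duplicated data, but I do end up with duplicated empty columns, and I can't fix that. Any help?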

for x in range(len(L), 0, -1):
    x -= 1  # range() yields len(L)..1, so shift down to a 0-based index
    for y in range(len(L[0]), 0, -1):
        y -= 1
        if y != 0 and y != 1:  # Skipping columns 0 and 1
            check = L[x][y]
            for x0 in range(len(L), 0, -1):
                x0 -= 1
                for y0 in range(len(L[0]), 0, -1):
                    y0 -= 1
                    if y0 == y:
                        checkagainst = L[x0][y0]
                        if check == checkagainst:
                            if x != x0:  # If it's on the same row, don't count it
                                #print "Identical Indices:","X0:",x0,",","Y0:", y0,"|" ,"X:",x,",","Y:",y
                                #print L[x][y], "," , L[x0][y0]
                                WriteMe = True  # Decides whether to write to the non-duplicates file or not
                                if check == "":  ## Didn't work
                                    WriteMe = False
            print x, ",", y
    if WriteMe == True:
        dwriter.writerow(L[x])
        WriteMe = False  # Set to False for the next iteration
    else:
        writer.writerow(L[x])
    L.pop(x)
    print

Sample input:

ID, Sex, E-mail
1, M, [email protected]
2, F, 
3, F, 
4, F, [email protected]
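Expected output (file without duplicates):

ID, Sex, E-mail
1, M, [email protected]
2, F, 
4, F, [email protected]

(ID 2 and ID 3 are interchangeable in this case, since they are the duplicated rows.)

Expected output (duplicates file):

ID, Sex, E-mail
3, F, 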
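Tracing the question's loop by hand on this sample (assuming L holds the rows above, WriteMe starts out False, and rows 1 and 4 have different addresses) suggests the 'if check == ""' branch is what keeps both blank rows together: row 3 does match row 2, but because the shared value is empty, WriteMe is reset to False straight away, so both blank rows end up in the non-duplicates file. The sketch below keeps the index-based nested loop, but compares each row only with the rows before it, resets the flag for every row, and treats an empty e-mail like any other value, which reproduces the expected output. The helper name split_by_column and the addresses are made up for illustration.

```python
def split_by_column(rows, col):
    """Split rows into (unique, duplicates) by comparing one column.

    Hypothetical helper: a row is a duplicate if an earlier row has the
    same value in column `col`. Empty strings are compared like any other
    value, so a second blank e-mail counts as a duplicate.
    """
    unique, duplicates = [], []
    for i in range(len(rows)):
        is_duplicate = False  # reset the flag for every row
        for j in range(i):  # only look at rows before row i
            if rows[i][col] == rows[j][col]:
                is_duplicate = True
                break
        if is_duplicate:
            duplicates.append(rows[i])
        else:
            unique.append(rows[i])
    return unique, duplicates


# Sample rows from the question; the e-mail addresses are invented.
L = [
    ["1", "M", "alice@example.com"],
    ["2", "F", ""],
    ["3", "F", ""],
    ["4", "F", "dana@example.com"],
]
unique, duplicates = split_by_column(L, 2)
print(unique)      # IDs 1, 2 and 4
print(duplicates)  # ID 3
```

Like the original, this is still quadratic in the number of rows; the OrderedDict answer groups rows in a single pass instead.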

Post a sample input and the expected output. –

Are they duplicates if they have the same e-mail but a different sex? – 2rs2ts

@2rs2ts No, I'm only filtering on e-mail. As long as they have different e-mail addresses, they aren't duplicates. – user1861156
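Since only the e-mail column decides whether two rows are duplicates, it helps to load the file so the address always sits at index 2, even on rows like '3, F, ' where it is blank. A minimal sketch with the standard csv module, where skipinitialspace=True drops the space after each comma; the inline SAMPLE string stands in for the real file, and its addresses are invented:

```python
import csv
import io

# Stand-in for the real input file; the addresses are made up.
SAMPLE = """ID, Sex, E-mail
1, M, alice@example.com
2, F,
3, F,
4, F, dana@example.com
"""

reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
header = next(reader)  # ['ID', 'Sex', 'E-mail']
# Strip stray whitespace and pad short rows so every row has one field
# per header column; the `if row` skips blank lines.
L = [[field.strip() for field in row] + [""] * (len(header) - len(row))
     for row in reader if row]
print(L)
```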

Answer

You can use collections.OrderedDict:

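>>> from collections import OrderedDict
>>> dic = OrderedDict()  # e-mail -> list of [id, sex], in first-seen order
>>> with open('abc') as f:
...     #next(f)  # skip the header if present
...     for line in f:
...         data = [s.strip() for s in line.split(',')]
...         # pad rows that have no e-mail field at all
...         idx, sex, mail = data if len(data) == 3 else data + ['']
...         dic.setdefault(mail, []).append([idx, sex])
...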
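The work here is done by setdefault: the first time an address appears it stores an empty list under that key, and every call returns the list so the [idx, sex] pair can be appended, while OrderedDict keeps the addresses in first-seen order. The same grouping on in-memory rows, with invented addresses in place of the file 'abc':

```python
from collections import OrderedDict

rows = [
    ["1", "M", "alice@example.com"],
    ["2", "F", ""],
    ["3", "F", ""],
    ["4", "F", "dana@example.com"],
]

dic = OrderedDict()
for idx, sex, mail in rows:
    # The first time a mail is seen, setdefault stores an empty list for it;
    # either way it returns that list so the row can be appended.
    dic.setdefault(mail, []).append([idx, sex])

for mail, group in dic.items():
    print(repr(mail), group)
# 'alice@example.com' [['1', 'M']]
# '' [['2', 'F'], ['3', 'F']]
# 'dana@example.com' [['4', 'F']]
```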

Non-duplicates:

>>> for k, v in dic.items():
...     print(", ".join((v[0][0], v[0][1], k)))  # first row seen for each e-mail
...
1, M, [email protected]
2, F, 
4, F, [email protected]

Duplicates:

>>> for k, v in dic.items():
...     if len(v) > 1:
...         for v1 in v[1:]:  # every row after the first one for that e-mail
...             print(", ".join((v1[0], v1[1], k)))
...
3, F, 
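To produce the two files the question asks for, the same groups can be handed to two csv.writer objects, mirroring the question's writer and dwriter: the first row under each address goes to the non-duplicates file and the rest go to the duplicates file. A sketch under those assumptions; the function name write_groups is hypothetical, and io.StringIO stands in for the real output files:

```python
import csv
import io
from collections import OrderedDict


def write_groups(dic, out, dup_out, header=("ID", "Sex", "E-mail")):
    """Write the first row per e-mail to `out` and later ones to `dup_out`."""
    writer = csv.writer(out)       # non-duplicates file
    dwriter = csv.writer(dup_out)  # duplicates file
    writer.writerow(header)
    dwriter.writerow(header)
    for mail, group in dic.items():
        first, rest = group[0], group[1:]
        writer.writerow(first + [mail])
        for entry in rest:
            dwriter.writerow(entry + [mail])


# Groups as built by the answer's loop; the addresses are invented.
dic = OrderedDict([
    ("alice@example.com", [["1", "M"]]),
    ("", [["2", "F"], ["3", "F"]]),
    ("dana@example.com", [["4", "F"]]),
])
out, dup_out = io.StringIO(), io.StringIO()
write_groups(dic, out, dup_out)
print(out.getvalue())
print(dup_out.getvalue())
```

With real files, open them with newline='' (for example open('unique.csv', 'w', newline=''), a placeholder name) as the csv documentation recommends for writers. Note that csv.writer writes 1,M,... without the space after each comma that the sample shows.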

I think the OP wants 3 in a separate file, not 2. It's easy to check 'if mail in dic' and, if so, put it in a list. – 2rs2ts
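Taking 2rs2ts's membership check one step further, a single pass can split the rows while reading, using a set of addresses already seen instead of the grouped dictionary. A sketch, where split_rows is a hypothetical name and, as in the question's expected output, a repeated blank address counts as a duplicate:

```python
def split_rows(rows, col=2):
    """Return (unique, duplicates), keeping the first row for each value in `col`."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = row[col]
        if key in seen:
            duplicates.append(row)  # this address was already seen
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates


# Invented addresses standing in for the question's data.
rows = [
    ["1", "M", "alice@example.com"],
    ["2", "F", ""],
    ["3", "F", ""],
    ["4", "F", "dana@example.com"],
]
unique, duplicates = split_rows(rows)
print(duplicates)  # [['3', 'F', '']]
```

Set membership tests are constant time on average, so the whole pass is linear in the number of rows, unlike the nested loop in the question.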

@2rs2ts The OP said 'id2' and 'id3' are interchangeable. By the way, I've updated my solution. –

No big deal. The question is ambiguous on this point anyway. – 2rs2ts