2017-01-25 33 views
0

My input file, in which I want to find bidirectionally matching IDs:

ID1 ID2 value 
ID3 ID6 value 
ID2 ID1 value 
ID4 ID5 value 
ID6 ID5 value 
ID5 ID4 value 
ID7 ID2 value 

Desired output, FILE1.TXT:

ID1 ID2 value ID2 ID1 value 
ID4 ID5 value ID5 ID4 value 

FILE2.TXT

ID3 ID6 value 
ID6 ID5 value 
ID7 ID2 value 

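I am trying to get bidirectional best hits. If ID1 has a hit to ID2 and ID2 also has a hit to ID1, the pair should be printed to file1; otherwise it should be printed to file2. What I tried was to make a copy of the input file and build a dictionary from each. But this gives output without the values (10 columns). How can I modify it?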

fileA = open("input.txt", 'r')
fileB = open("input_copy.txt", 'r')
output = open("out.txt", 'w')
output2 = open("out2.txt", 'w')  # second output file; the name is a placeholder

# query -> subject, read from the input file
dictA = dict()
for line1 in fileA:
    new_list = line1.rstrip('\n').split('\t')
    query = new_list[0]
    subject = new_list[1]
    dictA[query] = subject
# the same mapping, read from the copy of the input file
dictB = dict()
for line1 in fileB:
    new_list = line1.rstrip('\n').split('\t')
    query = new_list[0]
    subject = new_list[1]
    dictB[query] = subject
SharedPairs = {}
NotSharedPairs = {}
for id1 in dictA.keys():
    value1 = dictA[id1]
    if value1 in dictB.keys():
        if id1 == dictB[value1]:
            SharedPairs[value1] = id1
        else:
            NotSharedPairs[value1] = id1
# only the two IDs are written here; the value column was never stored
for key in SharedPairs.keys():
    line = key + '\t' + SharedPairs[key] + '\n'
    output.write(line)
for key in NotSharedPairs.keys():
    line = key + '\t' + NotSharedPairs[key] + '\n'
    output2.write(line)
output.close()
output2.close()

Answers

1

You can solve this easily using sets:

#!/usr/bin/env python

# ordered pairs (ID1, ID2)
oset = set()
# reversed pairs (ID2, ID1)
rset = set()

with open('input.txt') as f:
    for line in f:
        first, second, val = line.strip().split()
        if first < second:
            oset.add((first, second, val))
        else:
            # note that this reverses second and first for matching purposes
            rset.add((second, first, val))

# pairs present in both directions (with the same value)
print("common: %s" % str(oset & rset))
# pairs present in only one direction
print("diff: %s" % str(oset ^ rset))

Output:

common: set([('ID4', 'ID5', 'value'), ('ID1', 'ID2', 'value')]) 
diff: set([('ID3', 'ID6', 'value'), ('ID5', 'ID6', 'value'), ('ID2', 'ID7', 'value')]) 

It does not handle pairs like (ID1, ID1), but you can add those to a third set and do whatever you decide with them.
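One way to read that suggestion, as a sketch only: rows whose two IDs are identical go into a third set before the ordered/reversed split. The function name classify_pairs and the set name selfset are made up for illustration, the sample rows are invented, and the code is written for Python 3.

```python
def classify_pairs(rows):
    """Split (first, second, val) rows into reciprocal, one-way and self-pair sets."""
    oset = set()     # pairs seen in sorted order (first < second)
    rset = set()     # pairs seen in reverse order, stored swapped
    selfset = set()  # hypothetical third set: both IDs are identical
    for first, second, val in rows:
        if first == second:
            selfset.add((first, second, val))
        elif first < second:
            oset.add((first, second, val))
        else:
            rset.add((second, first, val))
    # intersection: seen in both directions; symmetric difference: one direction only
    return oset & rset, oset ^ rset, selfset


rows = [
    ("ID1", "ID2", "value"),
    ("ID2", "ID1", "value"),
    ("ID3", "ID3", "value"),
    ("ID7", "ID2", "value"),
]
common, diff, self_pairs = classify_pairs(rows)
print("common:", sorted(common))
print("diff:", sorted(diff))
print("self:", sorted(self_pairs))
```

As in the answer's code, one-way pairs come back in sorted ID order, so ("ID7", "ID2") is reported as ("ID2", "ID7").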

+0

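I have already filtered out those IDs. Could you modify the script to save the output to a txt file in tab-delimited format, because I have thousands of IDs? Thank you very much – user3224522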

+0

Use the 'csv' module. –
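A minimal sketch of what that could look like, assuming Python 3, a tab-separated input with exactly three columns, and at most one row per ordered (query, subject) pair. The helper names read_tsv, write_tsv and split_reciprocal are hypothetical. Unlike the set intersection, this keeps each row's own value, so the two directions may carry different values, and reciprocal rows are joined on one line as in the desired FILE1.TXT.

```python
import csv


def read_tsv(path):
    """Read tab-separated rows as tuples, skipping empty lines."""
    with open(path, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle, delimiter="\t") if row]


def write_tsv(path, rows):
    """Write rows to path, one tab-separated line per row."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def split_reciprocal(rows):
    """Return (paired, unpaired) from (query, subject, value) rows."""
    # Assumption: each ordered (query, subject) pair occurs at most once.
    by_pair = {(q, s): (q, s, v) for q, s, v in rows}
    paired, unpaired, seen = [], [], set()
    for q, s, v in rows:
        if (q, s) in seen:
            continue  # already emitted as the second half of a joined row
        reverse = by_pair.get((s, q))
        if reverse is not None and q != s:
            paired.append((q, s, v) + reverse)  # both directions on one line
            seen.add((s, q))
        else:
            unpaired.append((q, s, v))
    return paired, unpaired


# In-memory demonstration with the rows from the question.
sample = [
    ("ID1", "ID2", "value"), ("ID3", "ID6", "value"), ("ID2", "ID1", "value"),
    ("ID4", "ID5", "value"), ("ID6", "ID5", "value"), ("ID5", "ID4", "value"),
    ("ID7", "ID2", "value"),
]
paired, unpaired = split_reciprocal(sample)
print(paired)
print(unpaired)
```

For a real file, the same functions could be chained: pass read_tsv("input.txt") to split_reciprocal, then call write_tsv once per result with whichever output file names you prefer.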

1
import csv

# read the tab-separated rows as (x, y, val) tuples
id_list = []
with open('data.tsv') as f:
    for item in csv.reader(f, delimiter='\t'):
        (x, y, val) = item
        id_list.append((x, y, val))

# rows whose reversed pair (with the same value) is also present
file1 = [item for item in id_list if (item[1], item[0], item[2]) in id_list]
# rows with no reversed counterpart
file2 = [item for item in id_list if (item[1], item[0], item[2]) not in id_list]
print(file1)
print(file2)
+0

If there are a lot of items, a set would do better than a list... –
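The point of that comment can be sketched as follows: the same reciprocal test as in the answer, but with the membership check done against a set, so each lookup takes constant time on average instead of scanning the whole list. The function name partition_rows is made up for illustration, and the code assumes Python 3.

```python
def partition_rows(id_list):
    """Return (reciprocal, non_reciprocal) rows, preserving input order."""
    lookup = set(id_list)  # hashing makes `in` O(1) on average
    reciprocal, non_reciprocal = [], []
    for x, y, val in id_list:
        # Same test as the list version: the swapped row with the same value.
        if (y, x, val) in lookup:
            reciprocal.append((x, y, val))
        else:
            non_reciprocal.append((x, y, val))
    return reciprocal, non_reciprocal


rows = [
    ("ID1", "ID2", "value"), ("ID3", "ID6", "value"), ("ID2", "ID1", "value"),
    ("ID4", "ID5", "value"), ("ID6", "ID5", "value"), ("ID5", "ID4", "value"),
    ("ID7", "ID2", "value"),
]
file1, file2 = partition_rows(rows)
print(file1)
print(file2)
```

The list version compares every row against every other row, which grows quadratically with the number of rows; with the set, the whole pass is roughly linear.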