2016-01-23 54 views
2

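Iterate over two files, compare lines for matching strings, and merge the matching lines

I have two files, each with a list of organisms. The first file contains a list of 'Family Genus' pairs, so it has two columns. The second file contains 'Genus species', also in two columns. The two files share the genus for every species listed. I want to merge the two lists on the Genus in each file so that I can add the Family name to 'Genus species'. The output should therefore contain 'Family Genus species'. Because the names are separated by a space, I split on that space to get the columns. This is my code so far: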

with open('FAMILY_GENUS.TXT') as f1, open('GENUS_SPECIES.TXT') as f2:
    for line1 in f1:
        line1 = line1.strip()
        c1 = line1.split(' ')
        print(line1, end=' ')
        for line2 in f2:
            line2 = line2.strip()
            c2 = line2.split(' ')
            if line1[1] == line2[0]:
                print(line2[1], end=' ')
        print()

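The resulting output consists of only two lines rather than the whole set of records. What am I missing?

Also, how can I save the result to a file instead of just printing it to the screen?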

Answers

3

Here is an alternative solution.

genera = {}

# Build a genus -> family lookup from the 'Family Genus' file.
with open('fg') as f1:
    for line in f1:
        family, genus = line.split()
        genera[genus] = family

# Prefix every 'Genus species' line with the family of its genus.
with open('gs') as f2:
    for line in f2:
        genus, species = line.split()
        print(genera[genus], genus, species)
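
As for why the loop in the question prints so little: a file object is an iterator, so the inner `for line2 in f2` consumes the whole second file during the first pass of the outer loop and has nothing left to yield on later passes. On top of that, `line1[1]` and `line2[0]` index single characters of the line rather than the split columns `c1` and `c2`. A minimal sketch of the first effect, using `io.StringIO` as a stand-in for the real files (the sample names are invented for illustration):

```python
import io

# Stand-ins for the two files; the contents are invented sample data.
f1 = io.StringIO("Felidae Panthera\nCanidae Canis\n")
f2 = io.StringIO("Panthera leo\nCanis lupus\n")

inner_counts = []
for line1 in f1:
    # Count how many lines the inner loop actually sees on each outer pass.
    seen = sum(1 for line2 in f2)
    inner_counts.append(seen)

print(inner_counts)  # [2, 0]
```

Calling `f2.seek(0)` before the inner loop would rewind the file, but a dictionary lookup avoids rereading it at all.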
0

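I would process the files first and build a mapping from each genus to its family and to the species it contains (there may be several). Then use that mapping to match everything up and print it out.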

genuses = {} 

# Map each genus to its family
with open('FAMILY_GENUS.TXT') as f1:
    for line in f1:
        family, genus = line.strip().split()
        genuses.setdefault(genus, {})['family'] = family

# Map each species to its genus
with open('GENUS_SPECIES.TXT') as f2:
    for line in f2:
        genus, species = line.strip().split()
        genuses.setdefault(genus, {}).setdefault('species', []).append(species)

# Go through each genus and build a 'Family Genus species' string
# for every species it contains.
species_strings = []
for genus, d in genuses.items():
    family = d.get('family')
    species = d.get('species')
    # Skip genera that are missing from either file.
    if family and species:
        for specie in species:
            s = '{0} {1} {2}'.format(family, genus, specie)
            species_strings.append(s)

# Sort the strings to make the output pretty and print them out.
species_strings.sort()
for s in species_strings:
    print(s)
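
To save the result to a file rather than printing it, one option is to pass an open file to `print` through its `file=` argument. The following is only a sketch under a few assumptions: the function name `merge_to_file` and the output path are hypothetical, lines that do not have exactly two columns are skipped, and a genus with no known family is silently dropped.

```python
def merge_to_file(family_genus_path, genus_species_path, out_path):
    """Write 'Family Genus species' lines to out_path; return how many were written."""
    # Build a genus -> family lookup from the first file.
    genus_to_family = {}
    with open(family_genus_path) as f1:
        for line in f1:
            parts = line.split()
            if len(parts) == 2:  # ignore blank or malformed lines
                family, genus = parts
                genus_to_family[genus] = family

    written = 0
    with open(genus_species_path) as f2, open(out_path, 'w') as out:
        for line in f2:
            parts = line.split()
            if len(parts) != 2:
                continue
            genus, species = parts
            family = genus_to_family.get(genus)
            if family is None:
                continue  # genus without a known family: skipped in this sketch
            # file=out sends the line to the output file instead of the screen.
            print(family, genus, species, file=out)
            written += 1
    return written
```

It could be called as `merge_to_file('FAMILY_GENUS.TXT', 'GENUS_SPECIES.TXT', 'FAMILY_GENUS_SPECIES.TXT')`, where the third name is just a placeholder for wherever the output should go.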