2014-12-23 41 views
-1

Python file reading and "list index out of range" error

I have an input file like this:

COG1:aomo|At1g01190|aomo|At1g01280|aomo|At1g11600|homo|Hs10834998|homo|Hs13699816 
COG2:aomo|At1g04160|somo|YAL029c|somo|YOR326w|homo|Hs10835119|aomo|At1g10260 
COG3:somo|YAR009c|somo|YJL113w|aomo|At1g10260|aomo|At1g11265 

From this, I want a simple count, producing an output file like this:

 aomo | homo | somo 
COG1 3 | 2 | 0 
COG2 2 | 1 | 2 
COG3 2 | 0 | 2 

For this, I am using:

import re
l=[]
dict={}
with open("groups.txt","r") as f:
    for line in f:
        items=line.split(":")
        key=items[0]
        if key not in dict:
            dict[key]={}
        string=items[1]
        words=re.findall("\S+\|\S+",string)
        for w in words:
            tmp=w.split("|")
            if tmp[0] not in l:
                l.append(tmp[0])
            if tmp[0] in dict[key]:
                dict[key][tmp[0]]=1+dict[key][tmp[0]]
            else:
                dict[key][tmp[0]]=1
for i in sorted(l):
    print(i,end=" ")
print("")
for k in sorted(dict.keys()):
    print(k,end=" ")
    for i in sorted(l):
        if i in dict[k]:
            print(dict[k][i],end=" ")
        else:
            print("0", end=" ")
    print("")

It works fine, but when I change the input file to look like this:

COG1:aomo_At1g01190|aomo_At1g01280|aomo_At1g11600|homo_Hs10834998|homo_Hs13699816 
COG2:aomo_At1g04160|somo_YAL029c|somo_YOR326w|homo_Hs10835119 
COG3:somo_YAR009c|somo_YJL113w|aomo_At1g10260|aomo_At1g11265 

and change the code to:

words=re.findall("\S+\_\S+",string) 
for w in words: 
    tmp=w.split("_") 

it gives the following error:

File "my_program.py", line 10, in <module>
    string=items[1]
IndexError: list index out of range
+2

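Fix the indentation in your code. Copied and pasted directly from the question, it triggers an IndentationError, and even after I fix that, the first example doesn't behave as described. To debug this particular error, add a 'print items' line before 'string = items[1]' and check whether 'items' is what you expect. – alexwlchan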

+0

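You mention that you changed the input file. Could the modified file contain a blank line? The part that behaves unexpectedly seems to be 'items = line.split(":")', which doesn't find a colon. –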

Answers

0

Here is a simple way to do it:

>>> my_string = "COG1: aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816 " 
>>> a,b = my_string.split(":") # splits the string on ":" into the key and the rest
>>> a 
'COG1' 
>>> b 
' aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816 ' 
>>> import re 
>>> from collections import Counter 
>>> my_count = Counter(re.findall("aomo|homo|somo",b)) # findall returns every match; Counter maps each match to its count
>>> my_count 
Counter({'aomo': 3, 'homo': 2}) 
>>> "{} {} {} {}".format(a,my_count.get('aomo',0),my_count.get('homo',0),my_count.get('somo',0)) 
'COG1 3 2 0' 
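
The session above handles a single line. As a rough sketch of how the same Counter idea might be applied to several lines in the underscore-separated format from the question, the following assumes that every line contains a ":" and that the three labels are known in advance. The names LABELS and count_labels are made up for this illustration, and the lines are held in a list rather than read from groups.txt.

```python
import re
from collections import Counter

# Assumed, fixed set of labels to count (hypothetical constant for this sketch).
LABELS = ("aomo", "homo", "somo")


def count_labels(line, labels=LABELS):
    """Return (key, Counter of labels) for one 'KEY:entries' line.

    Assumes the line contains a ':' separating the key from the entries.
    """
    key, rest = line.split(":", 1)
    pattern = "|".join(re.escape(label) for label in labels)
    return key.strip(), Counter(re.findall(pattern, rest))


# Sample lines copied from the question's second input format.
lines = [
    "COG1:aomo_At1g01190|aomo_At1g01280|aomo_At1g11600|homo_Hs10834998|homo_Hs13699816",
    "COG3:somo_YAR009c|somo_YJL113w|aomo_At1g10260|aomo_At1g11265",
]

rows = []
for line in lines:
    key, counts = count_labels(line)
    # Missing labels are reported as 0, as in the single-line session.
    rows.append("{} {} {} {}".format(key, *(counts.get(label, 0) for label in LABELS)))

print("\n".join(rows))
# COG1 3 2 0
# COG3 2 0 2
```

Because the regular expression looks for the label text itself, it does not matter whether the separator after the label is "|" or "_".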
1

You can do this without using the full-featured re module.

template = '{0:4} {1:4} | {2:4} | {3:4}'
columns = ['aomo', 'homo', 'somo']

with open('groups.txt') as f:
    # Header row: blank key column, then one column per label.
    print(template.format(' ', *columns))
    for line in f:
        key, value = line.split(':')
        # Count how many times each label occurs in the rest of the line.
        counts = [value.count(column_label) for column_label in columns]
        print(template.format(key.strip(), *counts))
0

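There may be some blank lines in the second file. When such a line is split on ":", the result is a list of length 1, for example ['\n'], and accessing index 1 of that list raises the IndexError.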
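
To illustrate the explanation above, here is a minimal sketch of what split returns for a blank line, together with one possible guard. It uses str.partition, which always returns three parts, so a missing ":" can be detected without indexing past the end of a list. The function name count_prefixes is hypothetical, and the sketch assumes the underscore format from the question (entries separated by "|", label and ID separated by "_").

```python
from collections import Counter


def count_prefixes(line):
    """Return (key, Counter of prefixes) for 'KEY:a_x|b_y', or None if there is no ':'."""
    key, sep, entries = line.partition(":")
    if not sep:
        # Blank or malformed line: there is nothing after a colon to count.
        return None
    prefixes = [entry.split("_", 1)[0] for entry in entries.strip().split("|") if entry]
    return key.strip(), Counter(prefixes)


blank = "\n"
print(blank.split(":"))       # ['\n'] : only one element, so items[1] would fail
print(count_prefixes(blank))  # None

# Sample line copied from the question's second input format.
result = count_prefixes("COG2:aomo_At1g04160|somo_YAL029c|somo_YOR326w|homo_Hs10835119 \n")
print(result)
```

In a loop over the file, lines for which the function returns None can simply be skipped.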