Python file reading and "list index out of range" error

COG1:aomo|At1g01190|aomo|At1g01280|aomo|At1g11600|homo|Hs10834998|homo|Hs13699816 
COG2:aomo|At1g04160|somo|YAL029c|somo|YOR326w|homo|Hs10835119|aomo|At1g10260 
COG3:somo|YAR009c|somo|YJL113w|aomo|At1g10260|aomo|At1g11265

From this, I want a simple count per group, written to an output file like this:

 aomo | homo | somo 
COG1 3 | 2 | 0 
COG2 2 | 1 | 2 
COG3 2 | 0 | 2

For this, I am using:

import re
l=[]
dict={}
with open("groups.txt","r") as f:
    for line in f:
        items=line.split(":")
        key=items[0]
        if key not in dict:
            dict[key]={}
        string=items[1]
        words=re.findall(r"\S+\|\S+",string)
        for w in words:
            tmp=w.split("|")
            if tmp[0] not in l:
                l.append(tmp[0])
            if tmp[0] in dict[key]:
                dict[key][tmp[0]]=1+dict[key][tmp[0]]
            else:
                dict[key][tmp[0]]=1
for i in sorted(l):
    print(i,end=" ")
print("")
for k in sorted(dict.keys()):
    print(k,end=" ")
    for i in sorted(l):
        if i in dict[k]:
            print(dict[k][i],end=" ")
        else:
            print("0", end=" ")
    print("")

It works fine, but when I change the input file to something like:

COG1:aomo_At1g01190|aomo_At1g01280|aomo_At1g11600|homo_Hs10834998|homo_Hs13699816 
COG2:aomo_At1g04160|somo_YAL029c|somo_YOR326w|homo_Hs10835119 
COG3:somo_YAR009c|somo_YJL113w|aomo_At1g10260|aomo_At1g11265

and change the code to:

words=re.findall(r"\S+\_\S+",string)
for w in words:
    tmp=w.split("_")

it gives the following error:

File "my_program.py", line 10, in <module>
string=items[1] 
IndexError: list index out of range

2014-12-23 user2300042

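Fix the indentation in your code. Copied and pasted straight from the question, it raises an IndentationError, and even after I fixed that, the first example does not behave as described. To debug this particular error, add a `print(items)` line before `string = items[1]` and check whether `items` is what you expect. – alexwlchan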

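You mention that you changed the input file. Could the modified file contain an empty line? The part that misbehaves appears to be `items = line.split(":")`, which is not finding a colon.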
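
Following up on the two comments above, here is a minimal diagnostic sketch for locating the offending lines. The helper name `find_malformed_lines` and the sample data are hypothetical and only illustrate the idea; with the real file you would pass the open file object instead of a list.

```python
def find_malformed_lines(lines):
    """Return (line_number, repr_of_line) for every line without a colon."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if ":" not in line:
            # repr() makes blank lines and stray whitespace visible, e.g. '\n'
            problems.append((number, repr(line)))
    return problems


# Hypothetical sample mimicking a file with a blank line in the middle.
sample = [
    "COG1:aomo_At1g01190|homo_Hs10834998\n",
    "\n",
    "COG3:somo_YAR009c|aomo_At1g10260\n",
]
for number, text in find_malformed_lines(sample):
    print(f"line {number} has no colon: {text}")
```

Any line reported here is one where `line.split(":")` returns a single element, so `items[1]` would fail.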

Here is a simple way to do it:

>>> my_string = "COG1: aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816 " 
>>> a,b = my_string.split(":") # will split strings on ":" 
>>> a 
'COG1' 
>>> b 
' aomo|At1g01190 aomo|At1g01280 aomo|At1g11600 homo|Hs10834998 homo|Hs13699816 ' 
>>> import re 
>>> from collections import Counter 
>>> my_count = Counter(re.findall("aomo|homo|somo",b)) # findall returns every match; Counter maps each match to its count
>>> my_count 
Counter({'aomo': 3, 'homo': 2}) 
>>> "{} {} {} {}".format(a,my_count.get('aomo',0),my_count.get('homo',0),my_count.get('somo',0)) 
'COG1 3 2 0'

2014-12-23 08:04:16 Hackaholic
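
The session above handles a single line in the original format. As a hedged extension to the underscore-separated format from the question, the sketch below splits each group on `|` and takes the part before the first `_` as the label. The function name `count_prefixes` and its parameters are illustrative assumptions, not part of the answer.

```python
from collections import Counter


def count_prefixes(line, member_sep="|", prefix_sep="_"):
    """Split 'COG1:aomo_X|homo_Y' into ('COG1', Counter({'aomo': 1, 'homo': 1}))."""
    key, _, members = line.strip().partition(":")
    prefixes = (
        member.split(prefix_sep, 1)[0]
        for member in members.split(member_sep)
        if member.strip()  # ignore empty pieces, e.g. after a trailing separator
    )
    return key, Counter(p.strip() for p in prefixes)


# The three sample lines from the modified input file in the question.
lines = [
    "COG1:aomo_At1g01190|aomo_At1g01280|aomo_At1g11600|homo_Hs10834998|homo_Hs13699816",
    "COG2:aomo_At1g04160|somo_YAL029c|somo_YOR326w|homo_Hs10835119",
    "COG3:somo_YAR009c|somo_YJL113w|aomo_At1g10260|aomo_At1g11265",
]
columns = ["aomo", "homo", "somo"]
for line in lines:
    key, counts = count_prefixes(line)
    # A Counter returns 0 for labels it has never seen, so no .get() is needed.
    print(key, " | ".join(str(counts[c]) for c in columns))
```

Because `partition` always returns three parts, a line without a colon produces an empty count here instead of raising an error.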

You can do this without using the full-featured re module.

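template = '{0:4} {1:4} | {2:4} | {3:4}'
columns = ['aomo', 'homo', 'somo']

with open('groups.txt') as f:
    print(template.format(' ', *columns))
    for line in f:
        if ':' not in line:
            continue  # skip blank or malformed lines instead of failing to unpack
        key, value = line.split(':', 1)
        # str.count counts substring occurrences, so labels must not appear inside IDs
        counts = [value.count(column_label) for column_label in columns]
        print(template.format(key.strip(), *counts))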

2014-12-23 09:02:58 dopstar

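There may be some blank lines in the second file. Splitting such a line on ":" gives a list of length 1 (for a bare newline, `['\n']`), so accessing index 1 of that list raises an IndexError.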

2014-12-23 12:47:38 vijay
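
To make the explanation above concrete, the sketch below shows what splitting a blank line returns and one possible guard. The helper `parse_line` is a hypothetical name introduced here; it uses `str.partition`, which always returns three parts, so there is no index that can be out of range.

```python
def parse_line(line):
    """Return (key, rest) for 'key:rest' lines, or None when there is no colon."""
    key, sep, rest = line.strip().partition(":")
    if not sep:
        return None  # blank or malformed line: nothing to index into
    return key, rest


print("\n".split(":"))  # a blank line yields a single-element list
print(parse_line("\n"))
print(parse_line("COG1:aomo_At1g01190|homo_Hs10834998\n"))
```

In the reading loop, a `None` result can simply be skipped with `continue`.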
