2015-10-19 13 views

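Getting an index out of range error when using pop in Python

I want to read a text file and remove all stop words from it. However, when I use b[i].pop(j), I get an index out of range error. If I use print(b[i][j]) instead, I get no error and the words are printed as output. Can anyone spot the mistake?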

import nltk 
from nltk.corpus import stopwords 
stop = stopwords.words('english') 

fo = open("text.txt", "r") 
# text.txt is just a text document 

list = fo.read(); 
list = list.replace("\n","") 
# removing newline character 

b = list.split('.',list.count('.')) 
# splitting list into lines 

for i in range (len(b) - 1) : 
    b[i] = b[i].split() 
# splitting each line into words 

for i in range (0,len(b)) :
    for j in range (0,len(b[i])) :
        if b[i][j] in stop :
            b[i].pop(j)
#           print(b[i][j])
#print(b)

# Close opened file
fo.close()

Output:

Traceback (most recent call last): 
    File "prog.py", line 29, in <module> 
    if b[i][j] in stop : 
IndexError: list index out of range 

Output after commenting out b[i].pop(j) and uncommenting print(b[i][j]):

is 
that 
the 
from 
the 
the 
the 
can 
the 
and 
and 
the 
is 
and 
can 
be 
into 
is 
a 
or 

Answers


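You are removing elements from the lists while iterating over them. Each pop shrinks the list, but range(0, len(b[i])) was computed once, before the inner loop started, so the loop still runs up to the original length. Once j reaches an index that no longer exists, b[i][j] raises the IndexError you are seeing. Popping also shifts the remaining elements left, so the word right after a removed one is never checked.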
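A minimal sketch of the failure, using a tiny hand-written word list and stop-word set in place of the file contents and the NLTK list (both are illustrative stand-ins, not data from the question):

```python
# Stand-in data: not the asker's file or NLTK's stop-word list.
words = ["this", "is", "the", "plan"]
stop = {"is", "the"}

# range() is built once from the original length (4) and never updated.
indices = range(len(words))

error = None
try:
    for j in indices:
        if words[j] in stop:
            # Removing an element shortens the list and shifts later items left.
            words.pop(j)
except IndexError as exc:
    error = exc

print(words)  # "the" is still present: it slid into index 1 and was skipped
print(error)  # the lookup at index 3 failed because only 3 items remain
```

The list ends up one element shorter than the range expects, so the last index is out of bounds, and one stop word survives because it moved into a position the loop had already passed.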

Instead, build a new list that contains only the elements you want to keep. Example -

result = []
for i in range(0, len(b)):
    templist = []
    for j in range(0, len(b[i])):
        if b[i][j] not in stop:
            templist.append(b[i][j])
    result.append(templist)

The same can be done with a list comprehension -

result = [[word for word in sentence if word not in stop] for sentence in b]
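For reference, here is a self-contained sketch of the same filtering idea that runs without NLTK or a text file. The function name remove_stop_words and the sample data are hypothetical. Converting the stop words to a set and lowercasing each word before the lookup are my own assumptions, not part of the answer above; drop .lower() if case-sensitive matching is wanted.

```python
def remove_stop_words(sentences, stop_words):
    """Return new token lists with stop words filtered out (hypothetical helper)."""
    # A set allows fast membership checks; a list would be scanned each time.
    stop_set = set(stop_words)
    return [[word for word in sentence if word.lower() not in stop_set]
            for sentence in sentences]


# Stand-in data; in the question these would be b and stopwords.words('english').
sentences = [["This", "is", "a", "test"], ["The", "cat", "and", "the", "dog"]]
stop = ["this", "is", "a", "the", "and"]

# Builds new lists and leaves the originals untouched.
result = remove_stop_words(sentences, stop)
print(result)

# If the existing lists must be updated in place, assign to a full slice:
# the replacement list is built first, then swapped in, so nothing is
# removed while the comprehension is still reading the old contents.
stop_set = set(stop)
for sentence in sentences:
    sentence[:] = [word for word in sentence if word.lower() not in stop_set]
print(sentences)
```

Both variants avoid the original problem because no element is removed from a list while a loop is still indexing into it.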