Python if statement with nltk.wordnet.synsets

import nltk 
from nltk import * 
from nltk.corpus import wordnet as wn 

output=[] 
wordlist=[] 

entries = nltk.corpus.cmudict.entries() 

for entry in entries[:200]:  # collect the words without their pronunciations, since pos_tag expects a list of words
    wordlist.append(entry[0])

for word in nltk.pos_tag(wordlist):  # keep only the nouns
    if word[1] == 'NN':
        output.append(word[0])

for word in output:
    x = wn.synsets(word)  # remove every word that has no synsets (this is the problem)
    if len(x) < 1:
        output.remove(word)

for word in output[:200]:
    print(word, " ", len(wn.synsets(word)))

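I want to remove every word that has no synsets, but for some reason it is not working. When I run the program, I find that even when a word has len(wn.synsets(word)) == 0, it is not removed from my list. Can someone tell me what is going wrong?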


2013-03-12 watisit

You cannot iterate over a list and remove the current item at the same time. Here is a toy example that demonstrates the problem:

In [73]: output = list(range(10))

In [74]: for item in output: 
    ....:  output.remove(item)

You might expect every item in output to be removed. Instead, half of them are still there:

In [75]: output 
Out[75]: [1, 3, 5, 7, 9]

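Why you can't loop and remove at the same time:

Imagine that Python uses an internal counter to remember the current index as it goes through the for-loop.

When the counter equals 0 (the first time through the loop), Python executes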

output.remove(item)

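That works fine. output now has one fewer item. But Python then increments the counter to 1, so the next value of the loop variable is output[1], which is the third item in the original list.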

0 <-- first item removed 
1 <-- the new output[0] ** THIS ONE GETS SKIPPED ** 
2 <-- the new output[1] -- gets removed on the next iteration
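
The skipping can be reproduced by hand. The following is a rough sketch, not how CPython is actually implemented: it replaces the for-loop with an explicit index variable to mimic the internal counter described above. The names index and visited are chosen only for illustration.

```python
# Emulate the for-loop's internal counter with an explicit index.
# This is an illustration only, not CPython's real implementation.
output = list(range(10))
index = 0
visited = []  # items the loop actually gets to see

while index < len(output):
    item = output[index]
    visited.append(item)
    output.remove(item)  # the list shrinks; later items shift left by one
    index += 1           # the counter still advances, so one item is skipped

print(visited)  # [0, 2, 4, 6, 8]
print(output)   # [1, 3, 5, 7, 9]
```

Only the even numbers are ever visited, which is why the odd numbers survive.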

The (workaround) solution:

Instead, either iterate over a copy of output, or build a new list. In this case, I think building a new list is more efficient:

new_output = []
for word in output:
    x = wn.synsets(word)
    if len(x) >= 1:
        new_output.append(word)
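
For comparison, here is a minimal sketch of both options: iterating over a copy, and building a new list with a list comprehension. So that it runs without the WordNet corpus, it uses fake_synsets, a hypothetical stand-in for wn.synsets backed by a small made-up dictionary. The word lists are also invented for the example.

```python
# Hypothetical stand-in for wn.synsets, so the sketch runs without WordNet data.
FAKE_SYNSETS = {"abacus": ["abacus.n.01"], "abbey": ["abbey.n.01"]}

def fake_synsets(word):
    """Return the made-up synsets for word, or an empty list if there are none."""
    return FAKE_SYNSETS.get(word, [])

output = ["aaa", "abacus", "aardvarks", "abbey"]

# Option 1: iterate over a copy (output[:]) while removing from the original.
for word in output[:]:
    if not fake_synsets(word):
        output.remove(word)

# Option 2: build a new list with a list comprehension.
words = ["aaa", "abacus", "aardvarks", "abbey"]
new_output = [word for word in words if fake_synsets(word)]

print(output)      # ['abacus', 'abbey']
print(new_output)  # ['abacus', 'abbey']
```

With the real library, fake_synsets(word) would be replaced by wn.synsets(word). The comprehension avoids the repeated list.remove calls, each of which has to scan the list.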


2013-03-12 16:40:24 unutbu
