2011-02-28 108 views
1

Replacing synonyms in a corpus using WordNet and NLTK - python

I am trying to write a simple Python script that uses NLTK to find and replace synonyms in a txt file.

The code below gives me this error:

Traceback (most recent call last): 
    File "C:\Users\Nedim\Documents\sinon2.py", line 21, in <module> 
    change(word) 
    File "C:\Users\Nedim\Documents\sinon2.py", line 4, in change 
    synonym = wn.synset(word + ".n.01").lemma_names 
TypeError: can only concatenate list (not "str") to list 

Here is the code:

from nltk.corpus import wordnet as wn 

def change(word): 
    synonym = wn.synset(word + ".n.01").lemma_names 

    if word in synonym: 

      filename = open("C:/Users/tester/Desktop/test.txt").read() 
      writeSynonym = filename.replace(str(word), str(synonym[0])) 
      f = open("C:/Users/tester/Desktop/test.txt", 'w') 
      f.write(writeSynonym) 
      f.close() 

f = open("C:/Users/tester/Desktop/test.txt") 
lines = f.readlines() 

for i in range(len(lines)): 

    word = lines[i].split() 
    change(word) 

Answers

1

Two things. First, you can simplify the part that reads the file:

for line in open("C:/Users/tester/Desktop/test.txt"): 
    word = line.split() 

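Second, .split() returns a list of strings, whereas your change function works on only one word at a time. That is what causes the exception: your word is actually a list.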
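The failure can be reproduced without NLTK. The following is a minimal sketch, assuming Python 3 and a made-up sample line; it only illustrates why adding ".n.01" to the result of .split() raises the TypeError shown in the traceback.

```python
# A made-up sample line standing in for one line of test.txt (assumption).
line = "the dog barks"

words = line.split()  # a list of strings, not a single string
print(words)

try:
    # The same kind of operation the traceback points at: list + str.
    words + ".n.01"
except TypeError as exc:
    message = str(exc)
    print(message)

# Concatenation works once each word is handled on its own.
synset_names = [w + ".n.01" for w in words]
print(synset_names)
```

Building the synset name per word avoids the exception, although a name built this way may still not exist in WordNet for a given word.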

If you want to process every word on the line, make it look like this:

for line in open("C:/Users/tester/Desktop/test.txt"):
    words = line.split()
    for word in words:
        change(word)

This doesn't work. I'm not sure what is wrong. – Tester 2011-03-01 10:25:57

2

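This is not very efficient, and it will not substitute a single synonym, because each word can have several synonyms. The code below collects them so that you can choose one: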

from nltk.corpus import wordnet as wn
from nltk.corpus.reader.plaintext import PlaintextCorpusReader

# Directory that contains test.txt
corpus_root = 'C:/Users/tester/Desktop/'
wordlists = PlaintextCorpusReader(corpus_root, '.*')

for word in wordlists.words('test.txt'):
    # Collect the lemma names from every synset of the word
    synonymList = set()
    wordNetSynset = wn.synsets(word)
    for synSet in wordNetSynset:
        # lemma_names is a method in NLTK 3 and later (an attribute in older releases)
        for synWords in synSet.lemma_names():
            synonymList.add(synWords)
    print(synonymList)
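
Picking one synonym and writing it back into the text is the step the question was after. The following is only a rough sketch of one way to do it, not the answerer's method: the helper names (first_synonym, replace_synonyms, wordnet_lookup, toy_lookup) and the toy dictionary are hypothetical, and it naively takes the first candidate that differs from the original word, which may not fit the intended sense.

```python
import re


def first_synonym(word, lookup):
    """Return the first candidate from lookup(word) that differs from word."""
    for candidate in lookup(word):
        if candidate.lower() != word.lower():
            # WordNet joins multi-word lemmas with underscores.
            return candidate.replace("_", " ")
    return word  # no usable synonym: keep the original word


def replace_synonyms(text, lookup):
    """Replace each alphabetic word in text, leaving spacing and punctuation alone."""
    return re.sub(r"[A-Za-z]+", lambda m: first_synonym(m.group(0), lookup), text)


def wordnet_lookup(word):
    """Lemma names from all synsets of word (needs NLTK and the WordNet data)."""
    from nltk.corpus import wordnet as wn  # imported lazily; not used in the demo below
    names = []
    for synset in wn.synsets(word):
        for name in synset.lemma_names():
            if name not in names:
                names.append(name)
    return names


# Toy stand-in for WordNet so the demo runs without NLTK (made-up data).
toy = {"car": ["car", "auto", "automobile"], "big": ["large", "big"]}


def toy_lookup(word):
    return toy.get(word.lower(), [])


result = replace_synonyms("A big car.", toy_lookup)
print(result)  # A large auto.
```

With NLTK and the WordNet data installed (for example via nltk.download("wordnet")), passing wordnet_lookup instead of toy_lookup uses real synsets. Reading the whole file once, transforming the text in memory, and writing it once also avoids rewriting the file for every word.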