2017-06-14

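NLTK WordNet error when looking up a word using synsets

I am using Python 3.6 and NLTK 3.2.3, and I get a WordNetError only for the word "escort". I do not get the error with any other word. Below is a transcript showing a successful lookup for "dog" followed by the error for "escort":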

Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux 
Type "help", "copyright", "credits" or "license" for more information. 
>>> from nltk.corpus import wordnet 
>>> wordnet.synsets('dog') 
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')] 
>>> wordnet.synsets('escort') 
Traceback (most recent call last): 
    File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1403, in _synset_from_pos_and_line 
    offset = int(_next_token()) 
ValueError: invalid literal for int() with base 10: '02026433\x00v' 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1491, in synsets 
    for p in pos 
    File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1493, in <listcomp> 
    for offset in index[form].get(p, [])] 
    File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1335, in synset_from_pos_and_offset 
    synset = self._synset_from_pos_and_line(pos, data_file_line) 
    File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1448, in _synset_from_pos_and_line 
    raise WordNetError('line %r: %s' % (data_file_line, e)) 
nltk.corpus.reader.wordnet.WordNetError: line '02025829 38 v 01 escort 0 006 @ 02025550 v 0000 + 09992538 n 0102 ~ 02026203 v 0000 ~ 02026327 v 0000 ~ 02026433\x00v 0000 ~ 02026712 v 0000 04 + 08 00 + 09 00 + 20 00 + 21 00 | accompany as an escort; "She asked her older brother to escort her to the ball" \n': invalid literal for int() with base 10: '02026433\x00v' 

However, when I use the online WordNet search tool at http://wordnetweb.princeton.edu/perl/webwn, the lookup works as expected. The latest WordNet corpus was downloaded using nltk.download().

The error seems to point at a hexadecimal escape (\x00) in the WordNet data line, at a position where an integer value is expected.
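To make the failure easier to see, here is a minimal sketch that reproduces the ValueError outside NLTK. It does not use NLTK's reader; it only assumes the data line is split on whitespace, as the traceback suggests. The helper name parse_offset is hypothetical, and the sample string is a fragment of the line from the error message.

```python
# Fragment of the data line from the error message; "\x00" is a NUL byte
# sitting where a space should separate the offset from the "v" tag.
fragment = "~ 02026327 v 0000 ~ 02026433\x00v 0000 ~ 02026712 v 0000"

# Whitespace splitting does not treat NUL as a separator, so the offset
# and the part-of-speech tag stay glued together in one token.
tokens = fragment.split()
bad_tokens = [tok for tok in tokens if "\x00" in tok]


def parse_offset(token):
    """Hypothetical helper: return the token as an int, or None if it is not one."""
    try:
        return int(token)
    except ValueError:
        return None


for tok in bad_tokens:
    print(repr(tok), "->", parse_offset(tok))
```

The glued token cannot be parsed as an integer, while the same digits without the NUL byte parse normally, which matches the message in the traceback.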

Any ideas? Please let me know if you have run into something like this.

Answer


After closely examining the "verb" dictionary file, I found that it was in fact corrupted. Here is how the line looks in an editor like "vi":

02025829 38 v 01 escort 0 006 @ 02025550 v 0000 + 09992538 n 0102 ~ 02026203 v 0000 ~ 02026327 v 0000 ~ 02026433^@v 0000 ~ 02026712 v 0000 04 + 

I replaced the "^@" with a space and the problem went away. I guess the bigger question is how the file got corrupted in the first place.
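The same check and repair can be scripted instead of done by hand in an editor. The following is a rough sketch, not part of NLTK: the function names find_nul_lines and replace_nul_bytes are hypothetical, and the file path is an assumption (the verb data file is normally named data.verb inside the corpora/wordnet folder of your nltk_data directory, but the exact location depends on your installation).

```python
from pathlib import Path


def find_nul_lines(path):
    """Return the 1-based numbers of lines that contain a NUL byte."""
    data = Path(path).read_bytes()
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if b"\x00" in line
    ]


def replace_nul_bytes(path, backup_suffix=".bak"):
    """Replace every NUL byte with a space and return how many were replaced.

    A backup copy is written first. The replacement is byte-for-byte, so the
    file length does not change; this matters because WordNet synset offsets
    are byte positions in the data file.
    """
    target = Path(path)
    data = target.read_bytes()
    count = data.count(b"\x00")
    if count:
        Path(str(target) + backup_suffix).write_bytes(data)
        target.write_bytes(data.replace(b"\x00", b" "))
    return count


# Example usage (the path is an assumption; adjust it to your installation):
# verb_file = Path.home() / "nltk_data" / "corpora" / "wordnet" / "data.verb"
# print(find_nul_lines(verb_file))
# print(replace_nul_bytes(verb_file))
```

If find_nul_lines reports more than a line or two, the download itself may be damaged, and fetching the corpus again with nltk.download() is probably the safer route.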

Problem solved!
