2013-09-24 54 views
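UnicodeDecodeError in the TextBlob tutorial

I'm trying to work through the TextBlob tutorial on Windows (using the Git Bash shell) with Python 3.3.

I have installed textblob and nltk, along with their dependencies.

The Python code is: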

from text.blob import TextBlob 

wiki = TextBlob("Python is a high-level, general-purpose programming language.") 
tags = wiki.tags 

I get the following error:

Traceback (most recent call last): 
File "textblob.py", line 4, in <module> 
    tags = wiki.tags 
File "c:\Python33\lib\site-packages\text\decorators.py", line 18, in __get__ 
    value = obj.__dict__[self.func.__name__] = self.func(obj) 
File "c:\Python33\lib\site-packages\text\blob.py", line 357, in pos_tags 
    for word, t in self.pos_tagger.tag(self.raw) 
File "c:\Python33\lib\site-packages\text\taggers.py", line 40, in tag 
    return pattern_tag(sentence, tokenize) 
File "c:\Python33\lib\site-packages\text\en.py", line 115, in tag 
    for sentence in parse(s, tokenize, True, False, False, False, encoding).split(): 
File "c:\Python33\lib\site-packages\text\en.py", line 99, in parse 
    return parser.parse(unicode(s), *args, **kwargs) 
File "c:\Python33\lib\site-packages\text\text.py", line 1213, in parse 
    s[i] = self.find_tags(s[i], **kwargs) 
File "c:\Python33\lib\site-packages\text\en.py", line 49, in find_tags 
    return _Parser.find_tags(self, tokens, **kwargs) 
File "c:\Python33\lib\site-packages\text\text.py", line 1161, in find_tags 
    map = kwargs.get( "map", None)) 
File "c:\Python33\lib\site-packages\text\text.py", line 967, in find_tags 
    tagged.append([token, lexicon.get(token, i==0 and lexicon.get(token.lower()) or None)]) 
File "c:\Python33\lib\site-packages\text\text.py", line 98, in get 
    return self._lazy("get", *args) 
File "c:\Python33\lib\site-packages\text\text.py", line 79, in _lazy 
    self.load() 
File "c:\Python33\lib\site-packages\text\text.py", line 367, in load 
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip())) 
File "c:\Python33\lib\site-packages\text\text.py", line 367, in <genexpr> 
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if x.strip())) 
File "c:\Python33\lib\site-packages\text\text.py", line 346, in _read 
    for line in f: 
File "c:\Python33\lib\encodings\cp1252.py", line 23, in decode 
    return codecs.charmap_decode(input,self.errors,decoding_table)[0] 
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 16: character maps to <undefined> 

Any idea what is going wrong here? Adding a 'u' prefix to the string did not help.

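I quickly ran through the tutorial and it works fine on my OS X machine with Python 3.3. Could you have an older version of TextBlob? It looks like a similar issue was just fixed and released: https://github.com/sloria/TextBlob/issues/15 –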

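No luck. I'm using 0.6.3, which I believe is the latest. I did a pip --force-reinstall and got a libyaml error while pyyaml was installing. The installation did continue, so I'm not sure whether that is a serious problem. – sgoldber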

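To follow up on this, I went through the short tutorial on the front page of the [nltk website](http://nltk.org/) and ran into a very similar error. Cloning the master repo from GitHub resolved it. Maybe I need to try something similar with textblob. – sgoldber

Answer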

Version 0.7.1 fixes this problem, so it's time to upgrade:

$ pip install -U textblob 

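The problem was that the en-lexicon.txt file used for part-of-speech tagging was being opened with the platform default encoding, which on Windows is cp1252. The file evidently contains characters that Python cannot decode with that encoding (the traceback points at byte 0x9d, which cp1252 leaves undefined). The fix was to open the file explicitly as utf-8.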
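The failure mode can be reproduced without TextBlob. The following is only a minimal sketch, not the library's actual code: the sample line and the `read_lines` helper are made up for illustration, and the assumption is that the lexicon contains a character such as U+201D, whose UTF-8 encoding includes the byte 0x9d.

```python
import os
import tempfile

# Hypothetical lexicon-style line (not copied from en-lexicon.txt).
# U+201D (right double quotation mark) encodes to bytes E2 80 9D in UTF-8.
sample = "\u201d ''\n"

# Write the sample to a temporary file as raw UTF-8 bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write(sample.encode("utf-8"))
    path = f.name


def read_lines(path, encoding):
    """Illustrative helper: read all lines using an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return [line for line in f]


# Reading as cp1252 (the Windows default in this question) fails,
# because byte 0x9d has no mapping in Python's cp1252 codec.
try:
    read_lines(path, "cp1252")
    cp1252_failed = False
    error_message = ""
except UnicodeDecodeError as exc:
    cp1252_failed = True
    error_message = str(exc)

# Reading with an explicit utf-8 encoding succeeds on any platform.
utf8_lines = read_lines(path, "utf-8")

os.remove(path)
print(cp1252_failed, error_message)
```

Passing `encoding="utf-8"` to `open()` makes the result independent of the platform default, which is why the same tutorial worked on OS X but failed on Windows.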