The stemmer and lemmatizer seem to raise this error for certain sentences in the text file I pass them. What does the error mean, and how do I resolve this UnicodeDecodeError?
Traceback (most recent call last):
File "preproc.py", line 89, in <module>
apos=stem_data(nostop)
File "preproc.py", line 51, in stem_data
r=stemmer.stem(n)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 632, in stem
stem = self.stem_word(word.lower(), 0, len(word) - 1)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 590, in stem_word
word = self._step1ab(word)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/stem/porter.py", line 275, in _step1ab
if word.endswith("sses"):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)
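The byte `0xe2` in the traceback is typically the first byte of a UTF-8 encoded punctuation character (for example the curly apostrophe U+2019). On Python 2, `word.endswith("sses")` inside the stemmer forces an implicit ASCII decode of the byte string, which fails. A minimal sketch of the likely fix, assuming the input file is UTF-8 encoded: decode the text to unicode yourself before passing tokens to the stemmer (the sample bytes below are hypothetical).

```python
# -*- coding: utf-8 -*-
# Raw bytes as they might be read from the file: "doesn't" with a
# curly apostrophe, whose UTF-8 encoding starts with the byte 0xe2.
raw = b"doesn\xe2\x80\x99t"

# Decode explicitly instead of relying on Python 2's implicit ASCII
# decode inside the stemmer; the result is a proper unicode string.
word = raw.decode("utf-8")

# When reading the whole file, io.open yields unicode on both
# Python 2 and 3, so every token is already decoded:
#   import io
#   with io.open("input.txt", encoding="utf-8") as f:
#       text = f.read()
```

With unicode strings, `stemmer.stem(word)` no longer triggers the implicit ASCII decode that produced the traceback.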
Which sentences are generating the error? –
Possible duplicate of [python nltk.sent\_tokenize error ascii codec can't decode](http://stackoverflow.com/questions/27212912/python-nltk-sent-tokenize-error-ascii-codec-cant-decode) –