2014-06-14

I have the following code, and running it gives me UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

# -*- coding: utf-8 -*- 

forbiddenWords=['for', 'and', 'nor', 'but', 'or', 'yet', 'so', 'not', 'a', 'the', 'an', 'of', 'in', 'to', 'for', 'with', 'on', 'at', 'from', 'by', 'about', 'as'] 


def IntoSentences(paragraph): 
    paragraph = paragraph.replace("–", "-") 
    import nltk.data 
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle') 
    sentenceList = sent_detector.tokenize(paragraph.strip()) 
    return sentenceList 

from Tkinter import * 

root = Tk() 

var = StringVar() 
label = Label(root, textvariable=var) 
var.set("Fill in the caps: ") 
label.pack() 

text = Text(root) 
text.pack() 

button=Button(root, text ="Create text with caps.", command =lambda: IntoSentences(text.get(1.0,END))) 
button.pack() 

root.mainloop() 

Everything works fine. Then I enter some text and press the button, but then I get this error:

C:\Users\Indrek>caps_main.py 
Exception in Tkinter callback 
Traceback (most recent call last): 
  File "C:\Python27\lib\lib-tk\Tkinter.py", line 1470, in __call__
    return self.func(*args)
  File "C:\Python27\Myprojects\caps_main.py", line 25, in <lambda>
    button=Button(root, text ="Create text with caps.", command =lambda: IntoSentences(text.get(1.0,END)))
  File "C:\Python27\Myprojects\caps_main.py", line 7, in IntoSentences
    paragraph = paragraph.replace("ŌĆō", "-")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

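How can I fix this? At first I got the same error message as soon as I ran the code. After I added lambda:, the problem now appears when I click the button in my application.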
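Adding lambda: did not fix the error; it only changed when it happens. Presumably the earlier version passed command=IntoSentences(text.get(1.0,END)), which calls the function while the button is being built. The Python 3 sketch below uses a hypothetical into_sentences stand-in (no Tkinter, no nltk) that just records each call, to show the difference in timing:

```python
# Hypothetical stand-in for IntoSentences that records when it is called.
calls = []


def into_sentences(paragraph):
    calls.append(paragraph)
    return [paragraph]


# Like command=IntoSentences(...): the call happens immediately, and only
# its return value would be handed to the button as the "callback".
eager = into_sentences("text at startup")

# Like command=lambda: IntoSentences(...): only a function is stored;
# Tkinter calls it later, each time the button is clicked.
deferred = lambda: into_sentences("text at click time")

calls_before_click = len(calls)  # only the eager call has run so far
deferred()                       # simulate Tkinter invoking the callback on a click
calls_after_click = len(calls)

print(calls_before_click, calls_after_click)  # 1 2
```

So the same replace() call fails in both versions; the lambda simply postpones it until the click.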


Note: the "# -*- coding: utf-8 -*-" comment only tells the interpreter how to parse the source code. It says nothing about what happens to those strings at run time... – Bakuriu
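To make the comment concrete: in a UTF-8 Python 2 source file, the plain literal "–" is a byte string holding the en dash's UTF-8 bytes E2 80 93. Because paragraph is a unicode object here, unicode.replace() silently decodes that byte string with the ASCII codec, which fails on the very first byte, 0xe2. Python 3 no longer does this implicitly, so the sketch below writes the step out by hand to reproduce the same message; it illustrates the mechanism rather than running the Python 2 code path itself.

```python
# Python 3 sketch: Python 2 performed this decode implicitly whenever a byte
# string was mixed with a unicode string; here it is written out explicitly.
dash_bytes = "\u2013".encode("utf-8")  # the bytes a UTF-8 source file stores for "–"

message = ""
try:
    dash_bytes.decode("ascii")  # the implicit step Python 2 took
except UnicodeDecodeError as exc:
    message = str(exc)

print(dash_bytes)  # b'\xe2\x80\x93'
print(message)     # 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
```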

Answer


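You have to decode the string from UTF-8 (or whatever encoding it is in) into a unicode string, and then do the replacement with unicode strings as well. This code does what you are trying to achieve: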

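if isinstance(paragraph, str): 
    paragraph = paragraph.decode('utf-8') 
paragraph = paragraph.replace(u'\u2013', u'-') 
# u'\u2013' is the en dash (–). The traceback shows it as ŌĆō (u'\u014c\u0106\u014d') only because the console printed its UTF-8 bytes in another code page.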
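A Python 3 sketch of the same idea, plus a check of where ŌĆō comes from. Two assumptions: the console that printed the traceback used code page 775 (Baltic), which maps bytes E2 80 93 to Ō, Ć and ō; and replace_dashes is a hypothetical helper, not part of the original code. In Python 2.7 the equivalent would test isinstance(paragraph, str) for byte strings and use the literal u'\u2013'.

```python
# Assumption: the console that printed the traceback used code page 775 (Baltic).
# Decoding the en dash's UTF-8 bytes that way yields the "ŌĆō" seen in the traceback.
mojibake = "\u2013".encode("utf-8").decode("cp775")


def replace_dashes(paragraph):
    """Replace en dashes with hyphens, working on text rather than bytes.

    replace_dashes is a hypothetical helper, not part of the original code.
    """
    if isinstance(paragraph, bytes):
        # Only raw bytes need decoding; str is already Unicode text.
        paragraph = paragraph.decode("utf-8")
    # Both arguments are text, so no implicit ASCII decode can happen.
    return paragraph.replace("\u2013", "-")


print(ascii(mojibake))                                   # '\u014c\u0106\u014d'
print(replace_dashes("2010\u20132014"))                  # 2010-2014
print(replace_dashes("2010\u20132014".encode("utf-8")))  # 2010-2014
```

Decoding once, where bytes enter the program, and doing every replacement on text is what keeps Python 2's implicit ASCII decoding out of the picture.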