Python: how do I set the encoding to UTF-8?

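I'm struggling with character encoding in Python. My script fetches an article full of special language characters from a website. It also opens an external file of plain words (a .txt file saved as UTF-8) that contains words with special characters too. Here is the relevant part of the code: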

from bs4 import BeautifulSoup
import gethtml  # presumably the asker's own helper module

def getArticleText(webtext):
    articletext = ""
    soup = BeautifulSoup(webtext)
    # Concatenate the first child of every <p> inside <div class="dr_article">
    for tag in soup.find_all("div", {"class": "dr_article"}):
        for element in tag.find_all("p"):
            articletext += element.contents[0]
    return articletext

def getArticle(url):
    htmltext = gethtml.getHtmlText(url)
    return getArticleText(htmltext)

def getKeywords(articletext):
    # In Python 2, read() returns a byte string (str), not unicode
    common = open("word_rank/comon.txt").read().split('\n')
    word_dict = {}
    word_list = articletext.lower().split()
    for word in word_list:
        if word not in common:
            # Start new words at 1, otherwise increment (counts each occurrence once)
            if word not in word_dict:
                word_dict[word] = 1
            else:
                word_dict[word] += 1
    print sorted(word_dict.items(), key=lambda (k, v): (v, k), reverse=True)
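For comparison, a minimal Python 3 sketch of the counting step in getKeywords using collections.Counter. It assumes the common words are passed in as a set (get_keywords and common_words are hypothetical names) instead of being read from word_rank/comon.txt:

```python
from collections import Counter


def get_keywords(articletext, common_words):
    """Count words in articletext that are not in common_words.

    Returns (word, count) pairs sorted by (count, word), descending,
    the same ordering as the sort key in getKeywords.
    """
    words = articletext.lower().split()
    # Counter starts each new word at 1, so there is no separate
    # "insert, then increment" step to get wrong.
    counts = Counter(w for w in words if w not in common_words)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


if __name__ == "__main__":
    text = "\u010citateljice \u017eeli \u0161elteru i \u010ditateljice \u017eeli"
    for word, count in get_keywords(text, {"i"}):
        # Printing each str on its own shows the characters, not escapes.
        print(word, count)
```

Using a set for common_words also makes each `not in` check a hash lookup rather than a scan of a list.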

Printing the whole articletext works fine: the special characters come out correctly.

My problem is that the keywords produced by getKeywords are printed like this, for example:

(u'\u0161elteru', 2), (u'\u010ditateljice', 2), 
(u'\u017eeli,', 2), (u'\u0161tekat', 2),

and so on...

How do I set the encoding of these keywords so that the words are displayed properly?
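Part of what is happening: printing a list shows each element's repr(), and in Python 2 the repr of a unicode string escapes non-ASCII characters as \uXXXX, so the words themselves may be intact. A minimal Python 3 sketch, where ascii() reproduces that escaped form (show_keywords is a hypothetical helper):

```python
def show_keywords(pairs):
    """Return the escaped and the readable rendering of (word, count) pairs."""
    # Printing a list uses repr() for every element; ascii() additionally
    # escapes non-ASCII characters, like Python 2's repr of unicode strings.
    escaped = ascii(pairs)
    # Formatting each word yourself uses str(), which keeps the characters.
    readable = ", ".join("{} ({})".format(word, count) for word, count in pairs)
    return escaped, readable


pairs = [("\u0161elteru", 2), ("\u010ditateljice", 2)]
escaped, readable = show_keywords(pairs)
print(escaped)   # [('\u0161elteru', 2), ('\u010ditateljice', 2)]
print(readable)
```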


2013-08-01 dzordz

Probably BeautifulSoup encodes the characters as UTF-8; look for a decode method. – solusipse
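A small Python 3 illustration of the decode step the comment refers to, using hypothetical sample data. Beautiful Soup 4 already converts parsed documents to Unicode, so the byte strings that need decoding here are more likely the ones read from the word file:

```python
# UTF-8 bytes as they might come from a file opened without an encoding
# (hypothetical sample data).
raw = "\u0161elteru".encode("utf-8")
print(raw)                      # b'\xc5\xa1elteru'

# decode() turns bytes into text; afterwards, comparing it with other
# text strings (e.g. an "in" check against a word list) works as expected.
word = raw.decode("utf-8")
print(word == "\u0161elteru")   # True
```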

Use unidecode.

Example usage:

from unidecode import unidecode

t = u"\u5317\u4EB0"
unidecode(t)  # returns an ASCII transliteration of t
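Note that unidecode transliterates to ASCII, so "š" becomes "s" rather than being shown as "š". If only Latin diacritics need stripping, a rough standard-library alternative (a different technique from unidecode, shown as a sketch; strip_diacritics is a hypothetical name) is Unicode NFKD decomposition:

```python
import unicodedata


def strip_diacritics(text):
    """Approximate text in ASCII by dropping combining accent marks.

    NFKD splits e.g. U+0161 into 's' + U+030C (combining caron); encoding
    to ASCII with errors='ignore' then drops the mark. Characters with no
    ASCII decomposition, such as CJK ideographs, are dropped entirely,
    which is where unidecode does better.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_diacritics("\u0161elteru \u010ditateljice \u017eeli"))  # selteru citateljice zeli
```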


2013-08-01 12:44:47 LarsVegas

I added a decode right after .read(), i.e. common = open("word_rank/comon.txt").read().decode('utf-8').split('\n'), and it worked. Exactly what I needed :D. Anyway, thanks, everyone!
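The same fix can be written without a manual decode: io.open (available in Python 2.6+ and Python 3) decodes while reading. A minimal sketch; returning a set instead of a list is a design choice for faster lookups, and load_common_words plus the sample file are hypothetical:

```python
import io
import os
import tempfile


def load_common_words(path):
    """Read one word per line from a UTF-8 text file into a set of text strings."""
    # io.open with an explicit encoding decodes as it reads, so each line is
    # already text (unicode in Python 2, str in Python 3), just as after
    # .read().decode('utf-8').
    with io.open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


if __name__ == "__main__":
    # Hypothetical sample file standing in for word_rank/comon.txt.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"i\n\u017eeli\n")
    try:
        common = load_common_words(path)
        print(u"\u017eeli" in common)  # True
    finally:
        os.remove(path)
```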


2013-08-01 12:50:18 dzordz
