2014-02-26

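URL UTF-8 decoding in Python

I have some URL-encoded data that I want to decode with Python. I tried the (accepted) answer here, but I still don't get the correct result. My code is as follows: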

import urllib2 

name = '%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8' 

print urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8") 

This should print нотификатор-олимпийских-и, but instead it prints %D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8
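A minimal sketch of why the first attempt returns its input unchanged, written for Python 3 (an assumption: the question uses Python 2's `urllib2`, and `urllib.parse.quote`/`unquote` are the Python 3 counterparts). `quote()` escapes every existing `%` as `%25`, and a single `unquote()` undoes exactly that layer, so the round trip is a no-op:

```python
from urllib.parse import quote, unquote

# The percent-encoded string from the question.
name = ('%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-'
        '%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8')

# quote() treats the existing '%' signs as literal text and escapes them,
# so the string becomes double-encoded ('%D0' -> '%25D0').
double_encoded = quote(name)
print(double_encoded[:10])  # %25D0%25BD

# One unquote() strips only that extra layer, giving back the original,
# still percent-encoded, input.
round_trip = unquote(double_encoded)
print(round_trip == name)  # True
```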

So I tried unquoting it again:

print urllib2.unquote(urllib2.unquote(urllib2.quote(name.encode("utf8"))).decode("utf8"))

but it gave me ноÑиÑикаÑоÑ-олимпийÑкиÑ-и
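The garbled output looks like classic mojibake: the UTF-8 bytes end up interpreted one byte per character. To our understanding, this is what Python 2.7's `urllib.unquote` does when it is handed a `unicode` string (each `%XX` becomes the Latin-1 character U+00XX), which is what the `.decode("utf8")` before the outer `unquote` produces. The Python 3 sketch below reproduces the effect by passing `encoding='latin-1'` to `urllib.parse.unquote`, then shows that it can be reversed by re-encoding as Latin-1 and decoding as UTF-8:

```python
from urllib.parse import unquote

# First three escapes of the question's string (н, о, т in UTF-8).
name = '%D0%BD%D0%BE%D1%82'

# Decoding the bytes as Latin-1 yields one character per byte: mojibake.
mojibake = unquote(name, encoding='latin-1')
print(ascii(mojibake))  # '\xd0\xbd\xd0\xbe\xd1\x82'

# The damage is reversible: turn each character back into its original
# byte, then decode those bytes as UTF-8.
fixed = mojibake.encode('latin-1').decode('utf-8')
print(fixed)  # нот
```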

I don't know why this happens. Can anyone please explain where I went wrong and how I can fix it?

Answer


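Too many quote/unquote operations. You already have a percent-encoded UTF-8 string, so why encode it as UTF-8 and URL-quote it again? Unquote it once and decode the resulting bytes: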

import urllib

unquoted = urllib.unquote(name)  # percent-decode once -> raw UTF-8 bytes
print unquoted.decode('utf-8')
# нотификатор-олимпийских-и
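For readers on Python 3, where `urllib2` no longer exists, here is a sketch of the same single-unquote approach using `urllib.parse` (not part of the original answer). `unquote()` decodes as UTF-8 by default, and `unquote_to_bytes()` mirrors the two-step Python 2 version:

```python
from urllib.parse import unquote, unquote_to_bytes

# The percent-encoded string from the question.
name = ('%D0%BD%D0%BE%D1%82%D0%B8%D1%84%D0%B8%D0%BA%D0%B0%D1%82%D0%BE%D1%80-'
        '%D0%BE%D0%BB%D0%B8%D0%BC%D0%BF%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D1%85-%D0%B8')

# One step: percent-decode and decode the bytes as UTF-8 (the default).
text = unquote(name)
print(text)  # нотификатор-олимпийских-и

# Two steps, like the Python 2 answer: raw bytes first, then decode.
raw = unquote_to_bytes(name)
print(raw.decode('utf-8'))  # нотификатор-олимпийских-и
```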