2013-12-20
0

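Unable to convert UTF-8 characters - Python

I'm using mechanize/urllib in Python 2.7 to fetch a bunch of data into a variable. However, despite calling .decode("UTF-8"), some characters are still not decoded. The full code is below: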

#!/usr/bin/python

import urllib
import mechanize
from time import time

total_time = 0
count = 0
def send_this(url):
    global count
    count = count + 1
    this_browser=mechanize.Browser()
    this_browser.set_handle_robots(False)
    this_browser.addheaders=[('User-agent','Chrome')]

    translated=this_browser.open(url).read().decode("UTF-8")
    return translated

def collect_this(my_ltarget,my_lhome,data):
    global total_time
    data = data.replace(" ","%20")
    get_url="http://mymemory.translated.net/api/ajaxfetch?q="+data+"&langpair="+my_lhome+"|"+my_ltarget+"&mtonly=1"
    return send_this(get_url)

ctr = 0
print collect_this("hi-IN","en-GB","This is my first proper computer program.")

The output of the print statement is:

{"responseData":{"translatedText":"\u092f\u0939 \u092e\u0947\u0930\u093e \u092a\u0939
\u0932\u093e \u0938\u092e\u0941\u091a\u093f\u0924 \u0915\u0902\u092a\u094d\u092f\u0942\u091f
\u0930 \u092a\u094d\u0930\u094b\u0917\u094d\u0930\u093e\u092e \u0939\u0948 
\u0964"},"responseDetails":"","responseStatus":200,"matches":[{"id":0,"segment":"This is my 
first proper computer program.","translation":"\u092f\u0939 \u092e\u0947\u0930\u093e \u092a 
\u0939\u0932\u093e \u0938\u092e\u0941\u091a\u093f\u0924 \u0915\u0902\u092a\u094d\u092f\u0942 
\u091f\u0930 \u092a\u094d\u0930\u094b\u0917\u094d\u0930\u093e\u092e \u0939\u0948 
\u0964","quality":"70","reference":"Machine Translation provided by Google, Microsoft, 
Worldlingo or MyMemory customized engine.","usage-count":0,"subject":"All","created- 
by":"MT!","last-updated-by":"MT!","create-date":"2013-12-20","last-update- 
date":"2013-12-20","match":0.85}]} 

The sequences starting with \u... are the characters that I expected to be converted.

Where did I go wrong?

Answers

5

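What you have is not a UTF-8 decoding problem. The response is JSON, and the \uXXXX sequences are JSON Unicode escapes, which .decode("UTF-8") leaves untouched. Decode the string with a JSON decoder: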

import json 
json.loads(your_json_string) 
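
As a minimal sketch of what the decoder does, the snippet below feeds a shortened stand-in for the response to json.loads and checks that the escapes come back as real characters. The name sample is hypothetical; the keys and escape sequences are taken from the output shown in the question.

```python
import json

# Shortened stand-in for the API response (hypothetical variable name).
# A raw string keeps the backslashes literal, as they arrive over the wire.
sample = r'{"responseData":{"translatedText":"\u092f\u0939 \u092e\u0947\u0930\u093e"},"responseStatus":200}'

parsed = json.loads(sample)
text = parsed["responseData"]["translatedText"]

# Each six-character escape sequence has become a single code point:
# two characters, a space, then four characters.
print(len(text))  # 7
print(text == u"\u092f\u0939 \u092e\u0947\u0930\u093e")  # True
```

The result is a dictionary holding Unicode strings, so the translated text can be picked out by key rather than by slicing the raw response.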
+1

+1. Unicode does not imply UTF-8 encoding. – Alvaro

+0

Does this mean I need to take off the 'decode("UTF-8")' and add another statement below it, like this: 'translated = json.loads(translated)'? – rahuL

+0

@i.h4d35: Yes. (You can also do it all on one line if you like.) – user2357112
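
The one-line form mentioned in the last comment might look like the following sketch. decode_translation is a hypothetical helper name, and raw stands in for the bytes returned by this_browser.open(url).read(); the key names come from the output shown in the question. The explicit decode is kept here only so the sketch behaves the same on Python 2 and Python 3.

```python
import json

def decode_translation(raw):
    """Parse the raw response body and return the translated text."""
    # Decode the bytes and parse the JSON in a single expression;
    # json.loads replaces each \uXXXX escape with the character it names.
    return json.loads(raw.decode("utf-8"))["responseData"]["translatedText"]

# Stand-in for this_browser.open(url).read(), which returns bytes.
raw = b'{"responseData":{"translatedText":"\\u0939\\u0948"},"responseStatus":200}'
translated = decode_translation(raw)
```

In the question's code, the same expression could replace the right-hand side of the assignment to translated inside send_this.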