import http.client, urllib.request, urllib.parse, urllib.error

def translate(IN, OUT, text):
    text = urllib.parse.quote(text)
    conn = http.client.HTTPConnection("translate.google.com.tr")
    conn.request("GET", "/translate_a/t?client=t&text="+text+"&hl="+IN+"&tl="+OUT)
    res = conn.getresponse().read().decode("cp1254",'replace')
    print(res)
    b1 = res.split("],[")
    b2 = b1[0].strip('[]')
    b3 = b2.strip('","')
    b4 = b3.split('","')
    return b4[0]

string = input("Turkish >>> English: ")
result = translate("tr","en",string)
print(string,">>>",result)
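I'm trying to write a script that translates Turkish into English. The script works fine as long as I don't type Turkish characters. For example, these Turkish words translate successfully: (kalemlik, deneme, bilgisayar, okyanus). But if the word I type contains non-ASCII characters, the translation fails. These are the Turkish characters: ("ıİğĞüÜşŞöÖçÇ"), and these are some Turkish words that contain non-ASCII characters: (programcı, şarkı, çalışma, örnek, İnsan, dağ, üs). By the way, cp1254 is a valid encoding for Turkish characters. What can I do to fix this? As you can see, the problem isn't specific to Turkish: it's about non-ASCII characters when using the http and urllib modules in Python 3.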
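The problem hinges on which bytes actually go over the wire. As a quick check (a sketch; `encode_for_query` is a hypothetical helper, not part of the original script), `urllib.parse.quote` converts text to bytes as UTF-8 by default before percent-escaping, so ASCII-only words such as `okyanus` are sent unchanged while words with Turkish letters become multi-byte escapes:

```python
import urllib.parse


def encode_for_query(word: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: percent-encode `word` for a URL query string."""
    # quote() first turns the str into bytes using `encoding`,
    # then %-escapes every byte that is not URL-safe.
    return urllib.parse.quote(word, encoding=encoding)


for word in ["okyanus", "dağ", "şarkı"]:
    # Compare the default UTF-8 escapes with cp1254 escapes for the same word.
    print(word, encode_for_query(word), encode_for_query(word, "cp1254"))
```

ASCII words look identical either way, which is why they always work. `dağ` goes out as `da%C4%9F` in UTF-8 but would be `da%F0` in cp1254, so the result depends on which byte encoding the server assumes for the query.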
Examples:
Turkish >>> English: okyanus
[[["ocean","okyanus","",""]],[["isim",["ocean","brine","the deep","main","drink"],[["ocean",["okyanus","derya"]],["brine",["tuzlu su","salamura","deniz","okyanus"]],["the deep",["deniz","okyanus","enginler"]],["main",["ana boru","deniz","kuvvet","zor","okyanus","horoz dövüşü"]],["drink",["içmek","içki","içecek","içki içmek","deniz","okyanus"]]]],["sıfat",["oceanic"],[["oceanic",["okyanus","okyanusta bulunan","okyanus gibi"]]]]],"tr",,[["ocean",[5],1,0,999,0,1,0]],[["okyanus",4,,,""],["okyanus",5,[["ocean",999,1,0],["oceanic",0,1,0],["the ocean",0,1,0],["oceans",0,1,0]],[[0,7]],"okyanus"]],,,[["tr"]],2]
okyanus >>> ocean
This one succeeded.
Turkish >>> English: dağ
[[["daÄ\u0178","daÄ\u0178","",""]],,"tr",,[["daÄ\u0178",[5],1,0,1000,0,1,0]],[["daÄ\u0178",5,[["daÄ\u0178",1000,1,0]],[[0,4]],"daÄ\u0178"]],,,[["tr"]],8]
dağ >>> daÄ\u0178
Failed!
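The garbled `daÄ\u0178` above is exactly what the UTF-8 bytes of `dağ` (`64 61 C4 9F`) look like when read as cp1254: byte `0xC4` is `Ä` and byte `0x9F` is `Ÿ` (U+0178). A minimal reproduction, using a hypothetical `misdecode` helper, suggests a UTF-8 vs. cp1254 mismatch somewhere in the round trip, though not on which side:

```python
def misdecode(text: str, sent_as: str = "utf-8", read_as: str = "cp1254") -> str:
    """Hypothetical helper: encode with one codec, decode with another (mojibake)."""
    # 'replace' mirrors the original script, which also decodes with errors='replace'.
    return text.encode(sent_as).decode(read_as, "replace")


print(repr(misdecode("dağ")))
print(repr(misdecode("okyanus")))  # ASCII bytes mean the same thing in both codecs
```

ASCII-only words survive the mismatch unchanged, which matches the observation that `okyanus` translates correctly while `dağ` does not.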
Google probably doesn't send the text as cp1254. A web page's character encoding has nothing to do with the encoding your terminal uses. http://en.wikipedia.org/wiki/Character_encodings_in_HTML – jfs
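Following that comment, one option is to decode the response with whatever charset the server declares instead of hardcoding cp1254. A sketch, assuming the `Content-Type` header carries a `charset` parameter and falling back to UTF-8 when it does not (`decode_body` is a hypothetical helper):

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Hypothetical helper: decode an HTTP body using its declared charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or `default` when the header has no charset.
    charset = msg.get_content_charset(default)
    return body.decode(charset, "replace")


print(decode_body("dağ".encode("utf-8"), "text/html; charset=UTF-8"))
print(decode_body("dağ".encode("cp1254"), "text/plain; charset=windows-1254"))
```

With `http.client` the response headers are already an `email.message.Message` subclass, so inside `translate` this reduces to `resp = conn.getresponse()` followed by `resp.read().decode(resp.headers.get_content_charset("utf-8"), "replace")`. This only addresses the response side; if the server misreads the request bytes, the returned text will still be wrong.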
The page says 'content="text/html; charset=UTF-8"', but decoding with utf8 doesn't work either. – frukoprof