Some problems decoding a string in Python

I am trying to write a string of HTML code from Google to a file in Python 3.4.

#coding=utf-8 
try: 
    from urllib.request import Request, urlopen # Python 3 
except ImportError: 
    from urllib2 import Request, urlopen # Python 2 

useragent = 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0' 

#Generate URL 
url = 'https://www.google.com.tw/search?q=' 
query = str(input('Google It! :')) 
full_url = url+query 


#Request Data 
data = Request(full_url) 
data.add_header('User-Agent', useragent) 
dataRequested = urlopen(data).read() 
dataRequested = str(dataRequested.decode('utf-8')) 


print(dataRequested) 

#Write Data Into File 
file = open('Google - '+query+'.html', 'w') 
file.write(dataRequested)

It prints the string correctly, but when it writes to the file, it shows:

file.write(dataRequested) 
UnicodeEncodeError: 'cp950' codec can't encode character '\u200e' in position 97658: illegal multibyte sequence

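I tried changing the decoding method, but it did not work. I also tried replacing \u200e, but that only led to more encoding errors for other characters.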


2015-06-27 BobbyHo

Your problem is this line:

dataRequested = str(dataRequested.decode('utf-8'))

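Is there any reason to wrap the result of decoding UTF-8 in str()? But that is not all. When you get a string from the Internet, it should be decoded, but when you save the string, it should be encoded. Some people do not understand that. They either decode or encode, and it does not work that way. Here, open(..., 'w') encodes the text with your platform's default codec (cp950, according to the traceback), which cannot represent '\u200e'.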
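To illustrate the decode-on-input, encode-on-output rule, here is a minimal sketch. The sample bytes are made up for the example (they are not taken from the Google response); they contain U+200E, the character named in the traceback.

```python
# -*- coding: utf-8 -*-
# Made-up bytes as they might arrive from the network:
# "abc" followed by U+200E (LEFT-TO-RIGHT MARK) encoded as UTF-8.
raw = b'abc\xe2\x80\x8e'

# Decode on the way in: bytes -> text.
text = raw.decode('utf-8')

# Encode on the way out: text -> bytes. UTF-8 can represent every code point.
as_utf8 = text.encode('utf-8')

# cp950 has no mapping for U+200E, so encoding with it fails,
# which is the same failure the question shows.
try:
    text.encode('cp950')
    cp950_ok = True
except UnicodeEncodeError:
    cp950_ok = False

print(as_utf8 == raw)
print(cp950_ok)
```

The point is that the codec used when writing has to be able to represent every character in the text, and the platform default does not always qualify.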

I changed your code a little. It works fine for me on both Python 2.7 and Python 3.4.

dataRequested = dataRequested.decode('utf-8') 


print(dataRequested) 

#Write Data Into File 
file = open('Google - '+query+'.html', 'wb') 
file.write(dataRequested.encode('utf-8'))
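A possible alternative, if you would rather keep the file in text mode, is to name the encoding when opening it instead of encoding by hand. This is only a sketch: the `html` string and the temporary path are placeholders for illustration, not part of the original code. `io.open` is used because it accepts `encoding=` on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

# Stand-in for the decoded page; it contains the character from the traceback.
html = u'<p>left-to-right mark: \u200e</p>'

# Hypothetical output location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), 'example.html')

# Text mode with an explicit encoding: the file object encodes to UTF-8
# on write, so the platform default (cp950 in the question) is never used.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(html)

# Read it back with the same encoding to confirm nothing was lost.
with io.open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

print(round_trip == html)
```

The `with` block also closes the file, which the snippets above leave to the interpreter.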


2015-06-27 14:50:55

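Come on! I only misspelled a few words. –

Aha, I understand now. Thanks. For the life of me, I could not see that that was what you meant! The juxtaposition of "get string" and "do not get" was confusing. Sorry. It would read better as *some people do not understand, though*. Thanks for the fix. –

Thank you very much. It works fine now. – BobbyHo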
