Python encoding

Using mechanize, I retrieved the source of a web page that contains some non-ASCII characters, such as Chinese characters.

The code is below:

# using Python 2.6
from mechanize import Browser

br = Browser()
br.open("http://www.example.html")

src = br.response().read()  # retrieve the source of the web page

print src  # print the source

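Questions:

1. According to the page's source, its charset=gb2312, yet when I print src, all the content is displayed correctly, I mean there is no gibberish. Why? Does print know the encoding of src?

2. Should I explicitly decode or encode src?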

Source

2011-09-26 Alcott
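
One possible explanation for question 1, assuming read() hands back the raw bytes the server sent: print writes those bytes out unchanged, and a console that is itself set to a GB2312-compatible encoding renders them correctly. The sketch below uses a hypothetical stand-in for src (the variable raw) rather than a real download, and shows what explicit decoding with the declared charset looks like, and what happens with the wrong codec.

```python
# Hypothetical stand-in for br.response().read(): the text "中文"
# encoded as GB2312, i.e. what a charset=gb2312 page would deliver.
text = u"\u4e2d\u6587"
raw = text.encode("gb2312")

# Decoding with the charset the page declares recovers the text.
decoded = raw.decode("gb2312")

# Decoding with a different codec fails (or, for some codecs,
# silently produces gibberish instead of raising).
try:
    raw.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

Decoding once, right after reading, and working with the decoded text from then on is usually the less error-prone choice.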

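print encodes the text for you according to the console's encoding scheme. If you want to write the result to a file, you need to encode it yourself – xiaohan2012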
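
A minimal sketch of the file case mentioned in this comment, assuming the text has already been decoded. The choice of UTF-8 for the output file and the temporary file path are illustrative assumptions, not something the original post specifies.

```python
import io
import os
import tempfile

text = u"\u4e2d\u6587"  # decoded text, "中文"

# Create a throwaway file path for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # io.open takes the target encoding explicitly, so nothing
    # depends on the console or the platform default.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Read the file back in binary mode to see the bytes on disk.
    with io.open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)
```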

src is unicode; it has no encoding. print (or more precisely, sys.stdout.write()) figures out which encoding to use when outputting.

Source

2011-09-26 07:11:45
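
Whether a value like src is decoded text or a raw byte string can be checked directly rather than assumed, and the encoding that print would use for text can be inspected through sys.stdout.encoding. This is a rough sketch: kind_of is a hypothetical helper, the sample values stand in for src, and the fallback to UTF-8 is an assumption for the case where stdout reports no encoding (for example when output is piped).

```python
import sys


def kind_of(value):
    """Hypothetical helper: tell raw bytes apart from decoded text."""
    return "bytes" if isinstance(value, bytes) else "text"


text = u"\u4e2d\u6587"       # decoded text
raw = text.encode("gb2312")  # the same text as GB2312 bytes

# The encoding print applies to text; may be missing when piped.
console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"

# Roughly what print does with text before writing it out.
for_console = text.encode(console_encoding, "replace")
```

Calling kind_of(src) on the real download would show which of the two cases applies.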

No encoding? But isn't unicode (utf-8?) an encoding? – Alcott

[Unicode is not UTF-8.](http://www.joelonsoftware.com/articles/Unicode.html) –
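
The distinction can be illustrated with a single character: Unicode assigns it one code point, while each encoding maps that code point to a different byte sequence. The three encodings below are picked only as examples.

```python
ch = u"\u4e2d"  # the character "中"

# Unicode gives the character a number (its code point) ...
code_point = ord(ch)

# ... and each encoding turns that number into different bytes.
encoded = {}
for name in ("utf-8", "gb2312", "utf-16-be"):
    encoded[name] = ch.encode(name)
```

All three byte strings decode back to the same character when the matching codec is used.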
