Python urllib2: problem parsing HTML


I'm using mechanize to parse a website's HTML, but with this particular site I get strange results.

from mechanize import Browser 
br = Browser() 
r = br.open("http://www.heavenplaza.com") 
result = r.read()

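The result is something I can't make sense of. You can see it here: http://paste2.org/p/1556077

Does anyone know a way to get this site's HTML? Either mechanize or urllib would be fine.

Thanks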


2011-08-01 kairyu

Please post the result in the question rather than on a pastebin. Especially when the result is a single line! – senderle

Answers

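import urllib2, StringIO, gzip

# The site sends a gzip-compressed body, so read the raw bytes first.
f = urllib2.urlopen("http://www.heavenplaza.com")
data = StringIO.StringIO(f.read())
# Wrap the in-memory buffer in a GzipFile to decompress it.
gzipper = gzip.GzipFile(fileobj=data)
print gzipper.read()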


2011-08-01 13:52:58 ksn

Got it working, thanks a lot :) – kairyu
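For readers on Python 3, where urllib2 and StringIO no longer exist, the same idea can be sketched roughly as follows. This is a port under assumptions, not the answerer's code: the helper names gunzip_bytes and fetch_gzipped_page are made up for illustration, and the sketch assumes the server always returns a gzip body, as it did in this thread.

```python
import gzip
import io
import urllib.request


def gunzip_bytes(raw: bytes) -> bytes:
    """Decompress a gzip-compressed byte string held in memory."""
    # io.BytesIO plays the role StringIO.StringIO had in Python 2.
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
        return gz.read()


def fetch_gzipped_page(url: str) -> bytes:
    """Fetch url and gunzip the body (assumes the body is always gzip)."""
    with urllib.request.urlopen(url) as response:
        return gunzip_bytes(response.read())
```

Calling fetch_gzipped_page("http://www.heavenplaza.com") would then return the decompressed bytes, provided the site still behaves the way it did when the question was asked.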

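I quickly tried your script in a console and the site returns garbage. You may need to spoof your HTTP User-Agent so that the site doesn't think you are a bot.

http://www.google.com works fine.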


2011-08-01 13:47:30

This is my user agent: br.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.17) Gecko/20110420 Firefox/3.6.17')], and that doesn't work either. – kairyu

Based on the answer above, the site does not correctly honor the Accept-Encoding header for gzip. –
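If, as this comment suggests, a server may send gzip whether or not the client asked for it, one defensive option is to inspect the body itself before decoding. The following is only a sketch in Python 3: decode_body and GZIP_MAGIC are hypothetical names, and the check relies on the fact that gzip streams start with the two bytes 0x1f 0x8b.

```python
import gzip

# Every gzip stream begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw: bytes, content_encoding: str = "") -> bytes:
    """Return the decompressed body if it looks like gzip, else raw unchanged.

    content_encoding is the value of the Content-Encoding response header,
    if the caller has it; the magic-byte check covers servers that omit it.
    """
    if "gzip" in content_encoding.lower() or raw.startswith(GZIP_MAGIC):
        return gzip.decompress(raw)
    return raw
```

Note that if the header claims gzip but the body is not actually compressed, gzip.decompress raises an error, so a caller that distrusts the header may prefer to rely on the magic-byte check alone.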
