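How do I get the final HTML of a page with PyQt?

I have recently been trying to scrape data from Google search results, and PyQt looks like a good module for this: it can execute the JavaScript in a page and return the final HTML. It seems to work correctly for other websites, but for Google search it always fails. I am following the example here: http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/

The code is: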
import sys
import time
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *


class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Blocks here until _loadFinished quits the event loop.
        self.app.exec_()

    def _loadFinished(self, result):
        # Keep a reference to the loaded frame so the caller can read its HTML.
        self.frame = self.mainFrame()
        self.app.quit()


url1 = 'http://www.google.com/search?start=0&client=firefox-a&q=adidas&safe=off&pws=0&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2002%2Ccd_max%3A1%2F1%2F2001&filter=0&num=10&access=a&oe=UTF-8&ie=UTF-8'
url2 = 'http://www.google.com/search?start=0&client=firefox-a&q=adidas&safe=off&pws=0&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2009%2Ccd_max%3A7%2F1%2F2009&filter=0&num=10&access=a&oe=UTF-8&ie=UTF-8'

r = Render(url1)
html = r.frame.toHtml()
print type(html)

# Save the rendered page (html is a QString, so encode it to UTF-8 bytes).
outfile = open('page.html', 'w')
outfile.write(html.toUtf8())
outfile.close()
print 'finished!'
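However, url1 and url2 always produce the same result, and that result matches what Chrome shows when I disable JavaScript. How should we deal with this? How can we get the final HTML of a Google search?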
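As a quick sanity check that the two URLs really ask for different date ranges, the `tbs` parameter can be decoded with the standard library alone. This is only a sketch: `date_range` is a hypothetical helper name, and the `cdr:1,cd_min:...,cd_max:...` layout is assumed from the two URLs above rather than taken from any Google documentation.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2


def date_range(url):
    """Return (cd_min, cd_max) from the URL's tbs query parameter.

    Hypothetical helper: assumes tbs looks like
    'cdr:1,cd_min:<date>,cd_max:<date>' once percent-decoded.
    """
    # parse_qs percent-decodes the value, so %3A becomes ':' and %2C becomes ','.
    tbs = parse_qs(urlparse(url).query)['tbs'][0]
    fields = dict(part.split(':', 1) for part in tbs.split(','))
    return fields['cd_min'], fields['cd_max']


url1 = 'http://www.google.com/search?start=0&client=firefox-a&q=adidas&safe=off&pws=0&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2002%2Ccd_max%3A1%2F1%2F2001&filter=0&num=10&access=a&oe=UTF-8&ie=UTF-8'
url2 = 'http://www.google.com/search?start=0&client=firefox-a&q=adidas&safe=off&pws=0&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2009%2Ccd_max%3A7%2F1%2F2009&filter=0&num=10&access=a&oe=UTF-8&ie=UTF-8'

print(date_range(url1))
print(date_range(url2))
```

The two URLs do decode to different ranges, so the identical output is not caused by identical queries. One detail this surfaces is that url1 has a `cd_min` in 2002 and a `cd_max` in 2001, which may be worth double-checking.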
Didn't you... just post the exact same code as the OP as your solution? – ChrisArmstrong