2011-08-04 54 views

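PyQt4 seg fault on sequential application start/stop

I am trying to read web pages with PyQt. I need to call a method several times with different URLs. The code I am currently using is similar to: http://blog.sitescraper.net/2010/06/scraping-javascript-webpages-in-python.html#comment-form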

However, when I try this I get a seg fault. Any suggestions are welcome.

import sys 

from time import clock 
from PyQt4.QtGui import * 
from PyQt4.QtCore import * 
from PyQt4.QtWebKit import * 
from PyQt4.QtNetwork import * 

class Render(QWebPage):
    def __init__(self):
        # A new QApplication is created for every Render instance.
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)

        self.networkAccessManager().finished.connect(self.handleEnd)
        self.loadFinished.connect(self._loadFinished)

        self.mainFrame().setScrollBarPolicy(Qt.Horizontal, Qt.ScrollBarAlwaysOff)
        self.mainFrame().setScrollBarPolicy(Qt.Vertical, Qt.ScrollBarAlwaysOff)

    def loadURL(self, url):
        # Start the load, then block in the event loop until _loadFinished quits it.
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def savePageImage(self, width, height, Imagefile):
        # A width or height of 0 means "use the size of the page contents".
        pageSize = self.mainFrame().contentsSize()
        if width == 0:
            pageWidth = pageSize.width()
        else:
            pageWidth = width
        if height == 0:
            pageHeight = pageSize.height()
        else:
            pageHeight = height

        self.setViewportSize(QSize(pageWidth, pageHeight))
        Img = QImage(self.viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(Img)
        self.mainFrame().render(painter)
        painter.end()
        Img.save(Imagefile)

    def _loadFinished(self, result):
        print "load finish"
        self.frame = self.mainFrame()
        self.returnVal = result
        self.app.quit()

    def handleEnd(self, reply):
        # get first http code and disconnect
        # could add filter to listen relevant responses
        self.httpcode = reply.attribute(QNetworkRequest.HttpStatusCodeAttribute)
        self.networkAccessManager().finished.disconnect(self.handleEnd)


jsrurl = 'http://www.w3resource.com/javascript/document-alert-confirm/four.html' 
badurl='something.or.other' 
badhttp = 'http://eclecticself.com/test2.html' 
testurl = 'http://www.nydailynews.com/entertainment/index.html' 
testurl2 = 'http://www.palmbeachpost.com/' 
testurl3 = 'http://www.nydailynews.com/news/politics/2011/08/03/2011-08-03_pat_buchanan_downplays_controversy_after_calling_president_obama_your_boy_to_rev.html' 
url = testurl 



start = clock() 
r = Render() 
r.loadURL(url) 
html = r.frame.toHtml() 
elapsed = clock() - start 
print elapsed 

if r.returnVal:
    if r.httpcode.toInt()[0] != 404:
        #print html.toUtf8()
        start = clock()
        r.savePageImage(1024, 0, "pageSnapshot.png")
        elapsed = clock() - start
        print elapsed
    else:
        print 'page not found'
else:
    print 'badurl'

start = clock()  # restart the timer so the second load is timed on its own
s = Render()
s.loadURL(jsrurl) 
html = s.frame.toHtml() 
elapsed = clock() - start 
print elapsed 
if s.returnVal:
    if s.httpcode.toInt()[0] != 404:
        print html.toUtf8()
        start = clock()
        s.savePageImage(1024, 0, "pageSnapshot.png")
        elapsed = clock() - start
        print elapsed
    else:
        print 'page not found'
else:
    print 'badurl'

Put print statements everywhere and find out where you actually get the segfault. I suspect the QApplication initialization. – utdemir


Have you tried debugging? – BrainStorm


Yes, I couldn't find any cause. – user879422

Answers


PyQt often fails to keep references to objects. Workarounds:

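  • Try PySide instead of PyQt. Switching is easy because its API is almost identical to PyQt's. I would try PySide first: it may fix your problem right away, or at least make it predictable and reproducible.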

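  • Try keeping references to all the Qt objects you use, and drop those references when you are done with the objects. You can also try closing them explicitly, or navigating to "about:blank" before moving on to the next web page.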

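That usually helps. If it doesn't, you need to narrow the problem down, as utdemir suggested above. A debugger usually doesn't help, because these problems are often timing related as well. Logging without output buffering usually gets you closer to the root of the problem.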
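A minimal sketch of the unbuffered logging mentioned above, assuming that plain markers written to stderr are enough to bracket the crash. The helper name `trace` and the marker format are hypothetical; they are not part of PyQt or of the code in the question.

```python
import sys


def trace(message, stream=None):
    """Write a marker line and flush it at once, so it is not lost
    in a buffer if the process dies with a segfault right afterwards."""
    if stream is None:
        stream = sys.stderr
    stream.write("[trace] %s\n" % message)
    stream.flush()


# Hypothetical usage around the suspicious calls from the question:
#   trace("before Render()")
#   r = Render()
#   trace("after Render(), before loadURL")
```

The last marker that appears before the crash shows which call to look at. Running the script with `python -u` should have a similar effect on ordinary print output, since that flag turns off buffering of stdout and stderr.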

I feel for you; problems like this are hard to track down!


Tried PySide, same result. Basically I want to quit the application and then restart it. – user879422