Preamble:

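Unfortunately, it doesn't fully work, so I can't extract the data I want from the lxml tree. I'm not particularly interested in this specific case; I'm looking for a more general answer.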

import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from lxml import html

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Run the Qt event loop until _loadFinished() calls quit()
        self.app.exec_()

    def _loadFinished(self, result):
        # Keep a reference to the loaded frame, then leave the event loop
        self.frame = self.mainFrame()
        self.app.quit()

url = 'http://pycoders.com/archive/'
# This does the magic: loads everything
r = Render(url)
# result is a QString
result = r.frame.toHtml()
# The QString must be converted to a str before lxml can process it
formatted_result = str(result.toAscii())

# Next, build the lxml tree from formatted_result
tree = html.fromstring(formatted_result)

The guide then continues with this:

archive_links = tree.xpath('//divass="campaign"]/a/@href')

This results in an error:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "src\lxml\lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src\lxml\lxml.etree.c:59353) 
    File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src\lxml\lxml.etree.c:171227) 
    File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src\lxml\lxml.etree.c:170184) 
lxml.etree.XPathEvalError: Invalid expression
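The message "Invalid expression" means lxml rejected the XPath string itself, before it ever looked at the document. One way to check an expression in isolation is to compile it with lxml.etree.XPath, which raises as soon as the string is malformed. A minimal sketch; the helper name check_xpath is hypothetical:

```python
from lxml import etree


def check_xpath(expr):
    """Return None if expr compiles, otherwise lxml's error message."""
    try:
        etree.XPath(expr)  # compile only; no document is needed
    except etree.XPathError as exc:  # base class of XPathSyntaxError and XPathEvalError
        return str(exc)
    return None


# The expression as quoted from the guide above is malformed.
print(check_xpath('//divass="campaign"]/a/@href'))
# A well-formed attribute predicate compiles cleanly.
print(check_xpath('//div[@class="campaign"]/a/@href'))  # None
```

A compiled etree.XPath object can also be reused: it is called with an element, e.g. find_links = etree.XPath('//a/@href') and then find_links(tree).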

Question

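To get at my data, I still need to use the correct XPath. For testing purposes, I tried title = tree.xpath('//title'). This left me with an <element title at 0xdf418> object, and I can't get the data out of it (in this case, the title text).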

I've tried several things, but none of them actually returned the data.

>>> title.__len__()
1
>>> title.__sizeof__()
72
>>> type(title)
<type 'list'>
>>> title[0]
<element title at 0xdfc418>
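The REPL output above already hints at the cause: tree.xpath() always returns a list, and that list holds Element objects, not strings. The text is reached through each element. A minimal sketch against a made-up HTML string (not the pycoders.com page) showing common ways to get it:

```python
from lxml import html

# Made-up markup standing in for the rendered page.
tree = html.fromstring(
    "<html><head><title>Example title</title></head>"
    "<body><p>Hello <b>world</b></p></body></html>"
)

titles = tree.xpath('//title')   # a list of Element objects
print(titles[0].text)            # text directly inside the first <title>

# Ask XPath for the text nodes instead of the elements: a list of strings.
print(tree.xpath('//title/text()'))

# findtext() returns the text of the first match, or None if there is none.
print(tree.findtext('.//title'))

# .text stops at the first child element; text_content() joins all descendant text.
p = tree.xpath('//p')[0]
print(repr(p.text))      # 'Hello '
print(p.text_content())  # Hello world
```

Applied to the question, title[0].text would give the page title; the same pattern works for any element list returned by xpath().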


2017-06-02 Mitchell van Zuylen

There may be a typo. Try this:

archive_links = tree.xpath('//div[class="campaign"]/a/@href')

Or:

archive_links = tree.xpath('//div[@class="campaign"]/a/@href')
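The two suggestions select different things. In [class="campaign"], class is a child element name, so it matches a div containing a <class> element whose text is "campaign"; [@class="campaign"] tests the class attribute. An exact @class comparison also misses elements that carry several classes. A sketch on made-up markup (the URLs are placeholders):

```python
from lxml import html

# Made-up markup: one div with a single class, one with two classes.
doc = html.fromstring(
    '<div>'
    '<div class="campaign"><a href="/issue-1">Issue 1</a></div>'
    '<div class="campaign featured"><a href="/issue-2">Issue 2</a></div>'
    '</div>'
)

# Child-element test: there are no <class> elements, so nothing matches.
by_child = doc.xpath('//div[class="campaign"]/a/@href')

# Exact attribute match: only the div whose class is exactly "campaign".
exact = doc.xpath('//div[@class="campaign"]/a/@href')

# Whitespace-separated token match that tolerates extra classes (XPath 1.0 idiom).
token = doc.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " campaign ")]/a/@href'
)

print(by_child)  # []
print(exact)     # ['/issue-1']
print(token)     # ['/issue-1', '/issue-2']
```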


2017-06-02 11:42:46 Fomalhaut

This syntax makes more sense, but unfortunately it returns archive_links = []. –

@MitchellvanZuylen, that's because you are only getting the initial page source; to get the links, you need to wait until the JavaScript has finished executing. – Andersson
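This can be reproduced without a browser: lxml only parses markup and never runs scripts, so anything a script would insert is absent from the tree. A sketch with made-up page source in which the campaign link exists only inside a script:

```python
from lxml import html

# Made-up page source: the campaign link is only added by JavaScript.
# A raw string keeps the JS-style "<\/" escapes intact.
source = r"""
<html><body>
  <div id="archive"></div>
  <script>
    document.getElementById('archive').innerHTML =
      '<div class="campaign"><a href="/issue-1">Issue 1<\/a><\/div>';
  </script>
</body></html>
"""

tree = html.fromstring(source)

# The script body is plain text to lxml; nothing in it is executed.
print(tree.xpath('//div[@class="campaign"]/a/@href'))  # []
print(len(tree.xpath('//div[@id="archive"]')))         # 1, but still empty
```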

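According to the guide, the Render class waits for the JavaScript to execute. Did I misunderstand the guide, is the guide wrong, or am I missing something in the Render class? –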
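One possible explanation, offered as an assumption rather than something verified against the guide: loadFinished is emitted when the page itself has finished loading, but content that JavaScript fetches or builds afterwards may not be in the DOM yet. A common workaround is to poll the rendered HTML until the XPath you need matches or a timeout expires. The sketch below keeps the polling logic independent of Qt; get_html is a hypothetical zero-argument callable returning the current HTML. With the Render class above it might call r.app.processEvents() and then return str(r.frame.toHtml().toAscii()); time.sleep() on its own does not let Qt run the page's scripts.

```python
import time
from lxml import html


def wait_for_xpath(get_html, expr, timeout=10.0, interval=0.5):
    """Re-parse get_html() until expr matches or timeout seconds pass.

    get_html is a hypothetical zero-argument callable that returns the
    current page HTML as a string. Returns the last XPath result, which
    is an empty list if nothing ever matched.
    """
    deadline = time.time() + timeout
    while True:
        result = html.fromstring(get_html()).xpath(expr)
        if result or time.time() >= deadline:
            return result
        time.sleep(interval)  # in a Qt app, get_html itself should pump events


def make_fake_source(pages):
    """Hypothetical stand-in for a live page: each call returns the next
    snapshot, repeating the last one once the list is exhausted."""
    state = {"calls": 0}

    def get_html():
        page = pages[min(state["calls"], len(pages) - 1)]
        state["calls"] += 1
        return page

    return get_html


empty = "<html><body><div id='archive'></div></body></html>"
ready = ("<html><body><div class='campaign'>"
         "<a href='/issue-1'>Issue 1</a></div></body></html>")

links = wait_for_xpath(make_fake_source([empty, empty, ready]),
                       '//div[@class="campaign"]/a/@href',
                       timeout=5, interval=0.01)
print(links)  # ['/issue-1']
```

Polling replaces a guessed fixed delay with a bounded wait; a Qt-native alternative would be a QTimer driven by the event loop, which is not shown here.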

Extracting data from an LXML tree
