Use | to indicate a union:
xpath3 = "//span[@class='productSpecialPrice']//text()|//div[@class='proDetPrice']//text()"
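This is not exactly what you asked for, but I think it can be incorporated into a workable solution.
From the XPath (version 1.0) specs:
The | operator computes the union of its operands, which must be node-sets.
For example,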
import lxml.html as LH

urls = [
    'http://jujumarts.com/mobiles-accessories-smartphones-wildfire-sdarkgrey-p-551.html',
    'http://jujumarts.com/computers-accessories-transcend-500gb-portable-storejet-25d2-p-2616.html'
]

xpaths = [
    "//span[@class='productSpecialPrice']//text()",
    "//div[@class='proDetPrice']//text()",
    "//span[@class='productSpecialPrice']//text()|//div[@class='proDetPrice']//text()"
]

for url in urls:
    # Fetch and parse each page once, then try every XPath against it
    doc = LH.parse(url)
    for xpath in xpaths:
        print(doc.xpath(xpath))
    print()  # blank line between pages
yields
['Rs.11,800.00']
['Rs.13,299.00', 'Rs.11,800.00']
['Rs.13,299.00', 'Rs.11,800.00']

[]
['Rs.7,000.00']
['Rs.7,000.00']
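Since those pages may not stay reachable, here is a minimal offline sketch of the same idea. The HTML is made up (only the two class names come from the XPaths above), and nesting the span inside the div is an assumption chosen to mirror the first page's output. It also shows that a text node matched by both operands appears only once in the union.

```python
import lxml.html as LH

# Hypothetical markup: the special-price span sits inside the price div.
HTML = """
<html><body>
<div class="proDetPrice"><del>Rs.13,299.00</del><span class="productSpecialPrice">Rs.11,800.00</span></div>
</body></html>
"""

doc = LH.fromstring(HTML)

special = "//span[@class='productSpecialPrice']//text()"
regular = "//div[@class='proDetPrice']//text()"

# Evaluate each operand alone, then the union of both.
results = [doc.xpath(xp) for xp in (special, regular, special + "|" + regular)]
for result in results:
    print(result)
```

The span's text is selected by both operands, but the union is a node-set, so it is not duplicated in the third result.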
Another way to get the information you want is to match either class in a single predicate:
"//*[@class='productSpecialPrice' or @class='proDetPrice']//text()"
I am processing hundreds of websites, with multiple XPaths for each portal, so using try/except looks clumsy. Apparently XPath 2.0 is quite capable of doing this. – 2013-04-23 12:48:39
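For the many-portals case, one possible sketch is to keep a list of candidate XPaths per site and join them with | so that a single query covers all of them. The `SITE_XPATHS` mapping, the `union_xpath` and `extract` helpers, the site key, and the sample HTML are all hypothetical names introduced for illustration. Only the two XPath strings come from the answer.

```python
import lxml.html as LH

# Hypothetical per-site candidates; only the XPath strings come from the answer.
SITE_XPATHS = {
    "jujumarts": [
        "//span[@class='productSpecialPrice']//text()",
        "//div[@class='proDetPrice']//text()",
    ],
}


def union_xpath(candidates):
    """Join candidate XPath 1.0 expressions into one union expression."""
    return " | ".join(candidates)


def extract(doc, site):
    """Return every node matched by any of the site's candidate XPaths."""
    return doc.xpath(union_xpath(SITE_XPATHS[site]))


# Made-up page that only matches the second candidate.
page = LH.fromstring(
    "<html><body><div class='proDetPrice'>Rs.7,000.00</div></body></html>"
)
prices = extract(page, "jujumarts")
print(prices)
```

A candidate that matches nothing contributes an empty node-set instead of raising, so no per-XPath error handling is needed for the lookup itself. Every candidate must evaluate to a node-set for the union to be valid.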