XPath: selecting a node's text and child nodes

I am using Python Scrapy to scrape some data from a website.

The website content looks like this:

<html> 
    <div class="details"> 
    <div class="a"> not needed</div> 
    content 1 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p> 
    <div class="b"> this is also not needed</div> 
    </div> 
</html>

I need to get the complete HTML, excluding the divs with class a and b.

So my output would look like this:

<div class="details"> 
content 1 
<p>content 2</p> 
<div>content 2</div> 
<p>content 2</p> 
<div>content 2</div> 
<p>content 2</p> 
</div>

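How can I write the correct XPath for this? Or should I write XPaths using the classes "details", "a" and "b", and then use string manipulation to remove the divs with class 'a' and 'b' from the "details" div?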

Note that the content here (content 1) is bare text, not a child element of the div with class "details".

Source

2014-11-24 sajith

You can get all the children except the divs with class a or b by using node() and the self:: axis:

//div[@class="details"]/node()[not(self::div[@class="a" or @class="b"])]

Demo using the scrapy shell:

$ scrapy shell ./index.html
>>> nodes = response.xpath('//div[@class="details"]/node()[not(self::div[@class="a" or @class="b"])]').extract()
>>> print(''.join(nodes))
    content 1 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p> 
    <div>content 2</div> 
    <p>content 2</p>

Source

2014-11-24 05:09:01 alecxe
