2013-04-20 93 views
0

XPath: excluding multiple elements/tags

I'm having trouble extracting the text between two div tags in XML.

Imagine I have the following XML:

<div class="default_style_wrap" > 

<!-- Body starts --> 
    <!-- Irrelvent Data --> 
    <div style="clear:both" /> 
    <!-- Irrelvent Data --> 
    <div class="name_address" >...</div> 
    <!-- Irrelvent Data --> 
    <div style="clear:both" /> 
    <!-- Irrelvent Data --> 
    <span class="img_comments_right" >...</span> 

    <!-- Text that I want to get --> 
Two members of the Expedition 35 crew wrapped up a 6-hour, 38 minute spacewalk at 4:41 p.m. EDT Friday to deploy and retrieve several science experiments on the exterior of the International Space Station and install a new navigational aid. 
    <br /> 
    <br /> 
The spacewalkers' first task was to install the Obstanovka experiment on the station's Zvezda service module. Obstanovka will study plasma waves and the effect of space weather on Earth's ionosphere. 

    <!-- Irrelvent Data Again --> 
    <span class="img_comments_right" >...</span> 
    <!-- Text that I want to get --> 
After deploying a pair of sensor booms for Obstanovka, Vinogradov and Romanenko retrieved the Biorisk experiment from the exterior of Pirs. The Biorisk experiment studied the effect of microbes on spacecraft structures. 
    <br /> 
    <br /> 
This was the 167th spacewalk in support of space station assembly and maintenance, totaling 1,055 hours, 39 minutes. Vinogradov's seven spacewalks total 38 hours, 25 minutes. Romanenko completed his first spacewalk. 
    <!-- Body ends --> 
</div> 

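As you may be able to see from the code, default_style_wrap is the parent of all those other irrelevant divs and spans. The text I care about is basically the untagged text, but because there are also other tags in the way, such as img_comments_right, it is driving me crazy.

I tried the following, which I saw in another post: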

"//div[@class='article_container']/*[not(self::div)]"; 

but it doesn't seem to return any text at all, and even if it did, I don't know how to exclude the spans as well.

Any ideas?

Answers

0

You should try the following query. It selects all following siblings of the <span> nodes that are text nodes:

query = '//span[@class="img_comments_right"]/following-sibling::text()'; 
+0

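Thanks for your answer, but I'm trying to get everything *but* the spans and divs inside the main container; I need the text outside the tags. – 2013-04-20 12:52:20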

+0

That is exactly what the query returns: the text outside the tags. – hek2mgl 2013-04-20 23:38:52

0

You can use this XPath:

//div[@class='default_style_wrap']/text() 
0

You should be able to grab the text with this XPath:

div[@class = 'default_style_wrap']/text()[normalize-space()] 

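It selects all text() nodes that are children of the default_style_wrap <div>, filtering out empty (or whitespace-only) nodes.

If you use a separate template, you can put each block neatly into its own paragraph, for example: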
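To make it concrete which nodes that expression picks up, here is a minimal Python sketch using only the standard library. It does not run the XPath itself (xml.etree.ElementTree has no text() node test); it imitates the selection by reading the parent's .text and each child's .tail. The SAMPLE markup is a shortened, hypothetical stand-in for the XML in the question, and direct_text_nodes is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the markup in the question.
SAMPLE = """<div class="default_style_wrap">
    <!-- Irrelevant data -->
    <div style="clear:both" />
    <div class="name_address">...</div>
    <span class="img_comments_right">...</span>
    First paragraph.
    <br />
    <br />
    Second paragraph.
    <span class="img_comments_right">...</span>
    Third paragraph.
</div>"""


def direct_text_nodes(parent):
    """Return the non-blank text chunks that sit directly inside parent."""
    # ElementTree keeps the text before the first child in parent.text and
    # the text after each child in that child's .tail attribute.
    chunks = [parent.text] + [child.tail for child in parent]
    # Rough counterpart of the [normalize-space()] filter: drop blank chunks.
    return [chunk.strip() for chunk in chunks if chunk and chunk.strip()]


root = ET.fromstring(SAMPLE)  # the root element is the wrapper <div> here
texts = direct_text_nodes(root)
print(texts)
```

One difference worth keeping in mind: the default ElementTree parser drops comments and merges the text on either side of them, whereas in XPath a comment splits the surrounding text into separate text nodes. For markup like the sample above, the filtered result should come out the same either way.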

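<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Start at the document root and visit only the non-blank text children of the wrapper div -->
    <xsl:template match="/">
        <xsl:apply-templates select="div[@class = 'default_style_wrap']/text()[normalize-space()]" />
    </xsl:template>

    <!-- Wrap each selected text node in its own paragraph -->
    <xsl:template match="text()">
        <p><xsl:value-of select="." /></p>
    </xsl:template>
</xsl:stylesheet>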
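For readers who are not working in XSLT, the same idea (one paragraph per text block) can be sketched in Python with the standard library. This is only an approximation under stated assumptions: text_blocks_as_paragraphs is a hypothetical helper name, the input string is invented sample data, and unlike xsl:value-of the sketch also trims surrounding whitespace from each block.

```python
import xml.etree.ElementTree as ET


def text_blocks_as_paragraphs(parent):
    """Wrap each non-blank direct text chunk of parent in a <p> element."""
    # Direct text lives in parent.text and in the .tail of each child.
    chunks = [parent.text] + [child.tail for child in parent]
    paragraphs = []
    for chunk in chunks:
        if chunk and chunk.strip():
            p = ET.Element("p")
            p.text = chunk.strip()  # trimmed here; xsl:value-of would not trim
            paragraphs.append(ET.tostring(p, encoding="unicode"))
    return paragraphs


# Invented sample input; the real document would be parsed the same way.
doc = ET.fromstring(
    '<div class="default_style_wrap"><span>x</span> One <br /> Two </div>'
)
paragraphs = text_blocks_as_paragraphs(doc)
print("\n".join(paragraphs))
```

Building the paragraphs as elements and serializing them with ET.tostring means that characters such as & or < in the text are escaped automatically.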