2014-03-27

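XPath union multiple queries

I'm scraping job listings from another website. Because users copy and paste the data into the source site, the markup structure differs from case to case.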

Case 1:

<h3>Job Description</h3> 
<div style="text-align: justify; line-height: 115%"><b> 
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.</div> 

Case 2:

<h3>Job Description</h3> 
<p> 
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that.</p> 

In this case, the <p> tag is sometimes replaced by other HTML tags.

Case 3:

<h3>Job Description</h3> 
Receptionist is assigned for ANAFAE-ALC based in Mazar-e-Sharif. This position is supervised by and reports to ALC Educational Program Manager and following are the main duties but are not limited to that. 

I'm using this expression to get the content. It currently works for Case 3 but not for the other two cases. How can I handle all three?

//text()[preceding::h3[text()="Job Description"]]

Answer

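Your XPath expression selects text nodes that are preceded by an <h3> whose text node equals "Job Description". That only matches the third case, because in the first two cases the <h3> is followed by a <div> and a <p>, respectively.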

You could try something like this:

//node()[preceding-sibling::*[1][self::h3 = "Job Description"]]/string()

Some details:

//node() selects all descendant nodes (including elements and text nodes) of the initial context.

preceding-sibling::*[1] selects the immediately preceding sibling element.

[self::h3 = "Job Description"] checks that this element is an <h3> and that its string value equals "Job Description".

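/string() returns the string value of the context node (a function call used as a path step like this requires XPath 2.0 or later). For your sample content, you could use /descendant-or-self::text() instead. It works by selecting the context node if it is a text node, and all of its descendant text nodes if it is an element. However, if the <div> or <p> is changed to mixed content (that is, child elements interspersed with text nodes), that expression returns a sequence of separate descendant text nodes, whereas /string() concatenates them into a single string.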
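If your engine only supports XPath 1.0 (where the /string() step is unavailable), the following minimal sketch in Java uses the JDK's javax.xml.xpath API. It evaluates only the node-selecting part of the expression and then reads each node's string value with getTextContent(). The class name JobXPath and the three sample documents are hypothetical, well-formed stand-ins for the cases in the question. In particular, the unclosed <b> from Case 1 is closed here, on the assumption that the scraped HTML is cleaned into XML first.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class JobXPath {
    // XPath 1.0 form of the answer's expression: the trailing /string() step
    // is dropped and the string value is read in Java instead.
    static final String EXPR =
        "//node()[preceding-sibling::*[1][self::h3 = 'Job Description']]";

    static final String TEXT = "Receptionist is assigned for ANAFAE-ALC.";

    // Hypothetical well-formed stand-ins for Case 1, Case 2 and Case 3.
    static final String CASE_1 = "<body><h3>Job Description</h3> "
        + "<div><b>" + TEXT + "</b></div></body>";
    static final String CASE_2 = "<body><h3>Job Description</h3> "
        + "<p>" + TEXT + "</p></body>";
    static final String CASE_3 = "<body><h3>Job Description</h3> "
        + TEXT + "</body>";

    static String extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() concatenates all descendant text nodes,
                // which corresponds to the XPath string value of an element.
                sb.append(nodes.item(i).getTextContent());
            }
            // Trimming drops the whitespace-only text node between </h3> and <div>/<p>.
            return sb.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException("XPath extraction failed", e);
        }
    }

    public static void main(String[] args) {
        for (String xml : new String[] {CASE_1, CASE_2, CASE_3}) {
            System.out.println(extract(xml));
        }
    }
}
```

In Cases 1 and 2, the expression matches both the whitespace text node after </h3> and the <div> or <p> itself. This is because preceding-sibling::*[1] skips text nodes and only looks at elements. In Case 3, it matches only the bare text node.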

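It doesn't return anything. This is the full expression I have now: //node()[preceding-sibling::*[1][self::h3 = "Job Description"]]/descendant-or-self::text() and following-sibling::h3[text() = "Job Requirements"]] – Taranum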

What environment are you evaluating the XPath expression in? How are you running it? – joemfb

This works: //node()[preceding::h3[node()="Contact Information"]] – Taranum
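The follow-up comment tries to stop at the next heading ("Job Requirements" is assumed from that comment). One XPath 1.0 way to express "everything between this <h3> and the next one", which was not proposed in the thread itself, is to keep only non-<h3> siblings whose nearest preceding <h3> is the "Job Description" heading. This is a sketch under that assumption. The class name SectionXPath and the sample page are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SectionXPath {
    // preceding-sibling is a reverse axis, so h3[1] is the NEAREST preceding <h3>.
    // not(self::h3) keeps the next heading itself out of the result.
    static final String EXPR = "//node()[not(self::h3)]"
        + "[preceding-sibling::h3[1][. = 'Job Description']]";

    // Hypothetical page: a "Job Description" section followed by another section.
    static final String PAGE = "<body>"
        + "<h3>Job Description</h3><p>Answer phone calls.</p>"
        + "<ul><li>Greet visitors.</li></ul>"
        + "<h3>Job Requirements</h3><p>Fluent English.</p>"
        + "</body>";

    static String section(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc, XPathConstants.NODESET);
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // String value of each selected sibling; skip whitespace-only nodes.
                String text = nodes.item(i).getTextContent().trim();
                if (!text.isEmpty()) {
                    parts.add(text);
                }
            }
            return String.join(" ", parts);
        } catch (Exception e) {
            throw new IllegalStateException("XPath extraction failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(section(PAGE));
    }
}
```

Nodes after "Job Requirements" are excluded because their nearest preceding <h3> is that heading, not "Job Description". Nested content such as the <li> is not selected on its own, but its text still comes through the string value of the enclosing <ul>.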