I disagree with all of the answers so far. The XPath expression that does exactly what you asked for is
//tr[@class = 'heading' and normalize-space(td) = 'Heading 1']/following::td[following::tr[@class = 'heading' and normalize-space(td) = 'Heading 2']]
Read piece by piece, it means:
//tr select all `tr` elements anywhere in the document
[@class = 'heading' but only if they have a `class` attribute whose
value is equal to "heading"
and normalize-space(td) = 'Heading 1'] and only if their first `td` child has
the normalized string value "Heading 1".
/following::td select all `td` elements that follow them
[following::tr but only if they are followed by a `tr` element
[@class = 'heading' which again has a `class` attribute with "heading"
as its value
and normalize-space(td) = 'Heading 2']] and only if this `tr` element's first `td` child
has "Heading 2" as its normalized string value
It returns the following (individual results separated by ------):
<td>L 1</td>
-----------------------
<td>R 1</td>
-----------------------
<td>L 2</td>
-----------------------
<td>R 2</td>
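The normalize-space() function strips leading and trailing whitespace (and collapses internal runs of whitespace), so " Heading 1 " compares equal to 'Heading 1'.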
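To check what the first expression selects without an XPath engine, here is a minimal Python sketch. Python's standard `xml.etree.ElementTree` supports only a small XPath subset and has no `following::` axis, so the sketch emulates that axis by hand over document order (`root.iter()`). The sample markup is an assumption (the question's markup is not reproduced in this answer), and `between_headings` is a hypothetical helper name; with a full XPath 1.0 engine such as lxml you would simply evaluate the expression itself.

```python
import xml.etree.ElementTree as ET

# Assumed sample: a hypothetical minimal version of the question's markup.
SAMPLE = """<body>
<tr class="heading"><td colspan="2"> Heading 1 </td></tr>
<tr><td>L 1</td><td>R 1</td></tr>
<tr><td>L 2</td><td>R 2</td></tr>
<tr class="heading"><td colspan="2"> Heading 2</td></tr>
</body>"""


def norm(s):
    """XPath normalize-space(): strip and collapse runs of whitespace."""
    return " ".join(s.split())


def is_heading(el, title):
    """el matches tr[@class = 'heading' and normalize-space(td) = title]."""
    td = el.find("td")  # normalize-space(td) uses the first td child only
    return (el.tag == "tr" and el.get("class") == "heading"
            and td is not None and norm("".join(td.itertext())) == title)


def following(node, order):
    """following:: axis: everything after `node` and all of its descendants."""
    return order[order.index(node) + len(list(node.iter())):]


def between_headings(root, first, second):
    """Hypothetical helper emulating
    //tr[heading = first]/following::td[following::tr[heading = second]]
    """
    order = list(root.iter())  # document order
    result = []
    for tr in (n for n in order if is_heading(n, first)):
        for td in following(tr, order):
            if td.tag == "td" and any(is_heading(n, second) for n in following(td, order)):
                if td not in result:  # XPath node-sets contain no duplicates
                    result.append(td)
    return result


root = ET.fromstring(SAMPLE)
print([td.text for td in between_headings(root, "Heading 1", "Heading 2")])
# ['L 1', 'R 1', 'L 2', 'R 2']
```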
Edit: if you meant to select only the first `td` element of each row:
//tr[@class = 'heading' and normalize-space(td) = 'Heading 1']/following::tr/td[position() = 1 and following::tr[@class = 'heading' and normalize-space(td) = 'Heading 2']]
The result is:
<td>L 1</td>
-----------------------
<td>L 2</td>
To be more complete, and to handle a case like this:
<body>
<tr class="heading">
<td colspan="2"> Heading 1 </td>
</tr>
<tr>
<td>L 1</td>
<td>R 1</td>
<td>third</td>
</tr>
<tr>
<td>L 2</td>
<td>R 2</td>
</tr>
<tr class="heading">
<td colspan="2"> Heading other</td>
</tr>
<tr>
<td>L 3</td>
<td>R 3</td>
</tr>
<tr class="heading">
<td colspan="2"> Heading 2</td>
</tr>
</body>
where unrelated headings sit between "Heading 1" and "Heading 2" and their child `td`
elements should not appear in the result, use
//tr[@class = 'heading' and normalize-space(td) = 'Heading 1']/following::tr[not(@class)]/td[position() = 1 and following::tr[@class = 'heading' and normalize-space(td) = 'Heading 2']]
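The difference between the earlier first-column expression and this `not(@class)` version shows up on the extended sample above. The following Python sketch emulates both; as before, stdlib `ElementTree` has no `following::` axis, so it is reimplemented over document order. `first_cells` and its `skip_headings` flag are hypothetical names for this illustration, and the sketch assumes exactly one "Heading 1" row exists.

```python
import xml.etree.ElementTree as ET

# The extended sample from the answer, one line per row.
SAMPLE = """<body>
<tr class="heading"><td colspan="2"> Heading 1 </td></tr>
<tr><td>L 1</td><td>R 1</td><td>third</td></tr>
<tr><td>L 2</td><td>R 2</td></tr>
<tr class="heading"><td colspan="2"> Heading other</td></tr>
<tr><td>L 3</td><td>R 3</td></tr>
<tr class="heading"><td colspan="2"> Heading 2</td></tr>
</body>"""


def norm(s):
    """XPath normalize-space(): strip and collapse whitespace."""
    return " ".join(s.split())


def is_heading(el, title):
    """el matches tr[@class = 'heading' and normalize-space(td) = title]."""
    td = el.find("td")  # normalize-space(td) uses the first td child only
    return (el.tag == "tr" and el.get("class") == "heading"
            and td is not None and norm("".join(td.itertext())) == title)


def following(node, order):
    """following:: axis: everything after `node` and its descendants."""
    return order[order.index(node) + len(list(node.iter())):]


def first_cells(root, first, second, skip_headings):
    """Hypothetical helper emulating
    //tr[heading = first]/following::tr[not(@class)]/td[position() = 1
        and following::tr[heading = second]]
    The [not(@class)] filter is applied only when skip_headings is True.
    """
    order = list(root.iter())  # document order
    start = next(n for n in order if is_heading(n, first))  # assumes it exists
    result = []
    for tr in following(start, order):
        if tr.tag != "tr" or (skip_headings and tr.get("class") is not None):
            continue
        td = tr.find("td")  # td[position() = 1]: first td child
        if td is not None and any(is_heading(n, second) for n in following(td, order)):
            result.append(norm(td.text or ""))
    return result


root = ET.fromstring(SAMPLE)
print(first_cells(root, "Heading 1", "Heading 2", skip_headings=False))
# ['L 1', 'L 2', 'Heading other', 'L 3']
print(first_cells(root, "Heading 1", "Heading 2", skip_headings=True))
# ['L 1', 'L 2', 'L 3']
```

Without the `not(@class)` filter, the first `td` of the "Heading other" row slips into the result; with it, only data rows are kept.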
Edit:
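"At the moment, your XPath looks for elements between two headings, but for the last group on the page there will be no second heading to refer to."
Until now, you had not explained that this is the case in your actual data. Use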
//tr[@class = 'heading' and normalize-space(td) = 'Heading 1']/following::tr[not(@class)]/td[position() = 1 and not(preceding::tr[@class = 'heading' and normalize-space(td) = 'Heading 2'])]
Edit 2:
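"I did, but I also added the note: 'Ideally I need to be able to do this with only "Heading 1" as input. I need all the elements under the heading I provide, but to ignore anything under a new heading.'"
For that, use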
//tr[@class = 'heading' and normalize-space(td) = 'Heading 1']/following::tr[not(@class)]/td[position() = 1 and not(preceding::tr[@class = 'heading' and normalize-space(td) != 'Heading 1'])]
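To compare the two `not(preceding::...)` expressions, here is a similar Python sketch run on the extended sample. It emulates the `following::` and `preceding::` axes by hand, since stdlib `ElementTree` does not support them; `section_cells` and its `stop` predicate are hypothetical names, and the sketch assumes a "Heading 1" row exists.

```python
import xml.etree.ElementTree as ET

# The extended sample from the answer, one line per row.
SAMPLE = """<body>
<tr class="heading"><td colspan="2"> Heading 1 </td></tr>
<tr><td>L 1</td><td>R 1</td><td>third</td></tr>
<tr><td>L 2</td><td>R 2</td></tr>
<tr class="heading"><td colspan="2"> Heading other</td></tr>
<tr><td>L 3</td><td>R 3</td></tr>
<tr class="heading"><td colspan="2"> Heading 2</td></tr>
</body>"""


def norm(s):
    """XPath normalize-space(): strip and collapse whitespace."""
    return " ".join(s.split())


def is_heading(el, test):
    """el matches tr[@class = 'heading' and test(normalize-space(td))]."""
    td = el.find("td")  # normalize-space(td) uses the first td child only
    return (el.tag == "tr" and el.get("class") == "heading"
            and td is not None and test(norm("".join(td.itertext()))))


def following(node, order):
    """following:: axis: everything after `node` and its descendants."""
    return order[order.index(node) + len(list(node.iter())):]


def preceding(node, order, parents):
    """preceding:: axis: everything before `node` except its ancestors."""
    ancestors = set()
    p = parents.get(node)
    while p is not None:
        ancestors.add(p)
        p = parents.get(p)
    return [n for n in order[:order.index(node)] if n not in ancestors]


def section_cells(root, title, stop):
    """Hypothetical helper emulating
    //tr[heading = title]/following::tr[not(@class)]/td[position() = 1
        and not(preceding::tr[@class = 'heading' and stop(normalize-space(td))])]
    """
    order = list(root.iter())  # document order
    parents = {child: parent for parent in order for child in parent}
    start = next(n for n in order if is_heading(n, lambda t: t == title))
    result = []
    for tr in following(start, order):
        if tr.tag != "tr" or tr.get("class") is not None:  # tr[not(@class)]
            continue
        td = tr.find("td")  # td[position() = 1]
        if td is not None and not any(
                is_heading(n, stop) for n in preceding(td, order, parents)):
            result.append(td.text)
    return result


root = ET.fromstring(SAMPLE)
# Stop only at "Heading 2": the row under "Heading other" is still included.
print(section_cells(root, "Heading 1", lambda t: t == "Heading 2"))  # ['L 1', 'L 2', 'L 3']
# Stop at any heading that is not "Heading 1".
print(section_cells(root, "Heading 1", lambda t: t != "Heading 1"))  # ['L 1', 'L 2']
```

Matching any heading other than "Heading 1" cuts the section off at the next heading, whatever its text, so no second heading name is needed as input. Because `preceding::` reaches back to the start of the document, that form also returns nothing if some other heading appears above "Heading 1".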
Not sure you even read the question; I don't know what you are answering!