XPath - How to select related cousin data


<html> 
    <table border="1"> 
     <tbody> 
      <tr> 
       <td> 
        <table border="1"> 
         <tbody> 
          <tr> 
           <th>aaa</th> 
           <th>bbb</th> 
           <th>ccc</th> 
           <th>ddd</th> 
           <th>eee</th> 
           <th>fff</th> 
          </tr> 
          <tr> 
           <td>111</td> 
           <td>222</td> 
           <td>333</td> 
           <td>444</td> 
           <td>555</td> 
           <td>666</td> 
          </tr> 
         </tbody> 
        </table> 
       </td> 
      </tr> 
     </tbody> 
    </table> 
</html>

How can I use XPath to select specific related cousin data? The desired output would be as follows:

<th>aaa</th> 
<th>ccc</th> 
<th>fff</th> 
<td>111</td> 
<td>333</td>
<td>666</td>

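The most important thing about the XPath is that I want to be able to include or exclude certain <th> tags along with their corresponding <td> tags.

Based on the answers so far, the closest I have got is: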

//th[not(contains(text(), "ddd"))] | //tr[2]/td[not(position()=4)]

Is there a way to avoid hard-coding position()=4 and instead reference the corresponding th tag?
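The lookup being asked for, deriving a column's position from its header text instead of hard-coding position()=4, can be prototyped outside XPath first. This is a minimal sketch using only Python's standard xml.etree.ElementTree, assuming the markup is well-formed XML (the posted snippet is); select_columns and its excluded parameter are hypothetical names, not something taken from the answers.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, condensed onto fewer lines
SNIPPET = """<html>
  <table border="1"><tbody><tr><td>
    <table border="1"><tbody>
      <tr><th>aaa</th><th>bbb</th><th>ccc</th><th>ddd</th><th>eee</th><th>fff</th></tr>
      <tr><td>111</td><td>222</td><td>333</td><td>444</td><td>555</td><td>666</td></tr>
    </tbody></table>
  </td></tr></tbody></table>
</html>"""


def select_columns(markup, excluded):
    """Return (kept header texts, kept cell texts) from the first header row
    and the row right after it, dropping columns whose header is excluded."""
    root = ET.fromstring(markup)
    for tbody in root.iter("tbody"):
        rows = tbody.findall("tr")
        # The header row is the first <tr> of a <tbody> that has <th> children
        if len(rows) >= 2 and rows[0].find("th") is not None:
            headers = [th.text for th in rows[0].findall("th")]
            cells = [td.text for td in rows[1].findall("td")]
            # Column indexes come from the header text, not a hard-coded number
            keep = [i for i, text in enumerate(headers) if text not in excluded]
            return [headers[i] for i in keep], [cells[i] for i in keep]
    raise ValueError("no header row found")


kept_headers, kept_cells = select_columns(SNIPPET, excluded=("bbb", "ddd", "eee"))
print(kept_headers)  # ['aaa', 'ccc', 'fff']
print(kept_cells)    # ['111', '333', '666']
```

Changing the excluded tuple is all it takes to include or exclude different columns, which is the behavior the XPath answers below try to reproduce in a single expression.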


2017-06-05 Darth

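It's good that you've included the XML and the expected output, but you haven't said what criteria the expected output meets, and that isn't obvious. – kjhughes

The criteria are to select every 'th' and its corresponding 'td', but exclude the 'th' elements containing "bbb", "ddd" and "eee" together with their corresponding 'td' tags. – Darth

Which programming language are you using with 'selenium'? – Andersson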

I'm not sure this is the best solution by any means, but you can try

//th[not(.="bbb") and not(.="ddd") and not(.="eee")] | //tr[2]/td[not(position()=index-of(//th, "bbb")) and not(position()=index-of(//th, "ddd")) and not(position()=index-of(//th, "eee"))]

或較短的版本

//th[not(.=("bbb", "ddd", "eee"))]| //tr[2]/td[not(position()=(index-of(//th, "bbb"), index-of(//th, "ddd"),index-of(//th, "eee")))]

that returns

<th>aaa</th> 
<th>ccc</th> 
<th>fff</th> 
<td>111</td> 
<td>333</td> 
<td>666</td>
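For reference, XPath 2.0's fn:index-of($seq, $search) returns the 1-based positions of every item in $seq equal to $search, or an empty sequence when there is no match. Because position() = (...) is a general comparison, it is true when the position equals any item in the sequence and false against an empty sequence, so a header that doesn't exist simply excludes nothing. A rough Python mirror of the shorter expression follows; the header and cell texts are hard-coded from the question and index_of is a hypothetical helper, not a real library function.

```python
def index_of(seq, search):
    """Mimic XPath 2.0 fn:index-of: 1-based positions of every item equal to search."""
    return [pos for pos, item in enumerate(seq, start=1) if item == search]


headers = ["aaa", "bbb", "ccc", "ddd", "eee", "fff"]
cells = ["111", "222", "333", "444", "555", "666"]

# Like (index-of(//th, "bbb"), index-of(//th, "ddd"), index-of(//th, "eee"))
excluded_positions = set()
for name in ("bbb", "ddd", "eee"):
    excluded_positions.update(index_of(headers, name))

# td[not(position() = (...))]: keep a cell unless its 1-based position is excluded
kept_cells = [text for pos, text in enumerate(cells, start=1) if pos not in excluded_positions]

print(sorted(excluded_positions))  # [2, 4, 5]
print(kept_cells)                  # ['111', '333', '666']
print(index_of(headers, "zzz"))    # []
```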

You can avoid complex XPath expressions for getting the desired output. Try using Python + Selenium functionality instead:

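from selenium.webdriver.common.by import By

# Assumes `driver` is an existing WebDriver with the page already loaded.
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, removed in Selenium 4.3
th_elements = driver.find_elements(By.XPATH, '//th')
# Get list of td elements in the row below the headers
td_elements = driver.find_elements(By.XPATH, '//tr[2]/td')
# Get indexes of required th elements - [0, 2, 5]
ok_index = [i for i, th in enumerate(th_elements) if th.text not in ('bbb', 'ddd', 'eee')]
for i in ok_index:
    print(th_elements[i].text)
for i in ok_index:
    print(td_elements[i].text)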

The output is:

aaa
ccc
fff
111
333
666

If you need an XPath 1.0 solution:

//th[not(.="bbb" or .="ddd" or .="eee")] | //tr[2]/td[not(position()=count(//th[.="bbb"]/preceding-sibling::th)+1 or position()=count(//th[.="ddd"]/preceding-sibling::th)+1 or position()=count(//th[.="eee"]/preceding-sibling::th)+1)]
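Since the goal is to include or exclude arbitrary headers, writing one count(...)+1 test per header by hand gets tedious. A small Python helper can generate an XPath 1.0 union instead. This is a sketch: build_xpath_1_0 and its data_row parameter are hypothetical, and it assumes header texts contain no double quotes and that each excluded text occurs in a single header row. It also guards against one XPath 1.0 pitfall: for a missing header, count(...preceding-sibling::th) is 0, so a bare position()=count(...)+1 test becomes position()=1 and wrongly drops the first column; wrapping each test as (//th[.="..."] and ...) prevents that.

```python
def build_xpath_1_0(excluded, data_row="//tr[2]"):
    """Return an XPath 1.0 union selecting every <th> except the excluded
    header texts, plus the <td> cells of data_row in the remaining columns.

    Assumes header texts contain no double quotes and that each excluded
    text appears in exactly one header cell in the document."""
    if not excluded:
        return "//th | {}/td".format(data_row)
    # XPath 1.0 has no sequences, so the string tests are joined with "or"
    th_test = " or ".join('.="{}"'.format(name) for name in excluded)
    # count(preceding-sibling::th)+1 is the 1-based column of that header;
    # the leading //th[...] guard stops a missing header from matching column 1
    pos_test = " or ".join(
        '(//th[.="{0}"] and position()=count(//th[.="{0}"]/preceding-sibling::th)+1)'.format(name)
        for name in excluded
    )
    return "//th[not({})] | {}/td[not({})]".format(th_test, data_row, pos_test)


print(build_xpath_1_0(("bbb", "ddd", "eee")))
```

The returned string can be passed to Selenium as driver.find_elements(By.XPATH, build_xpath_1_0(("bbb", "ddd", "eee"))), with By imported from selenium.webdriver.common.by.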


2017-06-05 15:48:14 Andersson

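Thanks, but both XPaths show as invalid in FirePath? – Darth

I appreciate your suggestion, but XPath seems to be the most efficient way; the HTML I posted is only a fragment of a larger file, and I'm working with multiple deeply nested HTML files. That said, your previous answer is very close to what I'm looking for; it's just that the 'index-of' function doesn't work in XPath 1.0. Do you know a workaround? – Darth

Check the updated answer – Andersson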

With XPath 3.0, you could structure it as:

let $th := //table/tbody/tr[1]/th, 
$filteredTh := $th[not(. = ("bbb", "ddd", "eee"))], 
$pos := $filteredTh!index-of($th, .) 
return ($filteredTh, //table/tbody/tr[position() gt 1]/td[position() = $pos])
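Unlike the other expressions, this version binds the kept positions once and applies them to every row after the header (tr[position() gt 1]), not just tr[2]. The same let/return structure can be mirrored step by step in Python. This sketch assumes a single well-formed <table><tbody> fragment; the third data row (x1 to x6) is made-up data added only to show the multi-row behavior, and filter_table is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The question's inner table plus one extra, made-up data row (x1..x6)
TABLE = """<table border="1"><tbody>
  <tr><th>aaa</th><th>bbb</th><th>ccc</th><th>ddd</th><th>eee</th><th>fff</th></tr>
  <tr><td>111</td><td>222</td><td>333</td><td>444</td><td>555</td><td>666</td></tr>
  <tr><td>x1</td><td>x2</td><td>x3</td><td>x4</td><td>x5</td><td>x6</td></tr>
</tbody></table>"""


def filter_table(markup, excluded):
    rows = ET.fromstring(markup).find("tbody").findall("tr")
    # let $th := tr[1]/th
    th = [cell.text for cell in rows[0].findall("th")]
    # $filteredTh := $th[not(. = excluded)]
    filtered_th = [text for text in th if text not in excluded]
    # $pos := $filteredTh!index-of($th, .)  (1-based positions)
    pos = {i for i, text in enumerate(th, start=1) if text in filtered_th}
    # tr[position() gt 1]/td[position() = $pos], in document order
    cells = [
        cell.text
        for row in rows[1:]
        for i, cell in enumerate(row.findall("td"), start=1)
        if i in pos
    ]
    return filtered_th + cells


print(filter_table(TABLE, ("bbb", "ddd", "eee")))
# ['aaa', 'ccc', 'fff', '111', '333', '666', 'x1', 'x3', 'x6']
```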


2017-06-05 16:11:32

I'm using Selenium, so XPath 1.0 would be the ideal solution, thanks – Darth
