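How to get both following-sibling::text() and following-sibling::b?

I'm trying to parse a website to extract people's names and countries. How can I get both following-sibling::text() and following-sibling::b?

Sometimes the page looks like this: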

<th>Inventors:</th> 
    <td align="left" width="90%"> 
      <b>Harvey; John Christopher</b> (New York, NY)<b>, Cuddihy; James William</b> (New York, NY) 
    </td>

I can get the countries using

//th[contains(text(), "Inventors:")]/following-sibling::td/b[contains(text(),";")]/following-sibling::text() 

[(New York, NY), (New York, NY)]

Sometimes the page looks like this (with <b> added around the country name):

<th>Inventors:</th> 
    <td align="left" width="90%"> 
     <b>Harvey; John Christopher</b> (New York, <b>NY</b>)<b>, Cuddihy; James William</b> (New York, <b>NY</b>) 
    </td>

I can get the countries with:

//th[contains(text(), "Inventors:")]/following-sibling::td/b[contains(text(),";")]/following-sibling::b 

[NY, NY]

Now I'd like to be able to get the countries in both cases.

I tried using:

//th[contains(text(), "Inventors:")]/following-sibling::td/b[contains(text(),";")]/following-sibling::*[self::text() or self::b]

But then I only get the b elements...

I also tried:

//.../following-sibling::text() | //.../following-sibling::b

But I also only get the b elements...

Any idea why this doesn't work as expected? Any solution to get both kinds of entries?

Source

2016-03-25 user2115112

You can use

string(//th[.="Inventors:"]/following-sibling::td)

so that you select

Harvey; John Christopher (New York, NY), Cuddihy; James William (New York, NY)

in both cases. Then use the XPath 2.0 string/regex processing functions or, if only XPath 1.0 is available, the equivalent tools in the calling language.
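
For the "tools in the calling language" route, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not the only way to do it: the fragments are wrapped in a <tr> so they parse as XML, `"".join(td.itertext())` stands in for XPath string() on the td, and the helper name `inventors` and the regex are my own.

```python
import re
import xml.etree.ElementTree as ET

# The two page variants from the question, wrapped in <tr> so each parses as XML.
PLAIN = """<tr><th>Inventors:</th>
    <td align="left" width="90%">
      <b>Harvey; John Christopher</b> (New York, NY)<b>, Cuddihy; James William</b> (New York, NY)
    </td></tr>"""
NESTED = """<tr><th>Inventors:</th>
    <td align="left" width="90%">
     <b>Harvey; John Christopher</b> (New York, <b>NY</b>)<b>, Cuddihy; James William</b> (New York, <b>NY</b>)
    </td></tr>"""


def inventors(fragment):
    """Return (name, place) pairs from one Inventors row (hypothetical helper)."""
    td = ET.fromstring(fragment).find("td")
    # Concatenating all descendant text mirrors what XPath string() returns.
    text = "".join(td.itertext())
    # Each entry reads "Name (Place)"; entries are separated by a leading ", ".
    pairs = re.findall(r"([^()]+?)\s*\(([^()]*)\)", text)
    return [(name.strip(" ,\n\t"), place.strip()) for name, place in pairs]


print(inventors(PLAIN))
print(inventors(NESTED))
```

Both calls should print the same list of pairs, because the nested <b> around the state disappears once the td is flattened to a string.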

Source

2016-03-25 15:25:31 kjhughes

You could also try something like:

//th[contains(text(), "Inventors:")] 
    /following-sibling::td/b[contains(text(),";")] 
    /following-sibling::node()[not(self::b[contains(text(),";")])]

This selects all following sibling nodes but ignores the b nodes that contain ";".
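
As for why the attempt with following-sibling::*[self::text() or self::b] returns only b: the * node test matches element nodes only, so self::text() can never be true there, whereas node() also matches text nodes. The sketch below approximates the same idea in plain Python with the standard library, which does not support the following-sibling axis. It is a rough emulation, not the XPath itself: it walks the td's children and collects, per name <b>, the text and nested <b> that follow it. The grouping per inventor and the helper name `locations` are my own additions.

```python
import xml.etree.ElementTree as ET

# The two page variants from the question, wrapped in <tr> so each parses as XML.
PLAIN = """<tr><th>Inventors:</th>
    <td align="left" width="90%">
      <b>Harvey; John Christopher</b> (New York, NY)<b>, Cuddihy; James William</b> (New York, NY)
    </td></tr>"""
NESTED = """<tr><th>Inventors:</th>
    <td align="left" width="90%">
     <b>Harvey; John Christopher</b> (New York, <b>NY</b>)<b>, Cuddihy; James William</b> (New York, <b>NY</b>)
    </td></tr>"""


def locations(fragment):
    """Collect what follows each name <b>, skipping the name <b> itself."""
    td = ET.fromstring(fragment).find("td")
    groups = []
    for child in td:
        if child.tag == "b" and ";" in (child.text or ""):
            groups.append("")               # a name <b> starts a new group
        elif groups:
            groups[-1] += child.text or ""  # nested <b>, e.g. <b>NY</b>
        if groups:
            groups[-1] += child.tail or ""  # text node right after this element
    return [g.strip() for g in groups]


print(locations(PLAIN))   # ['(New York, NY)', '(New York, NY)']
print(locations(NESTED))  # ['(New York, NY)', '(New York, NY)']
```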

Source

2016-03-25 15:59:56
