XPath following-sibling for a crawler not returning the sibling

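I am trying to create a crawler to extract some attribute data from supplier websites so that we can audit our internal attribute database, and I am new to import.io. I have watched a bunch of videos, but although my syntax looks correct, my manual XPath override is not returning the attribute value. I have the following sample HTML: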

<table> 
<tbody><tr class="oddRow"> 
<td class="label">&nbsp;Adhesive Type&lrm;</td><td>&nbsp;Epoxy&lrm; 
</td> 
</tr> 
<tr> 
<td class="label">&nbsp;Applications&lrm;</td><td>&nbsp;Hard Disk Drive Component Assembly&lrm; 
</td> 
</tr> 
<tr class="oddRow"> 
<td class="label">&nbsp;Brand&lrm;</td><td>&nbsp;Scotch-Weld&lrm; 
</td> 
</tr> 
<tr> 
<td class="label">&nbsp;Capabilities&lrm;</td><td>&nbsp;Sustainability&lrm; 
</td> 
</tr> 
<tr class="oddRow"> 
<td class="label">&nbsp;Color&lrm;</td><td>&nbsp;Clear Amber&lrm; 
</td>

I want to write a following-sibling expression so that the import.io crawler grabs the value for "Color". When I select "Color", the generated XPath is:

//*[@id="attributeList"]/table/tbody/tr[5]/td[1]

I tried using:

//*[@id="attributeList"]/table/tbody/tr/td[.="Color"]/following-sibling::td

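But it does not pull the color attribute value from the table. I'm not sure whether it has something to do with the odd and even row classes? Looking at the HTML, the expression seems logical: the label is "Color", and the attribute value is in the td element that follows it.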

2015-06-05 Elizabeth VO

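The text in the td node you are selecting is not just "Color". It is &nbsp;Color&lrm;, that is, a no-break space, the word, and a left-to-right mark, so the exact comparison .="Color" never matches. Instead, you can select the td node whose text contains the string "Color":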

'//*[@id="attributeList"]/table/tbody/tr/td[contains(text(), "Color")]/following-sibling::td/text()'
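To see why the exact match fails, here is a minimal sketch using only the Python standard library. It decodes the entities from the sample table and compares the result with the plain string. The `clean` helper is a hypothetical name introduced for illustration, and the snippet assumes the parser decodes `&nbsp;` and `&lrm;` to U+00A0 and U+200E, as HTML parsers normally do.

```python
import html

# Cell contents copied from the "Color" row of the sample table.
raw_label = "&nbsp;Color&lrm;"
raw_value = "&nbsp;Clear Amber&lrm; \n"

# Decode the entities, roughly what an HTML parser does before XPath sees the text.
label = html.unescape(raw_label)
value = html.unescape(raw_value)

# An exact comparison, like td[.="Color"], fails because of the invisible characters.
exact_match = (label == "Color")

# A substring test, like contains(text(), "Color"), succeeds.
substring_match = ("Color" in label)


def clean(text: str) -> str:
    """Hypothetical helper: trim whitespace, no-break spaces and LRM marks."""
    return text.strip(" \t\r\n\u00a0\u200e")


print(repr(label))
print(exact_match, substring_match)
print(repr(clean(value)))
```

The value returned by the following-sibling td carries the same invisible characters, so it is probably worth trimming them before comparing against the internal database.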

2015-06-05 19:04:04 unutbu

This works, thank you so much!
