2016-02-05 185 views

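CSS selector only selects the first row

I'm parsing an HTML page and have a long CSS selector (I couldn't find a shorter one because the page markup is a mess). It should select all the tr elements in the table, but it only selects the second row... What am I missing?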

The selector:

body > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(3) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) > tr:nth-child(8) > td:nth-child(1) > table:nth-child(4) > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1) > table:nth-child(1) > tbody:nth-child(1) tr:not(:first-child) 

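The page has multiple tables nested inside one another, but the first 90% of the selector doesn't even matter. After selecting the table I want, I follow it with "[space]tr:not(...)", so it should select all the descendant rows, shouldn't it?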

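Example of the HTML page (I can't link to the real one; you need a login to access it): http://pastebin.com/gprXTvzz

The selector does find the table I want. Selecting with "... > tbody:nth-child(1)", without the trailing tr:not(:first-child), gives this: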

<tbody> 
    <tr valign="bottom"> 
     <td class="blackmedium" width="80"><b>Part Number</b></td> 
     <td class="blackmedium" width="100"><b>Manufacturer</b></td> 
     <td class="blackmedium" width="40"><b>Abbr.</b></td> 
     <td class="blackmedium" width="50"><b>WIX Part Number</b></td> 
     <td class="blackmedium" width="50"><b>Lead Time</b></td> 
    </tr> 
    <tr> 
     <td class="blackmedium" width="80">A0002701098</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="http://www.wixindustrialfilters.com/cross.aspx?Part=W03AT780" target="_blank">W03AT780</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     STOCK 
     </td> 
    </tr> 
    <tr bgcolor="#e0e0e0"> 
     <td class="blackmedium" width="80">A0002701598 Discontinued</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=58892','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">58892</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     </td> 
    </tr> 
    <tr> 
     <td class="blackmedium" width="80">A0002772395</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=51249','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">51249</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     </td> 
    </tr> 
    <tr bgcolor="#e0e0e0"> 
     <td class="blackmedium" width="80">A0002772895</td> 
     <td class="blackmedium" width="100">MERCEDES-BENZ</td> 
     <td class="blackmedium" width="40">MBZ</td> 
     <td class="blackmedium" width="50"> <a href="javascript:var w=window.open('PartDetail.asp?Part=57701','PartDetail','left=200,top=200,width=530,height=500,toolbar=no,location=no,directories=no,status=no,menubar=no,resizable=yes,scrollbars=yes');w.focus();">57701</a> 
     </td> 
     <td class="blackmedium" width="50"> 
     </td> 
    </tr> 
</tbody> 

Answer


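This doesn't exactly answer your question, but when the markup isn't parsing-friendly and I need to locate a table element deeply nested in terrible markup, I prefer to find it by a specific header it contains. In this case, you can locate the table that has the Part Number header. Example XPath: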

//table[tr[1]/td/b = "Part Number"] 

Then, within that table, you can use the "not first child" CSS selector:

tr:not(:first-child) 

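Alternatively, you can use the adjacent sibling selector (it matches tr elements that immediately follow another tr element, which logically excludes the first row):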

tr + tr 

Hope this simplifies things.
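To make the header-based lookup concrete, here is a minimal sketch in Python using only the standard library. It is not the asker's jSoup code: the `SAMPLE` markup is a simplified, well-formed stand-in for the real page, and the helper name `rows_of_table_with_header` is hypothetical. A real page that is not well-formed XML would need a proper HTML parser instead of `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the nested page markup.
SAMPLE = """
<html><body>
<table><tbody><tr><td>
  <table><tbody>
    <tr><td><b>Part Number</b></td><td><b>Manufacturer</b></td></tr>
    <tr><td>A0002701098</td><td>MERCEDES-BENZ</td></tr>
    <tr><td>A0002772395</td><td>MERCEDES-BENZ</td></tr>
  </tbody></table>
</td></tr></tbody></table>
</body></html>
"""

def rows_of_table_with_header(root, header):
    """Return the data rows (all but the first) of the first table whose
    first row has a <b> cell with exactly the given text."""
    for table in root.iter("table"):
        # Only rows that belong directly to this table, with or without <tbody>.
        rows = table.findall("tr") + table.findall("tbody/tr")
        if not rows:
            continue
        headers = [b.text for b in rows[0].findall("td/b")]
        if header in headers:
            return rows[1:]  # same effect as tr:not(:first-child) or tr + tr
    return []

root = ET.fromstring(SAMPLE)
data_rows = rows_of_table_with_header(root, "Part Number")
part_numbers = [row.find("td").text for row in data_rows]
print(part_numbers)
```

Matching on the header text means the lookup does not depend on how deeply the table is nested, which is the point of the approach described above.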


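I couldn't use XPath, but I solved it by first getting all the tables and then, knowing which index I needed, selecting all the tr elements in the next statement. Yours should work too. (Using jSoup) – appl3r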
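The index-based workaround from the comment could look roughly like the sketch below. It is written in Python with the standard library rather than jSoup. The `PAGE` markup and the `TABLE_INDEX` value are made up for illustration, and dropping the header row at the end is an added step the comment does not mention.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page.
PAGE = """<body>
<table><tr><td>layout</td></tr></table>
<table>
  <tr><td>Part Number</td></tr>
  <tr><td>A0002701098</td></tr>
  <tr><td>A0002772895</td></tr>
</table>
</body>"""

TABLE_INDEX = 1  # assumption: position of the wanted table in document order

# Step 1: collect every table, then pick the one at the known index.
tables = list(ET.fromstring(PAGE).iter("table"))
target = tables[TABLE_INDEX]

# Step 2: select all tr elements inside that table in a separate statement.
all_rows = target.findall(".//tr")
data_rows = all_rows[1:]  # added step: skip the header row
cells = [row.find("td").text for row in data_rows]
print(cells)
```

This is simpler than matching on a header, but it breaks if the page adds or removes a table ahead of the one you want.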