Get data from an HTML table, excluding div tags, with jsoup

My HTML code:

<table width="100%" cellpadding="5" cellspacing="2" class="zebra"> 
    <tr> 
    <td colspan="5"> 
    <div class="paginator"> 
    <a href="http://some_link">2</a>&nbsp;   
    </div> 
    </td> 
    </tr> 
    <tr> 
    <td><a href="//i_need_only_this_link">some_value</a></td>  
    </tr> 
    <tr>  
    <td><a href="//i_need_only_this_link1">some_value</a></td>  
    </tr> 
    <tr> 
    <td colspan="2"> 
    <div class="paginator">   
    <a href="http://some_link">2</a>&nbsp; 
    </div> 
    </td> 
    </tr> 
</table>

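I am using Jsoup. How can I get all links except the ones inside the div tags? I tried something like this, but it does not work: the resulting elements still contain all the links.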

org.jsoup.select.Elements tableText = doc.select("table.zebra").not("tr td div.paginator");

for (org.jsoup.nodes.Element td : tableText.select("td a")) {
    System.out.println(td.attr("href")); // http://some_link
    ....
}


2016-11-10 Helen

You can use the code below.

Document html = Jsoup.parse(htmlStr);

for (Element e : html.getElementsByTag("a")) {
    // Skip anchors whose direct parent is a div
    if (!"div".equalsIgnoreCase(e.parentNode().nodeName())) {
        System.out.println(e.attr("href"));
    }
}

Here I check that the parent node of the anchor element is not a div. If it is not a div, I print the URL.
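The parent check above only looks at the direct parent, so it would miss a link wrapped in another tag inside the div. The original attempt with `.not(...)` did not filter anything because `Elements.not` removes matching elements from the current set, and that set holds only the table itself. A minimal sketch of a variant that tests every ancestor is shown here; the class name `NonPaginatorLinks` and the method `extract` are hypothetical names chosen for illustration, and the sample HTML is a shortened copy of the table from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NonPaginatorLinks {

    // Hypothetical helper: collects the href of every link in table.zebra
    // that has no div.paginator among its ancestors.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element a : doc.select("table.zebra a[href]")) {
            // parents() returns all ancestors, so links nested deeper
            // inside the paginator div are skipped as well
            if (!a.parents().is("div.paginator")) {
                hrefs.add(a.attr("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<table class=\"zebra\">"
            + "<tr><td colspan=\"5\"><div class=\"paginator\">"
            + "<a href=\"http://some_link\">2</a></div></td></tr>"
            + "<tr><td><a href=\"//i_need_only_this_link\">some_value</a></td></tr>"
            + "<tr><td><a href=\"//i_need_only_this_link1\">some_value</a></td></tr>"
            + "</table>";
        for (String href : extract(html)) {
            System.out.println(href);
        }
    }
}
```

Restricting the first selector to `table.zebra` keeps links from other parts of the page out of the result, which the plain `getElementsByTag("a")` loop would include.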


2016-11-10 11:13:56 Jobin

Using "abs:href" might be a good idea, though: https://jsoup.org/cookbook/extracting-data/working-with-urls
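The `abs:href` suggestion can be sketched as follows. jsoup resolves relative and protocol-relative links against the base URI given to the parser, so the sketch passes one explicitly. The class name `AbsoluteLinks`, the method `firstAbsoluteHref`, and the `example.com` URLs are assumptions made for illustration and do not come from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteLinks {

    // Hypothetical helper: returns the first link's href resolved against
    // baseUri, or an empty string when the document contains no link.
    static String firstAbsoluteHref(String html, String baseUri) {
        // The base URI is what "abs:href" resolves relative URLs against
        Document doc = Jsoup.parse(html, baseUri);
        Element a = doc.select("a[href]").first();
        return a == null ? "" : a.attr("abs:href");
    }

    public static void main(String[] args) {
        String base = "https://example.com/dir/"; // assumed page address
        System.out.println(firstAbsoluteHref("<a href=\"/page\">x</a>", base));
        System.out.println(firstAbsoluteHref("<a href=\"//cdn.example.com/x\">x</a>", base));
    }
}
```

Without a base URI (for example with the one-argument `Jsoup.parse(htmlStr)`), `abs:href` has nothing to resolve relative links against and yields an empty string for them, so the plain `href` attribute is the only value available in that case.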
