2012-06-15 116 views
3

Extracting href values with jsoup

<table class="table">
<tr>
    <td><a href="url">text1</a></td>
    <td>text2</td>
</tr>
<tr>
    <td><a href="url2">text</a></td>
    <td>text</td>
</tr>
</table>

From the table above, I want to extract the link text and the URL in each row. I am using:

Document doc = Jsoup.connect(url).get();
for (Element table : doc.select("table.table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        String text1 = tds.get(0).text();
        String url = row.attr("href");
        System.out.println(text1 + "," + url);
    }
}

I get the value of text1, but the URL comes back empty for every row.

How can I get the URL from inside the td tag?

Answers

7

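Your row variable is a tr element, not an a element, so it has no href attribute. Calling attr("href") on it returns an empty string.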

Try something like this:

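// select() returns Elements (a list); first() picks the first matching table
Element table = doc.select("table.table").first();
// Collect every <a> element inside that table
Elements links = table.getElementsByTag("a");
for (Element link : links) {
    String url = link.attr("href");
    String text = link.text();
    System.out.println(text + ", " + url);
}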

This is taken pretty much straight from the JSoup documentation.
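To keep the asker's row-by-row loop, the same idea can be applied inside each row: select the a element in the row and read href from that element rather than from the row. The following is only a minimal sketch. The class name RowLinks, the extract helper, and the inline HTML constant (modeled on the table in the question) are assumptions made for illustration, and it uses Jsoup.parse on a string instead of Jsoup.connect so it runs without a network connection.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of jsoup or of the original answer.
class RowLinks {
    // Sample markup modeled on the table in the question (assumption).
    static final String HTML =
        "<table class=\"table\">"
      + "<tr><td><a href=\"url\">text1</a></td><td>text2</td></tr>"
      + "<tr><td><a href=\"url2\">text</a></td><td>text</td></tr>"
      + "</table>";

    // Returns one "text,href" string per row that contains a link.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element row : doc.select("table.table tr")) {
            // href lives on the <a> element, not on <tr> or <td>
            Element link = row.select("td a[href]").first();
            if (link == null) {
                continue; // skip rows without a link, e.g. a header row
            }
            out.add(link.text() + "," + link.attr("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : extract(HTML)) {
            System.out.println(line);
        }
    }
}
```

The null check matters because first() returns null when a row has no matching link.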

0

You (and maybe others) can try this:

Document doc = Jsoup.connect(url).get();
for (Element table : doc.select("table.table")) {
    for (Element row : table.select("tr")) {
        for (Element td : row.select("td")) {
            Elements links = td.select("a[href]");
            for (Element link : links) {
                System.out.println("link : " + link.attr("href"));
                System.out.println("text : " + link.text());
            }
        }
    }
}
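If the per-table and per-row structure is not needed, the nested loops above can probably be collapsed into one descendant selector that visits the same links in document order. This is a sketch under that assumption. The class name TableLinks, the describe helper, and the sample HTML are hypothetical and only serve to make the example runnable offline.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; not part of jsoup or of the original answer.
class TableLinks {
    // Sample markup modeled on the table in the question (assumption).
    static final String HTML =
        "<table class=\"table\">"
      + "<tr><td><a href=\"url\">text1</a></td><td>text2</td></tr>"
      + "<tr><td><a href=\"url2\">text</a></td><td>text</td></tr>"
      + "</table>";

    // Builds one "text -> href" line per link found in a table cell.
    static String describe(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder sb = new StringBuilder();
        // Matches <a href=...> elements nested in <td> inside table.table
        for (Element link : doc.select("table.table td a[href]")) {
            sb.append(link.text())
              .append(" -> ")
              .append(link.attr("href"))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(HTML));
    }
}
```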