2011-06-11

Jsoup: selecting table data

For the life of me, I can't figure out how to use jsoup to select the img src whose link ends in "51u1FaI-FHL._SL500_AA300_.jpg".

I've tried several things, but none of them worked. Any help?

Document doc1 = Jsoup.connect("http://rads.stackoverflow.com/amzn/click/B0051HDDO2").timeout(20000).get();
Element table = doc1.select("table[class=productImageGrid]").first();
Iterator<Element> ite = table.select("td[height=300]").iterator();

Thanks, Cody

<table style="text-align: center;" border="0" cellpadding="0" cellspacing="0" width="300"> 
    <tr> 
    <td id="prodImageCell" height="300" width="300" style="padding-bottom: 10px;"><img onclick="if(0){ async_openImmersiveView(event);} else {openImmersiveView(event);}" class="prod_image_selector" style="cursor:pointer;" onload="if (typeof uet == 'function') { uet('af'); }" src="http://ecx.images-amazon.com/images/I/51u1FaI-FHL._SL500_AA300_.jpg" id="prodImage"/><div id="prodImageCellInner" style="position: relative; height:0px; "><!--Comment for IE as it is empty div--></div></td> 
    <td id="prodVideoClick" style="display:none"></td> 
    <img id="loadingImage" src=http://g-ecx.images-amazon.com/images/G/01/ui/loadIndicators/loading-large_boxed._V192195297_.gif style="position: absolute; z-index: 200; display:none"> 
</tr> 
    <tr> 
    <td class="tiny" style="padding-bottom: 5px;">&nbsp;<span id="prodImageCaption" style="color: #666666; font-size: 10px;">Click for larger image and other views</span>&nbsp;</td> 
    </tr> 
</table> 

Answers


@user793728: Try this:

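import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document document = Jsoup.connect("http://rads.stackoverflow.com/amzn/click/B0051HDDO2").timeout(20000).get();

// The product <img> carries class="prod_image_selector"
Elements elements = document.select(".prod_image_selector");
for (Element element : elements) {
    Attributes imageAttributes = element.attributes();
    // Walk the attributes and pick out the value of "src"
    for (Attribute attribute : imageAttributes) {
        if (attribute.getKey().equals("src")) {
            String imageURL = attribute.getValue();
            System.out.println(imageURL);
        }
    }
}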
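A shorter equivalent of the attribute loop above is a sketch like the following, which calls jsoup's `Element.attr("src")` directly instead of iterating over `attributes()`. To keep it runnable without network access, it parses a trimmed, hypothetical copy of the question's `<img>` tag with `Jsoup.parse` rather than fetching the page; `ProdImageSrc` and `imageSources` are illustrative names, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ProdImageSrc {

    // Collects the src of every element with class "prod_image_selector".
    // img.attr("src") yields the same value the attribute loop finds,
    // or an empty string if the element has no src attribute.
    static List<String> imageSources(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select(".prod_image_selector")) {
            urls.add(img.attr("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Offline stand-in for the fetched page: the question's <img>, trimmed to the relevant attributes
        String html = "<table><tr><td id=\"prodImageCell\" height=\"300\">"
                + "<img class=\"prod_image_selector\" id=\"prodImage\""
                + " src=\"http://ecx.images-amazon.com/images/I/51u1FaI-FHL._SL500_AA300_.jpg\">"
                + "</td></tr></table>";
        Document doc = Jsoup.parse(html);
        for (String url : imageSources(doc)) {
            System.out.println(url);
        }
    }
}
```

With a live page, you would pass the `Document` returned by `Jsoup.connect(...).get()` to `imageSources` instead of the parsed string.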

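The problem here seems to be that Amazon returns different HTML to jsoup than it does to your browser, depending on the request's User-Agent.

I set the User-Agent to that of a known browser, selected the element by its ID (#prodImage), and got the expected result.

For example: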

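import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Identify as a desktop browser; the HTML returned depends on the User-Agent
Document doc = Jsoup.connect("http://rads.stackoverflow.com/amzn/click/B0051HDDO2")
     .timeout(20000)
     .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.91 Safari/534.30")
     .get();
Element img = doc.select("#prodImage").first();
System.out.println(img.attr("src"));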

This prints http://ecx.images-amazon.com/images/I/51u1FaI-FHL._SL500_AA300_.jpg
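If you would rather match on the file name the question asks about than on the `#prodImage` ID, jsoup's selector syntax includes `[attr$=value]` ("attribute value ends with"). The sketch below runs offline against a trimmed, hypothetical copy of the question's markup, including the loading spinner `<img>` so the selector has something to skip; `SrcSuffixSelector` and `srcEndingWith` are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SrcSuffixSelector {

    // Returns the src of the first <img> whose src attribute ends with fileName,
    // or null if there is none. "[src$=...]" is jsoup's "ends with" attribute selector.
    static String srcEndingWith(Document doc, String fileName) {
        Element img = doc.select("img[src$=" + fileName + "]").first();
        return img == null ? null : img.attr("src");
    }

    public static void main(String[] args) {
        // Trimmed copy of the question's markup: the product image plus the loading spinner
        String html = "<table><tr><td id=\"prodImageCell\">"
                + "<img id=\"prodImage\" src=\"http://ecx.images-amazon.com/images/I/51u1FaI-FHL._SL500_AA300_.jpg\">"
                + "</td></tr></table>"
                + "<img id=\"loadingImage\" src=\"http://g-ecx.images-amazon.com/images/G/01/ui/loadIndicators/loading-large_boxed._V192195297_.gif\">";
        Document doc = Jsoup.parse(html);
        System.out.println(srcEndingWith(doc, "51u1FaI-FHL._SL500_AA300_.jpg"));
    }
}
```

This still depends on the server sending HTML that contains that image, so the User-Agent advice above applies here too.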

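To troubleshoot problems like this, I suggest printing doc.html() and looking at the HTML that was actually retrieved and parsed. It can differ from what your browser's View Source shows: the server may return different HTML, and View Source shows the markup before it has been tidied up and built into the DOM.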
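A minimal sketch of that debugging step, kept offline so it runs without network access: it parses a raw string (a hypothetical fragment, not the real Amazon page) and returns `doc.html()`, so you can see the `<html>`, `<head>` and `<body>` wrappers and the closing tags jsoup adds while building its tree. `DumpParsedHtml` and `parsedHtml` are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DumpParsedHtml {

    // Returns the markup jsoup actually built from rawHtml: the tree that
    // select(...) runs against, after the parser has added missing tags and
    // repaired the structure. For a live page, call doc.html() on the Document
    // returned by Jsoup.connect(url)...get() instead.
    static String parsedHtml(String rawHtml) {
        Document doc = Jsoup.parse(rawHtml);
        return doc.html();
    }

    public static void main(String[] args) {
        // Raw fragment with no <html>/<body> and an unclosed <td>, similar in spirit to page source
        String raw = "<table><tr><td id=\"prodImageCell\"><img id=\"prodImage\" src=\"a.jpg\"></tr></table>";
        System.out.println(parsedHtml(raw));
    }
}
```

Comparing this output with View Source makes it easier to see why a selector that looks right against the source finds nothing in the parsed document.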

Hope this helps!