2016-09-17 56 views

Element not extracted properly with JSoup

I have a web page that contains the following element:

<div id="pnNij" class="post" data-tag1="" data-tag2=""> 
    <a class="image-list-link" href="http://imgur.com/gallery/pnNij" data-page="0"> 
     <img alt="" src="./Imgur_ The most awesome images on the Internet_files/H7fZCNgb.jpg"> 


      <div class="point-info gradient-transparent-black transition"> 
       <div class="relative"> 
        <div class="pa-bottom"> 
         <div class="arrows"> 
          <div title="like" class="pointer arrow-up icon-upvote-outline" data="pnNij" type="image" data-up="4212"></div> 
          <div title="dislike" class="pointer arrow-down icon-downvote-outline" data="pnNij" type="image" data-downs="502"></div> 
          <div class="clear"></div> 
         </div> 

         <div class="point-info-points" title="points"> 
          <span class="points-pnNij">3,710</span> 
          <span class="points-text-pnNij">points</span> 
         </div> 
        </div> 
       </div> 
      </div> 

    </a> 
    <div class="hover"> 
        <p>Seems like 2017 has it all...</p> 


     <div class="post-info"> 
      album · 69,542 views 
     </div> 
    </div> 

</div> 

Notice that the href is equal to http://imgur.com/gallery/pnNij

However, when I use JSoup to extract the element from the page like this:

docImgur = Jsoup.connect("http://imgur.com/").get(); 
Elements links = docImgur.getElementsByClass("post"); 

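The element is extracted almost correctly, except that the href attribute is equal to /gallery/pnNij/

Why does the href attribute not contain the full URL?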

+0

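The Elements object `links` in your code corresponds to the div with id="pnNij". You have left out how you reach the anchor and read its href attribute. Please add that part of the code.

+0

Did my answer solve the problem? If so, please consider accepting it.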

Answer

0

When you inspect the source of the page, you will find

<a class="image-list-link" href="/gallery/WRzti" data-page="0"> 
    ... 
</a> 

So the href attribute in the source is not absolute either, which leads to the result you are seeing: /gallery/WRzti
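To illustrate what turning such a relative href into a full URL involves, here is a minimal sketch that uses only the JDK's java.net.URI rather than JSoup. The class name `RelativeHrefDemo` and the helper `toAbsolute` are made up for this example, and the base URL is assumed to be the address the page was fetched from.

```java
import java.net.URI;

public class RelativeHrefDemo {
    // Hypothetical helper: resolves an href against the URL of the page
    // it was found on. An href that is already absolute comes back unchanged.
    static String toAbsolute(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Relative href as it appears in the page source
        System.out.println(toAbsolute("http://imgur.com/", "/gallery/WRzti"));
        // Absolute href, passed through as-is
        System.out.println(toAbsolute("http://imgur.com/", "http://imgur.com/gallery/pnNij"));
    }
}
```

A relative value such as /gallery/WRzti only becomes a complete URL once it is combined with a base address, which is why the raw attribute value looks truncated.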

Solution

Use the abs: attribute prefix

Document docImgur = Jsoup.connect("http://imgur.com/").get(); 

Elements links = docImgur.select("a[href].image-list-link"); 

for (Element element : links) { 
    System.out.println(element.attr("abs:href")); 
} 

Output

http://imgur.com/gallery/WRzti 
http://imgur.com/gallery/tCnDJ 
http://imgur.com/gallery/JIHYh 
...
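
If you want to try this without hitting the network, the following sketch parses a trimmed-down copy of the markup from the question. It assumes JSoup is on the classpath. The class `AbsHrefDemo` and the helper `firstAbsoluteHref` are hypothetical names for this example, and the base URI is passed in explicitly because a string parsed with Jsoup.parse has no location of its own.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsHrefDemo {
    // Hypothetical helper: parses the HTML against the given base URI and
    // returns the absolute form of the first matching link's href.
    static String firstAbsoluteHref(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a[href].image-list-link").first();
        // absUrl("href") is the method form of attr("abs:href")
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<div id=\"pnNij\" class=\"post\">"
                + "<a class=\"image-list-link\" href=\"/gallery/pnNij\" data-page=\"0\"></a>"
                + "</div>";

        // With a base URI the relative href can be resolved
        System.out.println(firstAbsoluteHref(html, "http://imgur.com/"));
        // Without one there is nothing to resolve against
        System.out.println(firstAbsoluteHref(html, "").isEmpty());
    }
}
```

When the document comes from Jsoup.connect(...).get(), the base URI is taken from the fetched location, so you normally do not need to supply it yourself. If the base URI is missing, absUrl is documented to return an empty string for a relative href, which is worth checking for if you parse saved files or raw strings.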