2016-04-17 70 views

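Java jsoup link extraction (wrong output)

I am trying to get all the links of the form <a class="subHover" ...>, but with the code I wrote I get every link on the page. Here is my code: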

String website = "http://www.svensktnaringsliv.se/english/publications/?start=" + maxPage;
Document docOne = Jsoup.connect(website).get();
Elements elem = docOne.getElementsByAttributeValue("class", "search-result");
Elements el = elem.attr("class", "subHover");
System.out.println(el.select("a[href]"));

I really don't know what I am doing wrong :/ The output of the code is:

<a href="http://www.svensktnaringsliv.se/english/publications/corporate-governance-internal-control-and-compliance-from-an-info_578545.html"> <img class="border" src="http://www.svensktnaringsliv.se/migration_catalog/Rapporter_och_opinionsmaterial/Rapporters/corporate_governance_10017apdf_579280.html/ALTERNATES/PORTRAIT_170/Corporate_Governance_10017a.pdf"> </a> 
<a class="subHover" href="http://www.svensktnaringsliv.se/english/publications/corporate-governance-internal-control-and-compliance-from-an-info_578545.html"> <h2> Corporate Governance, Internal Control and Compliance - - From an Information Security Perspective</h2> </a> 
<a class="noHover" href="http://www.svensktnaringsliv.se/personer/christer-magnusson_538711.html"><span class="entypo entypo-user"></span><span>Christer Magnusson</span></a> 
<a href="http://www.svensktnaringsliv.se/english/publications/from-stagnation-to-acceleration-proposed-guidelines-for-a-europea_595930.html"> <img class="border" src="http://www.svensktnaringsliv.se/migration_catalog/Rapporter_och_opinionsmaterial/Rapporter/proposed_guidelines_for_a_european_research_policypng_595932.html/ALTERNATES/PORTRAIT_170/Proposed_guidelines_for_a_European_research_policy.png"> </a> 
<a class="subHover" href="http://www.svensktnaringsliv.se/english/publications/from-stagnation-to-acceleration-proposed-guidelines-for-a-europea_595930.html"> <h2>From stagnation to acceleration - Proposed guidelines for a European research policy</h2> </a> 
<a class="noHover" href="http://www.svensktnaringsliv.se/medarbetare/emil-gornerup_566685.html"><span class="entypo entypo-user"></span><span>Emil Görnerup</span></a> 
<a href="http://www.svensktnaringsliv.se/english/publications/decision-usefulness-explored-an-investigation-of-capital-market-a_588531.html"> <img class="border" src="http://www.svensktnaringsliv.se/migration_catalog/decision-usefulness_omslagjpg_588538.html/ALTERNATES/PORTRAIT_170/Decision%20usefulness_omslag.jpg"> </a> 
<a class="subHover" href="http://www.svensktnaringsliv.se/english/publications/decision-usefulness-explored-an-investigation-of-capital-market-a_588531.html"> <h2>Decision usefulness explored - An investigation of capital market actors´ use of financial reports</h2> </a> 
<a class="subHover" href="http://www.svensktnaringsliv.se/english/publications/tax-reductions-and-public-resources_590643.html"> <h2>Tax reductions and public resources</h2> </a> 
<a class="noHover" href="http://www.svensktnaringsliv.se/english/staff/mikael-witterblad_572108.html"><span class="entypo entypo-user"></span><span>Mikael Witterblad</span></a> 
<a class="noHover" href="http://www.svensktnaringsliv.se/medarbetare/johan-fall_551949.html"><span class="entypo entypo-user"></span><span>Johan Fall</span></a> 

Answer

The reason for your result is that the document contains HTML like this:

<div class="subHover"> 
<span class="subject">PUBLICATION</span> 
<span class="subject-info"><b>Publicerad:</b> <time datetime="2005-06-30">30 June 2005 </time></span> 
<div class="result-content clearfix"> 
    <a class="subHover" href="http://www.svensktnaringsliv.se/material/rapporter/internationell-utblick-loner-och-arbetskraftskostnader-juni-2005-_565749.html"> <h2>Internationell utblick - Löner och arbetskraftskostnader juni 2005/International Outlook - Wages, Salaries, Labour Costs June 2005</h2> </a> 
    <div class="info-block"> 
    <p><a class="noHover" href="http://www.svensktnaringsliv.se/medarbetare/krister-b-andersson_560480.html"><span class="entypo entypo-user"></span><span>Krister B Andersson</span></a></p> 
    </div> 
</div> 
</div> 

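As you can see, the outer div has the class subHover, and that is the element your code ends up with. You then select every a element inside it that has an href attribute, but you never require that the a itself also has the class subHover.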
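
To see this in isolation, here is a minimal sketch that runs the same calls against a small inline fragment instead of the live page. The class name SubHoverDemo, the method names, and the example.com URLs are made up for illustration. It also shows a detail worth knowing: in jsoup, Elements.attr(key, value) sets the attribute on every matched element and returns the same elements, so it does not filter anything.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SubHoverDemo {
    // Hypothetical fragment, shortened from the HTML shown above.
    static final String HTML =
          "<div class=\"search-result\">"
        + "<a class=\"subHover\" href=\"http://example.com/report.html\"><h2>Report</h2></a>"
        + "<p><a class=\"noHover\" href=\"http://example.com/author.html\">Author</a></p>"
        + "</div>";

    // Mirrors the calls from the question and counts the links they return.
    static int linksFoundByQuestionCode() {
        Document doc = Jsoup.parse(HTML);
        Elements container = doc.getElementsByAttributeValue("class", "search-result");
        // attr(key, value) SETS class="subHover" on the container; it is not a filter.
        Elements same = container.attr("class", "subHover");
        // Every link inside the container matches, whatever its own class is.
        return same.select("a[href]").size();
    }

    // Reports the container's class attribute after the attr(...) call.
    static String containerClassAfterAttr() {
        Document doc = Jsoup.parse(HTML);
        Elements container = doc.getElementsByAttributeValue("class", "search-result");
        container.attr("class", "subHover");
        return container.first().attr("class");
    }

    public static void main(String[] args) {
        System.out.println(linksFoundByQuestionCode());
        System.out.println(containerClassAfterAttr());
    }
}
```

Both links in the fragment are returned, including the noHover one, which matches the unwanted output in the question.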

Why don't you use a CSS selector? This should work:

String website = "http://www.svensktnaringsliv.se/english/publications/?start=" +maxPage; 
Document docOne = Jsoup.connect(website).get(); 
Elements els = docOne.select("a.subHover"); 
for (Element el : els){ 
    System.out.println(el); 
} 

I recommend learning the power of CSS selectors, as described in the jsoup documentation.
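
As a rough illustration of what selectors can express, the sketch below runs a few queries against an inline fragment modelled on the HTML above. The class name SelectorSketch, the helper hrefs, and the example.com URLs are assumptions made for this example and are not part of the original page or of jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorSketch {
    // Hypothetical fragment, shortened from the HTML shown in the answer.
    static final String HTML =
          "<div class=\"result-content\">"
        + "<a class=\"subHover\" href=\"http://example.com/report.html\"><h2>Report</h2></a>"
        + "<div class=\"info-block\">"
        + "<p><a class=\"noHover\" href=\"http://example.com/author.html\">Author</a></p>"
        + "</div>"
        + "</div>";

    // Returns the href of every element matched by the given CSS query.
    static List<String> hrefs(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        List<String> out = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            out.add(el.attr("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        // Links whose own class is subHover.
        System.out.println(hrefs("a.subHover"));
        // Same, but only direct children of div.result-content.
        System.out.println(hrefs("div.result-content > a.subHover"));
        // Any link with an href inside the author block.
        System.out.println(hrefs("div.info-block a[href]"));
        // Links that do NOT carry the subHover class.
        System.out.println(hrefs("a[href]:not(.subHover)"));
    }
}
```

The first two queries return only the report link, and the last two return only the author link, so the class test can be applied to the a element itself or inverted as needed.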