2014-04-04 43 views

How to get the page source using Selenium RC

I want to create a simple web crawler in Java, and I want to use this code:

WebDriver driver = new HtmlUnitDriver(); 
driver.get("https://codereview.qt-project.org/#change,70"); 
String pageSource=driver.getPageSource(); 
System.out.println(pageSource); 

This is the source I get back:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/REC-html40/strict.dtd"> 
<html><head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"> 
<title>Gerrit Code Review</title><meta content="locale=en_US" name="gwt:property"> 
<script language="javascript" type="text/javascript">var gerrit_hostpagedata={"config": 
{"useContributorAgreements":true,"useContactInfo":false,"allowRegisterNewEmail":false, 

But the page content is generated by JavaScript, and I want to get a snapshot of the rendered HTML.

Answer

Create the driver with JavaScript support enabled:

WebDriver driver = new HtmlUnitDriver(true); 

Result:

<?xml version="1.0" encoding="UTF-8"?> 
<html> 
    <head> 
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> 
    <title> 
     codereview.qt-project Code Review 
    </title> 
    <meta content="locale=en_US" name="gwt:property"/> 
    <script language="javascript" type="text/javascript"> 
//<![CDATA[ 
var gerrit_hostpagedata={"config":{"useContributorAgreements":true,"useContactInfo":false,"allowRegisterNewEmail":false,"authType":"HTTP","downloadSchemes":["DEFAULT_DOWNLOADS"],"sshdAddress":"*:29418","wildProject":{"name":"All-Projects"},"approvalTypes":{"approvalTypes":[{"category":{"categoryId":{"id":"CRVW"},"name":"Code Review","abbreviatedName":"R","position":1,"functionName":"MaxWithBlock","copyMinScore":true,"labelName":"Code-Review"},"values":[{"key":{"categoryId":{"id":"CRVW"},"value":-2},"name":"This shall not be merged"},{"key":{"categoryId":{"id":"CRVW"},"value":-1},"name":"I would prefer this is not merged as is"},{"key":{"categoryId":{"id":"CRVW"},"value":0},"name":"No score"},{"key":{"categoryId":{"id":"CRVW"},"value":1},"name":"Looks good to me, but someone else must approve"},{"key":{"categoryId":{"id":"CRVW"},"value":2},"name":"Looks good to me, approved"}],"maxNegative":-2,"maxPositive":2},{"category":{"categoryId":{"id":"SRVW"},"name":"Sanity Review","abbreviatedName":"S","position":2,"functionName":"MaxWithBlock","copyMinScore":false,"labelName":"Sanity-Review"},"values":[{"key":{"categoryId":{"id":"SRVW"},"value":-2},"name":"Major sanity problems found"},{"key":{"categoryId":{"id":"SRVW"},"value":-1},"name":"Sanity problems found"},{"key":{"categoryId":{"id":"SRVW"},"value":0},"name":"No sanity review "},{"key":{"categoryId":{"id":"SRVW"},"value":1},"name":"Sanity review passed"}],"maxNegative":-2,"maxPositive":1}]},"editableAccountFields":["REGISTER_NEW_EMAIL","USER_NAME","FULL_NAME"],"commentLinks":[{"find":"[Tt]ask-number:\\s+([\\w\\-]+)","replace":"\u003ca href\u003d\"http://bugreports.qt-project.org/browse/$1\"\u003e$\u0026\u003c/a\u003e"}],"documentationAvailable":false}};gerrit_hostpagedata.theme={"backgroundColor":"#FCFEEF","topMenuColor":"#44A51C","textColor":"#000000","trimColor":"#B6DCA6","selectionColor":"#FFFFCC"}; 
//]]> 
    </script> 
    <style type="text/css"> 

#gerrit_topmenu { 
    color: #ffffff; 
} 

#gerrit_topmenu .gwt-Label { 
    color: #ffffff; 
} 

#gerrit_topmenu .gwt-TabBarItem-selected .gwt-Label { 
    color: #000000; 
} 

#gerrit_topmenu a, #gerrit_topmenu a:visited, #gerrit_topmenu a:hover { 
    color: #ffffff; 
} 

#qt-footer-links { 
    background-color: #44A51C; 
} 

#qt-footer-links ul { 
    width: 100%; 
    margin: 0; 
    text-align: center; 
    padding: .1em 0 .3em 0; 
} 

#qt-footer-links li { 
    display: inline; 
    padding: .1em 1em; 
} 

#qt-footer-links a, #qt-footer-links a:visited, #qt-footer-links a:hover { 
    font-family: Arial; 
    color: white; 
    font-size: 11px; 
    font-weight: bold; 
    text-decoration: none; 
} 



    </style> 
    <link href="favicon.ico" rel="icon" type="image/gif"/> 
    <link href="gerrit/gwt/chrome/30B802F72484AED7E67C91FE77CD50BD.cache.css" rel="stylesheet"/> 
    <link href="undefined" rel="stylesheet"/> 
    </head> 
    <body> 
    <div id="gerrit_topmenu" class="GCLMTUVDNF"> 
     <table class="GCLMTUVDIK"> 
     <colgroup> 
      <col/> 
      <col/> 
      <col/> 
     </colgroup> 
     <tbody> 
      <tr> 
      <td class="GCLMTUVDMK"> 
       <table cellspacing="0" cellpadding="0" class="GCLMTUVDJK"> 
       <tbody> 
        <tr> 
        <td align="left" style="vertical-align: top;"> 
         <table cellspacing="0" cellpadding="0" class="gwt-TabBar" role="tablist" style="width: 100%;"> 
         <tbody> 
          <tr> 
          <td align="left" style="vertical-align: bottom;" height="100%" class="gwt-TabBarFirst-wrapper"> 
           <div class="gwt-TabBarFirst" style="white-space: normal; height: 100%;"> 
             
           </div> 
          </td> 
          <td align="left" style="vertical-align: bottom;" class="gwt-TabBarItem-wrapper gwt-TabBarItem-wrapper-selected"> 
           <div tabindex="0" class="gwt-TabBarItem gwt-TabBarItem-selected" role="tab"> 
           <div class="gwt-Label" style="white-space: nowrap;"> 
            All 
           </div> 
           </div> 
          </td> 
          <td align="left" style="vertical-align: bottom;" width="100%" class="gwt-TabBarRest-wrapper"> 
           <div class="gwt-TabBarRest" style="white-space: normal; height: 100%;"> 
             
           </div> 
          </td> 
          </tr> 
         </tbody> 
         </table> 
        </td> 
        </tr> 
        <tr> 
        <td align="left" style="vertical-align: top;" height="100%"> 
         <div class="gwt-TabPanelBottom" role="tabpanel"> 
         <div style="width: 100%; height: 100%; padding: 0px; margin: 0px;"> 
          <div class="GCLMTUVDMG" role="menubar" style="width: 100%; height: 100%;"> 
          <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:open,n,z" role="menuitem"> 
           Open 
          </a> 
          <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:staged,n,z" role="menuitem"> 
           Staged 
          </a> 
          <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:integrating,n,z" role="menuitem"> 
           Integrating 
          </a> 
          <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:merged,n,z" role="menuitem"> 
           Merged 
          </a> 
          <a class="GCLMTUVDPG GCLMTUVDNG" href="#q,status:deferred,n,z" role="menuitem"> 
           Deferred 
          </a> 
          <a class="GCLMTUVDPG" href="#q,status:abandoned,n,z" role="menuitem"> 
           Abandoned 
          </a> 
          </div> 
         </div> 
         </div> 
        </td> 
        </tr> 
       </tbody> 
       </table> 
      </td> 
      <td class="GCLMTUVDLK"> 
       <div> 
       </div> 
      </td> 
      <td class="GCLMTUVDMK"> 
       <div class="GCLMTUVDKK"> 
       <div class="GCLMTUVDMG" role="menubar"> 
        <a class="GCLMTUVDPG" href="javascript:;" role="menuitem"> 
        Sign In 
        </a> 
       </div> 
       <div class="GCLMTUVDJJ"> 
        <input type="text" class="gwt-TextBox GCLMTUVDHG" value="Change #, SHA-1, tr:id, owner:email or reviewer:email"/> 
        <button type="button" class="gwt-Button"> 
        Search 
        </button> 
       </div> 
       </div> 
      </td> 
      </tr> 
     </tbody> 
     </table> 
     <div class="GCLMTUVDGJ"> 
     <span class="GCLMTUVDEJ GCLMTUVDFJ" style=""> 
      Loading ... 
     </span> 
     </div> 
    </div> 
    <div id="gerrit_header"> 
     <div> 
     <img src="static/logo_open_gov.png" style="margin: 18px 0 0 10px;"/> 
     <img src="static/logo_qt.png" style="float: right; margin: 18px 28px 0 0;"/> 
     </div> 
    </div> 
    <div id="gerrit_body" class="GCLMTUVDMF"> 
     <div> 
     <div style="display: none;"> 
      <div class="GCLMTUVDHJ GCLMTUVDLB"> 
      <div class="GCLMTUVDIJ"> 
       <span class="gwt-InlineLabel"> 
       </span> 
      </div> 
      <div> 
       <table cellspacing="0" cellpadding="0"> 
       <tbody> 
        <tr> 
        <td align="left" style="vertical-align: top;"> 
         <table class="GCLMTUVDFG GCLMTUVDKB"> 
         <colgroup> 
          <col/> 
          <col/> 
         </colgroup> 
         <tbody> 
          <tr> 
          <td class="header GCLMTUVDNK"> 
           Change-Id: 
          </td> 
          <td class="GCLMTUVDNK GCLMTUVDBC"> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="header"> 
           Owner 
          </td> 
          <td> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="header"> 
           Project 
          </td> 
          <td> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="header"> 
           Branch 
          </td> 
          <td> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="header"> 
           Topic 
          </td> 
          <td> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="header"> 
           Uploaded 
          </td> 
          <td> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="header"> 
           Updated 
          </td> 
          <td> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="header GCLMTUVDDB"> 
           Status 
          </td> 
          <td> 
             
          </td> 
          </tr> 
          <tr> 
          <td class="GCLMTUVDHI"> 
             
          </td> 
          <td class="GCLMTUVDHI"> 
             
          </td> 
          </tr> 
         </tbody> 
         </table> 
        </td> 
        <td align="left" style="vertical-align: top;"> 
         <div class="GCLMTUVDMB"> 
         </div> 
        </td> 
        </tr> 
       </tbody> 
       </table> 
       <div class="GCLMTUVDO"> 
       <table class="GCLMTUVDGG"> 
        <colgroup> 
        <col/> 
        <col/> 
        <col/> 
        <col/> 
        <col/> 
        </colgroup> 
        <tbody> 
        <tr> 
         <td class="header"> 
         Reviewer 
         </td> 
         <td class="header"> 
           
         </td> 
         <td class="header"> 
         Code Review 
         </td> 
         <td class="header"> 
         Sanity Review 
         </td> 
         <td class="header GCLMTUVDDJ"> 
           
         </td> 
        </tr> 
        </tbody> 
       </table> 
       <ul class="GCLMTUVDCH"> 
       </ul> 
       <div class="GCLMTUVDK" style="display: none;"> 
        <div> 
        <input type="text" class="gwt-SuggestBox GCLMTUVDHG" value="Name or Email"/> 
        <button type="button" class="gwt-Button"> 
         Add Reviewer 
        </button> 
        </div> 
       </div> 
       </div> 
       <table cellspacing="0" cellpadding="0" class="gwt-DisclosurePanel gwt-DisclosurePanel-closed"> 
       <tbody> 
        <tr> 
        <td align="left" style="vertical-align: top;"> 
         <a href="javascript:void(0);" style="display: block;" class="header"> 
         <table> 
          <tbody> 
          <tr> 
           <td align="center" style="width: 16px;"> 
           <img onload="this.__gwtLastUnhandledEvent=&quot;load&quot;;" src="https://codereview.qt-project.org/gerrit/clear.cache.gif" style="width: 16px; height: 16px; background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAfklEQVR42mNgoDZITk4WosiAtLS0M6mpqb1Amp9cAy4B8X8gfpWenp5MiQEwfB6IbSgxAIaXArEcJQaA8Ddg+NQVFhZykmsADG8MDQ1lJseA5wQDFocBP0FRm5WVxUNOGGwEJi4VcmLhKtC5HuSkg8NA5+bjDCRCAG8UDUoAAIw8kVdwMG+3AAAAAElFTkSuQmCC) no-repeat 0px 0px" border="0" class="gwt-Image"/> 
           </td> 
           <td> 
           Included in 
           </td> 
          </tr> 
          </tbody> 
         </table> 
         </a> 
        </td> 
        </tr> 
        <tr> 
        <td align="left" style="vertical-align: top;"> 
         <div style="padding: 0px; overflow: hidden; display: none;"> 
         <table class="content"> 
          <colgroup> 
          <col/> 
          </colgroup> 
          <tbody> 
          <tr> 
           <td> 
             
           </td> 
          </tr> 
          </tbody> 
         </table> 
         </div> 
        </td> 
        </tr> 
       </tbody> 
       </table> 
       <table cellspacing="0" cellpadding="0" class="gwt-DisclosurePanel gwt-DisclosurePanel-closed"> 
       <tbody> 
        <tr> 
        <td align="left" style="vertical-align: top;"> 
         <a href="javascript:void(0);" style="display: block;" class="header"> 
         <table> 
          <tbody> 
          <tr> 
           <td align="center" style="width: 16px;"> 
           <img onload="this.__gwtLastUnhandledEvent=&quot;load&quot;;" src="https://codereview.qt-project.org/gerrit/clear.cache.gif" style="width: 16px; height: 16px; background: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAfklEQVR42mNgoDZITk4WosiAtLS0M6mpqb1Amp9cAy4B8X8gfpWenp5MiQEwfB6IbSgxAIaXArEcJQaA8Ddg+NQVFhZykmsADG8MDQ1lJseA5wQDFocBP0FRm5WVxUNOGGwEJi4VcmLhKtC5HuSkg8NA5+bjDCRCAG8UDUoAAIw8kVdwMG+3AAAAAElFTkSuQmCC) no-repeat 0px 0px" border="0" class="gwt-Image"/> 
           </td> 
           <td> 
           Dependencies 
           </td> 
          </tr> 
          </tbody> 
         </table> 
         </a> 
        </td> 
        </tr> 
        <tr> 
        <td align="left" style="vertical-align: top;"> 
         <div style="padding: 0px; overflow: hidden; display: none;"> 
         <table class="GCLMTUVDOB content" style="width: auto;"> 
          <colgroup> 
          <col/> 
          </colgroup> 
          <tbody> 
          <tr> 
           <td class="GCLMTUVDDG"/> 
           <td class="GCLMTUVDDG"/> 
           <td class="GCLMTUVDFB GCLMTUVDKD"> 
           ID 
           </td> 
           <td class="GCLMTUVDKD"> 
           Subject 
           </td> 
           <td class="GCLMTUVDKD"> 
           Owner 
           </td> 
           <td class="GCLMTUVDKD"> 
           Project 
           </td> 
           <td class="GCLMTUVDKD"> 
           Branch 
           </td> 
           <td class="GCLMTUVDKD"> 
           Updated 
           </td> 
          </tr> 
          <tr> 
           <td colspan="8" class="GCLMTUVDKJ"> 
           Depends On 
           </td> 
          </tr> 
          <tr> 
           <td colspan="8" class="GCLMTUVDOE"> 
           (None) 
           </td> 
          </tr> 
          <tr> 
           <td colspan="8" class="GCLMTUVDKJ"> 
           Needed By 
           </td> 
          </tr> 
          <tr> 
           <td colspan="8" class="GCLMTUVDOE"> 
           (None) 
           </td> 
          </tr> 
          </tbody> 
         </table> 
         </div> 
        </td> 
        </tr> 
       </tbody> 
       </table> 
       <table class="GCLMTUVDLJ"> 
       <colgroup> 
        <col/> 
        <col/> 
       </colgroup> 
       <tbody> 
        <tr> 
        <td> 
         Old Version History: 
        </td> 
        <td> 
         <select class="gwt-ListBox"> 
         <option value="Base" selected="selected"> 
          Base 
         </option> 
         </select> 
        </td> 
        </tr> 
       </tbody> 
       </table> 
       <div> 
       </div> 
       <div class="GCLMTUVDJB"> 
       </div> 
      </div> 
      </div> 
     </div> 
     </div> 
    </div> 
    <div style="clear: both; margin-top: 15px; padding-top: 2px; margin-bottom: 15px;"> 
     <div id="gerrit_footer"> 
     <div> 
      <div id="qt-footer-links"> 
      <ul> 
       <li> 
       <a href="http://qt.digia.com/"> 
        qt.digia.com 
       </a> 
       </li> 
       <li> 
       <a href="http://qt-project.org/doc/"> 
        Qt Documentation 
       </a> 
       </li> 
       <li> 
       <a href="http://qt-project.org/"> 
        Qt-Project 
       </a> 
       </li> 
       <li> 
       <a href="http://planet.qt-project.org/"> 
        Planet Qt 
       </a> 
       </li> 
       <li> 
       <a href="http://qt.gitorious.org/"> 
        Qt Repositories - Gitorious 
       </a> 
       </li> 
       <li> 
       <a href="http://bugreports.qt-project.org/"> 
        Qt Bug Tracker - JIRA 
       </a> 
       </li> 
      </ul> 
      </div> 
     </div> 
     </div> 
     <div id="gerrit_btmmenu" style="clear: both;"> 
     <div class="GCLMTUVDIG"> 
      Press '?' to view keyboard shortcuts 
     </div> 
     <div class="GCLMTUVDAL"> 
      Powered by 
      <a href="http://code.google.com/p/gerrit/" target="_blank"> 
      Gerrit Code Review 
      </a> 
      (V2.2.1-NQT-012) | 
      <a href="http://code.google.com/p/gerrit/issues/list" target="_blank"> 
      Report Bug 
      </a> 
     </div> 
     </div> 
    </div> 
    <iframe id="__gwt_historyFrame" src="javascript:''" style="position:absolute;width:0;height:0;border:0" tabindex="-1"> 
    </iframe> 
    <script language="javascript" type="text/javascript"> 
//<![CDATA[ 
<!-- 
function gerrit(){var s,l,t,w=window,d=document,n='gerrit',f=d.createElement('iframe');function m(){if(s&&l){var b,i=d.createElement('img');i.src=n+'/clear.cache.gif';b=i.src;b=b.substring(0,b.lastIndexOf('/')+1);gerrit=null;f.contentWindow.gwtOnLoad(undefined,n,b);}}gerrit.onScriptLoad=function(){s=1;m();};gerrit.r=function(){l=1;m();};f.src="javascript:''";f.id=n;f.style.cssText='position:absolute;width:0;height:0;border:none';f.tabIndex=-1;d.body.appendChild(f);f.contentWindow.location.replace(n+'/7209E38C5F54FA2918411884E5DCDFEC.cache.html');d.write('<script defer="defer">gerrit.r()</'+'script>');}gerrit(); 
//--> 
//]]> 
    </script> 
    <iframe src="javascript:''" id="gerrit" style="position:absolute;width:0;height:0;border:none" tabindex="-1"> 
    </iframe> 
    <script defer="defer"> 
//<![CDATA[ 
gerrit.r() 
//]]> 
    </script> 
    </body> 
</html> 
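
Note that the result above still contains a "Loading ..." span and empty cells next to Change-Id, Owner and Project, which suggests the snapshot was taken before the page's scripts had finished filling in the data. One possible workaround is to poll the page source until that placeholder disappears. The following is only a minimal sketch under that assumption: the class name SnapshotWaiter, the method waitForSource, and the choice of "Loading ..." as the readiness marker are all hypothetical, and a Supplier<String> stands in for the driver so the example runs without Selenium on the classpath.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: re-fetches a page snapshot until it looks fully rendered. */
public final class SnapshotWaiter {

    private SnapshotWaiter() {}

    /**
     * Polls {@code source} until {@code ready} accepts the snapshot or the timeout elapses.
     * Returns the last snapshot fetched, whether or not it was accepted.
     */
    public static String waitForSource(Supplier<String> source,
                                       Predicate<String> ready,
                                       long timeoutMillis,
                                       long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        String snapshot = source.get();
        while (!ready.test(snapshot) && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            snapshot = source.get();
        }
        return snapshot;
    }

    public static void main(String[] args) {
        // Stand-in for driver::getPageSource (assumption): the placeholder
        // is returned twice before the finished page appears.
        int[] calls = {0};
        Supplier<String> fakeDriver =
                () -> ++calls[0] < 3 ? "<span>Loading ...</span>" : "<td>Change-Id:</td>";

        String html = waitForSource(fakeDriver, s -> !s.contains("Loading ..."), 2_000, 10);
        System.out.println(html);
    }
}
```

With the real driver, the call would look something like waitForSource(driver::getPageSource, s -> !s.contains("Loading ..."), 10_000, 250). Whether HtmlUnit ever finishes rendering this particular Gerrit page is not confirmed by the answer, so the timeout matters: the helper returns the last snapshot even if the placeholder never goes away.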