2016-08-12 58 views

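Scraping some links from a web page using casperJS

I am using casperJS 1.1.2 and phantomJS 2.1.1 to retrieve some links from a web page. All the links I am interested in have the string "javascript" in their href attribute, as shown below: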

<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl01&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species A  
</a></td> 
<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl02&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species B </a></td> 
<td><a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl03&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Sepcies C </a></td> 
<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl04&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species D</a></td> 
<td> 
<a href="javascript:WebForm_DoPostBackWithOptions(new 
WebForm_PostBackOptions(&quot;ctl00$CenterContent$ctl05&quot;, 
&quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, 
true))">Species E </a></td> 

I wrote a casperJS script to scrape every link whose href attribute contains the string "javascript" and write the results to a file, as shown below.

var links=[]; 
var casper = require('casper').create({ 
    waitTimeout: 10000, 
    verbose: true, 
    logLevel: 'debug', 
    pageSettings: { 
     loadImages: false, 
     loadPlugins: false 
    } 
}); 

var fs = require('fs'); 

casper.start("https://apps.ams.usda.gov/CMS/", function() 
    { 
     links = _utils_.getElementsByXPath('.//td/a[contains(@href,"javascript")]'); 
    }); 

fs.write("plantVarietyResults.json", links, 'w'); 


casper.run(); 

I don't understand why my script does not write the links to the file correctly.

Answer


Your code shows a few misunderstandings about CasperJS and contains some errors:

- __utils__ is only available in the page context, so it has to be called inside casper.evaluate().
- DOM elements cannot be passed out of the page context; only serializable values such as strings can, so map the elements to their href values first.
- casper.start() only schedules the step, and the steps do not execute until casper.run() is called. The top-level fs.write() therefore runs while links is still an empty array; move it into the step.
- fs.write() expects a string, so serialize the array with JSON.stringify().

Here is an example that should work (untested):

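// casper and fs are created exactly as in the question
casper.start("https://apps.ams.usda.gov/CMS/", function() { 
    // query the DOM inside evaluate(), i.e. in the page context
    var links = this.evaluate(function(){ 
        return __utils__.getElementsByXPath('.//td/a[contains(@href,"javascript")]') 
            .map(function(element){ 
                // return plain strings; DOM nodes cannot leave the page context
                return element.href; 
            }); 
    }); 
    // write inside the step, after links has been populated
    fs.write("plantVarietyResults.json", JSON.stringify(links), 'w'); 
}); 

casper.run(); 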

Here is a slightly shorter way:

var x = require('casper').selectXPath; 
casper.start("https://apps.ams.usda.gov/CMS/", function() { 
    var links = this.getElementsAttribute(x('.//td/a[contains(@href,"javascript")]'), 'href'); 
    fs.write("plantVarietyResults.json", JSON.stringify(links), 'w'); 
}); 

casper.run(); 
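
To see why the write has to happen inside the step, here is a toy model in plain JavaScript (runnable with Node). It is only a sketch: FakeCasper is a hypothetical stand-in invented for illustration, not CasperJS's real implementation, and the link value is made up. It assumes only that start() queues the callback and run() executes it later, and it runs the queue synchronously for simplicity.

```javascript
// Toy stand-in for a step queue (hypothetical, NOT the real CasperJS code).
function FakeCasper() {
    this.steps = [];
}

// start() only records the callback; it does not call it.
FakeCasper.prototype.start = function (url, step) {
    this.steps.push(step);
    return this;
};

// run() is what finally executes the queued steps.
FakeCasper.prototype.run = function () {
    this.steps.forEach(function (step) { step(); });
};

var links = [];
var written = [];   // stands in for the contents written to disk
var fake = new FakeCasper();

fake.start("https://apps.ams.usda.gov/CMS/", function () {
    links = ["javascript:doSomething()"];   // made-up value for illustration
    written.push("inside step: " + JSON.stringify(links));
});

// Top-level code executes now, before any queued step has run.
written.push("top level: " + JSON.stringify(links));

fake.run();
console.log(written);
```

The top-level write records an empty array because it executes before the step does, which is the situation in the question's script. The write placed inside the step sees the populated array.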

Thank you very much, @Artjom B. – ProfLonghair