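Scraping some links from a web page using casperJS

I am using casperJS 1.1.2 and phantomJS 2.1.1 to retrieve some links from a web page. All the links I am interested in have the string "javascript" in their href attribute, as shown below: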
<td>
<a href="javascript:WebForm_DoPostBackWithOptions(new
WebForm_PostBackOptions("ctl00$CenterContent$ctl01",
"", true, "", "", false,
true))">Species A
</a></td>
<td>
<a href="javascript:WebForm_DoPostBackWithOptions(new
WebForm_PostBackOptions("ctl00$CenterContent$ctl02",
"", true, "", "", false,
true))">Species B </a></td>
<td><a href="javascript:WebForm_DoPostBackWithOptions(new
WebForm_PostBackOptions("ctl00$CenterContent$ctl03",
"", true, "", "", false,
true))">Species C </a></td>
<td>
<a href="javascript:WebForm_DoPostBackWithOptions(new
WebForm_PostBackOptions("ctl00$CenterContent$ctl04",
"", true, "", "", false,
true))">Species D</a></td>
<td>
<a href="javascript:WebForm_DoPostBackWithOptions(new
WebForm_PostBackOptions("ctl00$CenterContent$ctl05",
"", true, "", "", false,
true))">Species E </a></td>
I wrote a casperJS script to collect every link whose href attribute contains the string "javascript" and write them all to a file, as shown below.
var links = [];
var casper = require('casper').create({
    waitTimeout: 10000,
    verbose: true,
    logLevel: 'debug',
    pageSettings: {
        loadImages: false,
        loadPlugins: false
    }
});
var fs = require('fs');
casper.start("https://apps.ams.usda.gov/CMS/", function()
{
    links = _utils_.getElementsByXPath('.//td/a[contains(@href,"javascript")]');
});
fs.write("plantVarietyResults.json", links, 'w');
casper.run();
I don't understand why my script does not correctly write the links to the file.
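For reference, here is a minimal sketch of one way the script could be restructured. It is not a confirmed fix, and it rests on three points about CasperJS. First, the steps passed to casper.start() and casper.then() only execute once casper.run() is called, so a top-level fs.write() runs while links is still empty. Second, the client-side helper is named __utils__ (double underscores) and exists only in the page context, which is reached through evaluate(). Third, DOM nodes cannot be returned from evaluate(), so the sketch returns plain href strings. The names collectJavascriptHrefs and toJson are hypothetical helpers introduced for illustration, and a CSS selector stands in for the original XPath query.

```javascript
// Runs in the page context: collect href strings rather than DOM nodes,
// because only serializable values can cross the evaluate() boundary.
function collectJavascriptHrefs() {
    var anchors = document.querySelectorAll('td a[href*="javascript"]');
    return Array.prototype.map.call(anchors, function (a) {
        return a.getAttribute('href');
    });
}

// Hypothetical helper: turn the collected array into the text written to disk.
function toJson(hrefs) {
    return JSON.stringify(hrefs, null, 2);
}

// Start CasperJS only when the script is running under PhantomJS.
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create({
        waitTimeout: 10000,
        pageSettings: { loadImages: false, loadPlugins: false }
    });
    var fs = require('fs');
    var links = [];

    casper.start('https://apps.ams.usda.gov/CMS/', function () {
        // evaluate() executes the function inside the loaded page.
        links = this.evaluate(collectJavascriptHrefs);
    });

    casper.then(function () {
        // Writing inside a step ensures it happens after the page has loaded.
        fs.write('plantVarietyResults.json', toJson(links), 'w');
    });

    casper.run();
}
```

If the page builds its table after a postback or an AJAX call, a wait step such as casper.waitForSelector() before the evaluate() call may also be needed; whether that applies to this page is an assumption.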
Thank you very much, @Artjom B. – ProfLonghair