2016-09-12

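Using the strategy pattern in PhantomJS

I'm trying to implement the strategy pattern in my crawler, because I think it would be good to use a different strategy for each website I crawl. So I want the code inside page.evaluate to vary depending on the website currently being crawled. The commented-out code inside page.evaluate works, but is there a way to extract it into a function? I tried calling this.findJobs() without success.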

"use strict"; 

var Crawler = function() { 
    this.page = require('webpage').create(); 
    this.website = ""; 
    this.jobs_list = []; 

}; 

Crawler.prototype.setStrategy = function(company) { 
    this.website = company; 
}; 

Crawler.prototype.findJobData = function() { 
    return this.website.findJobData(); 
}; 

Crawler.prototype.collectJobData = function() { 
    var page = require('webpage').create(); 
    page.onConsoleMessage = function(msg) { console.log(msg) }; 

    page.open('URL', function (status) { 
     page.includeJs("https://ajax.googleapis.com/ajax/libs/jquery/3.1.0/jquery.min.js", function() { 
      var temp_jobs = page.evaluate(this.findJobs()); 

       /* 
       var jobs = []; 
       var job; 
        $('ul.job-list').each(function(){ 
        $(this).find('li').each(function(){ 
         var job_link = $(this).find('a'); 
         var url = "URL" + job_link.attr("href"); 
         var location = $(this).find('span').text(); 

         job = {title: job_link.text(), url: url, location: location, description: ""} 
         jobs.push(job); 
         console.log(job.title, job.url, job.location); 
        }) 
       }); 
       return jobs;*/ 
      console.log(temp_jobs[0].title) 

      phantom.exit(0); 
     }); 
    }); 

}; 

var strategy_a = function() { 

    this.findJobs = function() { 
      var jobs = []; 
      var job; 
      $('ul.job-list').each(function(){ 
       $(this).find('li').each(function(){ 
        var job_link = $(this).find('a'); 
        var url = "URL" + job_link.attr("href"); 
        var location = $(this).find('span').text(); 

        job = {title : job_link.text(), url : url, location : location, description : ""}; 
        jobs.push(job); 
        console.log(job.title, job.url, job.location); 
       }) 
      }); 
      return jobs; 
    }; 
}; 


var strategy_a = new strategy_a(); 
var crawler = new Crawler(); 

crawler.setStrategy(strategy_a); 
crawler.collectJobData(); 

Answer


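You have two problems:

  • You need to pass the function itself, page.evaluate(this.findJobs), rather than the result of calling it, page.evaluate(this.findJobs()).

  • Inside the page.includeJs callback, this is not a reference to the Crawler instance.

This should work: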

Crawler.prototype.collectJobData = function() { 
    var page = this.page; 
    var self = this; 
    page.onConsoleMessage = function(msg) { console.log(msg) }; 

    page.open('URL', function (status) { 
     page.includeJs("https://ajax.googleapis.com/ajax/libs/jquery/3.1.0/jquery.min.js", function() { 
      var temp_jobs = page.evaluate(self.website.findJobs); 
      console.log(temp_jobs[0].title) 

      phantom.exit(0); 
     }); 
    }); 
}; 

Note that you were creating multiple pages without using all of them, so I removed the second require('webpage').create().
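
Both problems can be reproduced without PhantomJS. The following is only a minimal sketch in plain JavaScript: runLater is a hypothetical stand-in for an API that invokes a callback (as page.includeJs does), and Holder is a made-up stand-in for Crawler. Neither name comes from the question.

```javascript
"use strict";

// Hypothetical stand-in for an API that invokes a callback, such as page.includeJs.
function runLater(callback) {
    return callback(); // plain call, so `this` is not the caller's object
}

// Hypothetical stand-in for Crawler; it only holds a strategy object.
var Holder = function (strategy) {
    this.website = strategy;
};

// Broken: inside the callback, `this` is undefined (strict mode),
// so reading this.website throws a TypeError.
Holder.prototype.broken = function () {
    return runLater(function () {
        try {
            return typeof this.website.findJobs;
        } catch (e) {
            return e.name;
        }
    });
};

// Fixed: capture the instance in `self`, and refer to the function
// without parentheses so it is passed along instead of being called.
Holder.prototype.fixed = function () {
    var self = this;
    return runLater(function () {
        return typeof self.website.findJobs;
    });
};

var holder = new Holder({ findJobs: function () { return []; } });
var brokenResult = holder.broken();
var fixedResult = holder.fixed();
console.log(brokenResult, fixedResult);
```

The var self = this; line in the answer plays the same role as self in this sketch.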


I get the following error when I run the code: TypeError: undefined is not an object (evaluating 'this.website.findJobs') undefined:2 :3 – Pierre


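Your 'Crawler.prototype.findJobData' function is useless, because the concrete 'findJobs' function has to be passed in directly so that it runs in the page context. You cannot use 'findJobData' as a proxy for 'findJobs', because 'page.evaluate' is sandboxed and does not allow references to anything outside the page context. In any case, I have fixed my answer. –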
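
The sandboxing point can be illustrated in plain JavaScript. This is a rough simulation under stated assumptions, not PhantomJS itself: fakeEvaluate is a hypothetical helper that rebuilds a function from its source text, which models only the idea that the function reaches the page as code and loses access to its surrounding variables. The names makeProxy, strategy and the sample job data are likewise invented for the example.

```javascript
"use strict";

// Hypothetical simulation of a sandboxed evaluate: the function is converted
// to source text and rebuilt in the global scope, so it keeps its code but
// loses its closure. Arguments and results are copied as JSON data.
function fakeEvaluate(fn) {
    var args = Array.prototype.slice.call(arguments, 1);
    var rebuilt = new Function("return (" + fn.toString() + ");")();
    var result = rebuilt.apply(null, JSON.parse(JSON.stringify(args)));
    return result === undefined ? undefined : JSON.parse(JSON.stringify(result));
}

var strategy = {
    // Self-contained: uses only its own argument, so it survives the round trip.
    findJobs: function (prefix) {
        return [{ title: "Engineer", url: prefix + "/jobs/1" }];
    }
};

// A proxy that closes over `target`, a variable that exists only in this scope.
function makeProxy(target) {
    return function () {
        return target.findJobs("URL");
    };
}

// Passing the strategy function directly works.
var direct = fakeEvaluate(strategy.findJobs, "URL");

// Passing the proxy fails: `target` is not defined where the code is rebuilt.
var proxyError = null;
try {
    fakeEvaluate(makeProxy(strategy));
} catch (e) {
    proxyError = e.name;
}

console.log(direct[0].url, proxyError);
```

In the real crawler, findJobs may still use $ because jQuery is loaded into the page by page.includeJs, so it exists in the page context. Values the strategy needs from the outside can be passed as extra arguments to page.evaluate, provided they are plain serializable data.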


Thanks for the clarification. – Pierre