Scrapy script for Craigslist

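I want to create a Scrapy script that scrapes all the results for computer gigs in any Craigslist subdomain, for example here: http://losangeles.craigslist.org/search/cpg/. This query returns a list of many postings. I am trying to scrape the title and href of every result (not only the results on the first page) using CrawlSpider and LinkExtractor, but the script returns no results. I'll paste my script here. Thanks.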

import scrapy
from scrapy.spiders import Rule, CrawlSpider
from scrapy.linkextractors import LinkExtractor

class CraigspiderSpider(CrawlSpider):
    name = "CraigSpider"
    allowed_domains = ["http://losangeles.craigslist.org"]
    start_urls = (
        'http://losangeles.craigslist.org/search/cpg/',
    )

    rules = (Rule(LinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)), callback="parse_page", follow=True),)

    def parse_page(self, response):
        items = response.selector.xpath("//p[@class='row']")
    for i in items:
        link = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
        title = i.xpath("./span[@class='txt']/span[@class='pl']/a/span[@id='titletextonly']/text()").extract()
        print link, title


2016-03-12 Ernesto PM

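Based on the code you pasted, parse_page:

1. does not return or yield anything,
2. contains only one line: "items = response.selector..."

The reason for #2 above is that the for loop is not indented correctly, so it is not part of the method body.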
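To see why point #1 matters on its own, here is a plain-Python sketch with no Scrapy involved. The function names and the sample `rows` are hypothetical stand-ins for the callback and the selected rows; the only point is that a function which never yields hands nothing back to its caller.

```python
def parse_without_yield(rows):
    # Mirrors the broken callback: it builds a selection and then ends,
    # so the caller receives None and nothing is collected.
    items = list(rows)


def parse_with_yield(rows):
    # Mirrors the fixed callback: the loop is inside the function body
    # and each row is handed back to the caller as a dict.
    for link, title in rows:
        yield {"link": link, "title": title}


# Hypothetical sample data standing in for the scraped rows.
rows = [("/cpg/1.html", "Fix my laptop"), ("/cpg/2.html", "Build a website")]

broken = parse_without_yield(rows)
fixed = list(parse_with_yield(rows))

print(broken)      # None
print(len(fixed))  # 2
```

Scrapy treats whatever the callback yields as scraped output, so a callback shaped like the first function gives it nothing to work with.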

Try indenting the for loop:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class CraigspiderSpider(CrawlSpider):
    name = "CraigSpider"
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ["losangeles.craigslist.org"]
    start_urls = ('http://losangeles.craigslist.org/search/cpg/',)

    # The trailing comma makes rules a tuple containing one Rule
    rules = (Rule(
        LinkExtractor(allow=(), restrict_xpaths=('//a[@class="button next"]',)),
        callback="parse_page", follow=True),)

    def parse_page(self, response):
        items = response.selector.xpath("//p[@class='row']")

        # The loop now sits inside parse_page, so it runs for every response
        for i in items:
            link = i.xpath("./span[@class='txt']/span[@class='pl']/a/@href").extract()
            title = i.xpath("./span[@class='txt']/span[@class='pl']/a/span[@id='titletextonly']/text()").extract()
            print(link, title)
            yield dict(link=link, title=title)
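Two details in a spider definition like this one are easy to get wrong, and both can be checked without running a crawl. The sketch below uses only the standard library: `FakeRule` is a hypothetical stand-in for `scrapy.spiders.Rule`, there only to show the tuple syntax, and `urlparse` is used to derive the bare host name that `allowed_domains` is documented to expect.

```python
from urllib.parse import urlparse


class FakeRule:
    """Hypothetical stand-in for scrapy.spiders.Rule (illustration only)."""


# Parentheses alone do not create a tuple; the trailing comma does.
not_a_tuple = (FakeRule())
one_rule_tuple = (FakeRule(),)

print(type(not_a_tuple).__name__)     # FakeRule
print(type(one_rule_tuple).__name__)  # tuple

# allowed_domains entries are host names, so drop the scheme and path.
start_url = "http://losangeles.craigslist.org/search/cpg/"
allowed_domains = [urlparse(start_url).hostname]
print(allowed_domains)  # ['losangeles.craigslist.org']
```

If `rules` ends up as a single object rather than a tuple, or `allowed_domains` holds a full URL, the spider may fail at startup or treat the "next" links as off-site, so both are worth a quick look when a crawl returns nothing.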


2016-03-12 16:47:30

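Thank you very much, your answer helped me clear up several things about this problem. I now have a working version on my GitHub account, and I'll edit my question later to include a link to my repo. Thanks! Have a nice day.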

@ErnestoPM: feel free to accept this answer, and upvote it if you really like it.
