2017-05-30 24 views
1

How to change the Scrapy user agent without a settings file

I have implemented my spider as a standalone script, like the main example:

import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)

I run it with:

scrapy runspider myspider.py

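How can I change the user agent when I have no settings file, since I did not create the project with the startproject command? Settings are described here: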

https://doc.scrapy.org/en/latest/topics/settings.html

Answers

1

Add USER_AGENT to your settings.py file:

USER_AGENT = "custom_user_agent" 

You can also change USER_AGENT from the command line, which works without any settings file:

scrapy runspider myspider.py -s USER_AGENT="custom_user_agent" 

Since I'm using Django, is this Django's settings.py? – Atma


No, this is Scrapy's settings.py. – JkShaw

0

You can add headers to your requests manually, which lets you specify a custom User-Agent.

In your spider file, when you create a request for a given url:

yield scrapy.Request(url, callback=self.parse, headers={"User-Agent": "Your Custom User Agent"})

So your spider would look like this:

import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def start_requests(self):
        # start_urls is a list, so create one request per URL
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, headers={"User-Agent": "Your Custom User Agent"})

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        # Follow-up requests need the header too; it is not inherited
        next_page = response.css('div.prev-post > a ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse, headers={"User-Agent": "Your Custom User Agent"})