
Python Scrapy: how do I use self.download_delay?

I have never used Scrapy before. Please help!

I want to add a delay before each request to "next_link".

Example:

GET https://example.com/?page=1

delay 30 seconds

GET https://example.com/?page=2

delay 30 seconds

import scrapy
from datetime import datetime

# USERNAME and PASSWORD are defined elsewhere
class CVSpider(scrapy.Spider):
    name = 'cvspider'
    start_urls = ["login"]
    custom_settings = {
        'DOWNLOAD_DELAY': 0,
        'RANDOMIZE_DOWNLOAD_DELAY': True
    }

    def __init__(self, search_url, name=None, **kwargs):
        super().__init__(name, **kwargs)
        self.search_url = search_url

    def parse(self, response):
        xsrf = response.css('input[name="_xsrf"] ::attr(value)')\
                       .extract_first()
        return scrapy.FormRequest.from_response(
            response,
            formdata={
                'username': USERNAME,
                'password': PASSWORD,
                '_xsrf': xsrf
            },
            callback=self.after_login
        )

    def after_login(self, response):
        self.logger.info('Parse %s', response.url)
        if "account/login" in response.url:
            self.logger.error("Login failed!")
            return

        return scrapy.Request(self.search_url, callback=self.parse_search_page)

    def parse_search_page(self, response):
        cv_hashes = response\
            .css('table.output tr[itemscope="itemscope"]::attr(data-hash)')\
            .extract()
        total = len(cv_hashes)
        start_time = datetime.now()
        next_link = response.css('a.Controls-Next::attr(href)')\
                            .extract_first()
        if total == 0:
            next_link = None
        if next_link is not None:
            self.download_delay = 30  # <-- does not work
            yield scrapy.Request(
                "https://example.com" + next_link,
                callback=self.parse_search_page
            )

Answers


There is a settings option to achieve this. In your settings.py file, set DOWNLOAD_DELAY like this:

DOWNLOAD_DELAY = 30  # time in seconds (Scrapy's DOWNLOAD_DELAY is measured in seconds, not milliseconds)

But remember to remove custom_settings from your code in that case.


If you want to do this with settings specific to this spider, modify the code like this:

class CVSpider(scrapy.Spider):
    name = 'cvspider'
    start_urls = ["login"]
    custom_settings = {
        'DOWNLOAD_DELAY': 30,  # seconds
        'RANDOMIZE_DOWNLOAD_DELAY': False
    }

    def __init__(self, search_url, name=None, **kwargs):
    ...

You can refer to the documentation to learn more.

Comment: DOWNLOAD_DELAY applies to all requests, but I only need it for the one request where self.download_delay = 30 (does not work) precedes yield scrapy.Request(...)

Comment: You could use sleep() here.
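
A minimal sketch of that sleep() suggestion. Note the caveat: time.sleep() blocks Scrapy's entire Twisted reactor, so every in-flight request waits too; it is only tolerable for a sequential, one-page-at-a-time crawl like this one.

import time

import scrapy


class CVSpider(scrapy.Spider):
    name = 'cvspider'

    def parse_search_page(self, response):
        next_link = response.css('a.Controls-Next::attr(href)').extract_first()
        if next_link is not None:
            # Blocking call: pauses the whole reactor, not just this request.
            # Acceptable here only because the spider follows one link at a time.
            time.sleep(30)
            yield scrapy.Request(
                "https://example.com" + next_link,
                callback=self.parse_search_page
            )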


The idea is that you only set the download_delay attribute on your spider and Scrapy takes care of the rest; you don't need to actually "use" it anywhere.

So set it like this:

class MySpider(Spider):
    ...
    download_delay = 30  # seconds, not milliseconds
    ...

That's it.

Comment: I want to set up a delay only for the parse_search_page func.
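
For what it's worth, assigning self.download_delay inside a callback comes too late: as far as I can tell, Scrapy reads the spider's download_delay only once, when it creates the download slot for the domain, which here already happens at login. One way to delay only the pagination requests is a small downloader middleware. The sketch below leans on the undocumented but commonly used behavior that Scrapy waits on a Deferred returned from process_request; the 'delay' meta key, the middleware class, and the module path are illustrative names, not part of the original code.

# middlewares.py (hypothetical module)
from twisted.internet import reactor
from twisted.internet.task import deferLater


class PerRequestDelayMiddleware:
    """Delay only the requests that carry a 'delay' value in their meta."""

    def process_request(self, request, spider):
        delay = request.meta.get('delay')
        if delay:
            # Returning a Deferred pauses this request without blocking the
            # reactor; it fires with None after `delay` seconds, after which
            # the download proceeds normally.
            return deferLater(reactor, delay, lambda: None)

Enable it in settings.py and tag only the pagination request:

# settings.py (the priority value 550 is arbitrary)
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.PerRequestDelayMiddleware': 550,
}

# in parse_search_page:
yield scrapy.Request(
    "https://example.com" + next_link,
    callback=self.parse_search_page,
    meta={'delay': 30},  # only this request is delayed
)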