2017-09-04 101 views

Changing the Scrapy/Splash user agent

How can I set the user agent for Scrapy with Splash, equivalent to what the following requests code does?

import requests 
from bs4 import BeautifulSoup 

ua = {"User-Agent":"Mozilla/5.0"} 
url = "http://www.example.com" 
page = requests.get(url, headers=ua) 
soup = BeautifulSoup(page.text, "lxml") 

My spider would look similar to this:

import scrapy 
from scrapy_splash import SplashRequest 


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 0.5}
            )

Have you tried the 'splash_headers' argument of SplashRequest? –

Answer


You need to set the user_agent attribute on the spider to override the default user agent:

class ExampleSpider(scrapy.Spider): 
    name = 'example' 
    user_agent = 'Mozilla/5.0' 

In this case UserAgentMiddleware (which is enabled by default) will override the USER_AGENT setting value with 'Mozilla/5.0'.

You can also override the headers on a per-request basis:

scrapy_splash.SplashRequest(url, headers={'User-Agent': custom_user_agent}) 
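To make the precedence between these options easier to see, here is a minimal sketch in plain Python. It does not import Scrapy; effective_user_agent is a hypothetical helper that only models the behaviour as I understand it: a User-Agent header set on the request is kept, otherwise the spider's user_agent attribute is used, otherwise the USER_AGENT setting. The default string below is a placeholder, not necessarily the value your Scrapy version ships with.

```python
from typing import Dict, Optional


# Hypothetical helper, not part of Scrapy: it only models the precedence
# described in the answer for illustration purposes.
def effective_user_agent(
    setting_user_agent: str,
    spider_user_agent: Optional[str] = None,
    request_headers: Optional[Dict[str, str]] = None,
) -> str:
    """Return the User-Agent that would be sent under the modelled rules."""
    # 1. A User-Agent header set explicitly on the request is left untouched.
    for name, value in (request_headers or {}).items():
        if name.lower() == "user-agent":
            return value
    # 2. Otherwise the spider's user_agent attribute is used, if defined.
    if spider_user_agent is not None:
        return spider_user_agent
    # 3. Otherwise the project-wide USER_AGENT setting applies.
    return setting_user_agent


default = "Scrapy (placeholder default)"  # assumed placeholder value
print(effective_user_agent(default))
print(effective_user_agent(default, spider_user_agent="Mozilla/5.0"))
print(effective_user_agent(default, "Mozilla/5.0", {"User-Agent": "custom-agent/1.0"}))
```

In practice this means the spider attribute is enough when every request should share one user agent, and the headers argument is only needed for requests that must differ.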

But as far as I understand, Splash does not take the Scrapy settings into account. – zinyosrim


@zinyosrim, I just checked with the httpbin.org/headers URL, and 'user_agent' definitely affects the 'User-Agent' value in the httpbin response. – skovorodkin
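
For anyone who wants to repeat that check, here is a rough sketch of a helper that pulls the User-Agent out of an httpbin.org/headers response body. user_agent_from_httpbin is a hypothetical name, and the sample bodies are hand-written rather than captured from a real run. It assumes the body is either the raw JSON or the same JSON wrapped in HTML markup, which is what a rendered page from Splash may look like.

```python
import html
import json


def user_agent_from_httpbin(body: str) -> str:
    """Extract the User-Agent value from an httpbin.org/headers response body."""
    # Assumption: the JSON may be wrapped in HTML markup, so keep only the
    # text between the first "{" and the last "}" before parsing it.
    start, end = body.index("{"), body.rindex("}")
    payload = json.loads(html.unescape(body[start:end + 1]))
    return payload["headers"]["User-Agent"]


# Hand-written sample bodies, not output captured from httpbin or Splash.
raw = '{"headers": {"Host": "httpbin.org", "User-Agent": "Mozilla/5.0"}}'
wrapped = "<html><head></head><body><pre>" + raw + "</pre></body></html>"
print(user_agent_from_httpbin(raw))
print(user_agent_from_httpbin(wrapped))
```

Inside a spider's parse callback the same helper could be called on response.text and the result logged, which shows directly whether the configured user agent reached the target site.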