Reading cookies from a Splash request

I am trying to access cookies after making a request with Splash. Here is how I build the request.

script = """
function main(splash)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go{
        splash.args.url,
        headers=splash.args.headers,
        http_method=splash.args.http_method,
        body=splash.args.body,
    })
    assert(splash:wait(0.5))

    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        url = splash:url(),
        headers = last_response.headers,
        http_status = last_response.status,
        cookies = splash:get_cookies(),
        html = splash:html(),
    }
end
"""
req = SplashRequest(
    url,
    self.parse_page,
    args={
        'wait': 0.5,
        'lua_source': script,
        'endpoint': 'execute'
    }
)

The script is an exact copy from the Splash documentation.

I am trying to access the cookies that the web page sets. The code below works as I expect when I do not use Splash, but it does not work with Splash.

self.logger.debug('Cookies: %s', response.headers.get('Set-Cookie'))

With Splash, this returns:

2017-01-03 12:12:37 [spider] DEBUG: Cookies: None

Without Splash, this code works and returns the cookies provided by the web page.

The Splash documentation shows this code as an example:

def parse_result(self, response): 
    # here response.body contains result HTML; 
    # response.headers are filled with headers from last 
    # web page loaded to Splash; 
    # cookies from all responses and from JavaScript are collected 
    # and put into Set-Cookie response header, so that Scrapy 
    # can remember them.

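I am not sure whether I understand this correctly, but I would expect to be able to access the cookies the same way as when I do not use Splash.

Middleware settings: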

# Download middlewares 
DOWNLOADER_MIDDLEWARES = { 
    # Use a random user agent on each request 
    'crawling.middlewares.RandomUserAgentDownloaderMiddleware': 400, 
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None, 
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700, 
    # Enable crawlera proxy 
    'scrapy_crawlera.CrawleraMiddleware': 600, 
    # Enable Splash to render javascript 
    'scrapy_splash.SplashCookiesMiddleware': 723, 
    'scrapy_splash.SplashMiddleware': 725, 
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810, 
}

So my question is: how do I access cookies when using a Splash request?


2017-01-03 Casper

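You can set the SPLASH_COOKIES_DEBUG=True option to see all cookies that are being set. When scrapy-splash is configured correctly, the current cookiejar, with all cookies merged, is available as response.cookiejar.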
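As a sketch of what a working configuration might look like, the settings below follow the layout suggested in the scrapy-splash README. The SPLASH_URL value is an assumption (a local Splash instance on its default port) and should point at your own server.

```python
# settings.py (sketch; adjust to your project)

# Assumption: Splash is reachable locally on its default port.
SPLASH_URL = 'http://localhost:8050'

# Log the cookies that scrapy-splash sends and receives.
SPLASH_COOKIES_DEBUG = True

DOWNLOADER_MIDDLEWARES = {
    # Cookie handling for Splash requests runs just before SplashMiddleware.
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```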

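Using response.headers.get('Set-Cookie') is not robust: when there are redirects (e.g. JS redirects) there can be several responses, and a cookie may be set in the first one, while the script returns the headers of the last response only.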
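The redirect case can be illustrated without Scrapy or Splash at all. The sketch below uses only the standard library. The redirect chain, the cookie value, and both helper functions are made up for illustration and are not part of scrapy-splash.

```python
from http.cookies import SimpleCookie

# Hypothetical redirect chain: each dict stands for one response's headers.
responses = [
    # The first response sets the cookie and redirects.
    {"Location": "/home", "Set-Cookie": "sessionid=abc123; Path=/"},
    # The final page sets no cookie.
    {"Content-Type": "text/html"},
]


def last_response_cookie(chain):
    """Mimic reading Set-Cookie from the last response only."""
    return chain[-1].get("Set-Cookie")


def merged_cookies(chain):
    """Collect cookies from every response; later values overwrite earlier ones."""
    jar = {}
    for headers in chain:
        raw = headers.get("Set-Cookie")
        if raw:
            parsed = SimpleCookie()
            parsed.load(raw)
            for name, morsel in parsed.items():
                jar[name] = morsel.value
    return jar


print(last_response_cookie(responses))  # the cookie is not visible here
print(merged_cookies(responses))        # the cookie shows up once all responses are merged
```

A merged cookiejar plays the role of merged_cookies here, which is why it is the more reliable place to look.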

I am not sure this is the problem you are running into, though; the code is not an exact copy from the Splash documentation. Here:

req = SplashRequest(
    url,
    self.parse_page,
    args={
        'wait': 0.5,
        'lua_source': script
    }
)

you are sending the request to the /render.json endpoint, which does not execute Lua scripts; use endpoint='execute' to fix that.
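Note that endpoint is a keyword argument of SplashRequest itself, next to args, rather than a key inside the args dict. The question's snippet puts it inside args, which as far as I can tell only forwards it to Splash as an ordinary argument. A minimal sketch of the intended shape follows. splash_request_kwargs is a hypothetical helper introduced only for illustration, and the URL and one-line Lua script are placeholders.

```python
def splash_request_kwargs(url, callback, lua_source, wait=0.5):
    """Hypothetical helper: keyword arguments for scrapy_splash.SplashRequest."""
    return {
        'url': url,
        'callback': callback,
        # endpoint sits next to args, not inside it
        'endpoint': 'execute',
        'args': {'wait': wait, 'lua_source': lua_source},
    }


# Placeholder values; in a spider the callback would be self.parse_page.
kwargs = splash_request_kwargs('http://example.com', None, 'function main(splash) end')
print(kwargs['endpoint'])
print('endpoint' in kwargs['args'])
```

In a spider this would be used as SplashRequest(**splash_request_kwargs(url, self.parse_page, script)), which is equivalent to passing endpoint='execute' directly.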


2017-01-04 13:41:03

I have added the endpoint to the request, but with no result. response.headers.get('Set-Cookie') still returns None. For response.cookiejar I get an error: AttributeError: 'SplashTextResponse' object has no attribute 'cookiejar' – Casper

@Casper - are you sure all the described options are set in settings.py? Is scrapy_splash.SplashCookiesMiddleware added to DOWNLOADER_MIDDLEWARES? –

I have updated the question with the DOWNLOADER_MIDDLEWARES setting. – Casper
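The settings check asked for in the comments can be sketched as a small function. splash_middleware_problems is a hypothetical helper, not part of scrapy-splash, and the ordering rule it applies (cookies middleware before SplashMiddleware) is an assumption taken from the priorities 723 and 725 used in the README.

```python
SPLASH_COOKIES = 'scrapy_splash.SplashCookiesMiddleware'
SPLASH = 'scrapy_splash.SplashMiddleware'


def splash_middleware_problems(downloader_middlewares):
    """Hypothetical check: return a list of human-readable problems."""
    problems = []
    for name in (SPLASH_COOKIES, SPLASH):
        # A missing key or a None value means the middleware is not active.
        if downloader_middlewares.get(name) is None:
            problems.append(name + ' is missing or disabled')
    # Assumed rule: the cookies middleware needs the lower priority number.
    if not problems and downloader_middlewares[SPLASH_COOKIES] >= downloader_middlewares[SPLASH]:
        problems.append(SPLASH_COOKIES + ' should come before ' + SPLASH)
    return problems


# The DOWNLOADER_MIDDLEWARES dict from the question, reduced to the relevant entries.
question_settings = {
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
}
print(splash_middleware_problems(question_settings))
```

For the question's settings this prints an empty list, which suggests the middleware entries themselves are not the cause.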
