Scrapy Splash - Keep logged in

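I managed, with a lot of trouble, to log in to a website using Scrapy + Splash (thanks to this thread).

I know I am logged in, because after logging in I can display some elements that are only available to logged-in users. But as soon as I try to reach another page with another SplashRequest, the website asks me to log in again.

So it seems that Scrapy (or Splash) does not keep the session alive. Is there something I need to enable to stay logged in and keep the session active?

Thanks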

2017-07-26 Robin Fourcade

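Splash starts every render from a clean state, so if you want to keep a session you need to initialize the cookies first, and also make Scrapy aware of the cookies set during rendering. See the Session Handling section of the scrapy-splash README. A complete example looks like this (copy-pasted from the README):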

# Requires the scrapy-splash settings from its README
# (SPLASH_URL, SplashCookiesMiddleware, SplashMiddleware, ...).
import scrapy
from scrapy_splash import SplashRequest

script = """
function main(splash)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go{
        splash.args.url,
        headers=splash.args.headers,
        http_method=splash.args.http_method,
        body=splash.args.body,
    })
    assert(splash:wait(0.5))

    local entries = splash:history()
    local last_response = entries[#entries].response
    return {
        url = splash:url(),
        headers = last_response.headers,
        http_status = last_response.status,
        cookies = splash:get_cookies(),
        html = splash:html(),
    }
end
"""


class MySpider(scrapy.Spider):
    name = "myspider"
    # Hypothetical start URL; replace it with the page you want to render.
    start_urls = ["https://example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse_result,
                endpoint='execute',
                cache_args=['lua_source'],
                args={'lua_source': script},
            )

    def parse_result(self, response):
        # here response.body contains result HTML;
        # response.headers are filled with headers from last
        # web page loaded to Splash;
        # cookies from all responses and from JavaScript are collected
        # and put into Set-Cookie response header, so that Scrapy
        # can remember them.
        self.logger.info("Rendered %s", response.url)

Note that session handling currently requires the /execute or /run endpoint; there are no helpers for the other endpoints.
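
The cookies returned by the script are only stored and sent back to Splash when the scrapy-splash middlewares are enabled. A settings.py sketch following the configuration section of the scrapy-splash README; the SPLASH_URL value is an assumption (a Splash instance on localhost at its default port 8050).

```python
# settings.py
SPLASH_URL = "http://localhost:8050"  # assumption: local Splash on its default port

DOWNLOADER_MIDDLEWARES = {
    # Keeps the cookies returned by the Lua script and passes them back
    # to Splash (as splash.args.cookies) on later requests of the same session.
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    # Needed for cache_args=['lua_source'] so the script is not resent every time.
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```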


2017-07-26 21:31:05
