Using loginform in Scrapy

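The Scrapy project (https://github.com/scrapy/scrapy) provides a library for logging in to websites that require authentication: https://github.com/scrapy/loginform.
I have read through the documentation for both, but I can't figure out how to get Scrapy to call loginform before it starts crawling. Logging in works fine when I use loginform on its own.
Thanks.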

2015-04-22 ollierexx

Did you get a chance to try the solution I provided? – elias

I couldn't get it to work, but I posted a solution of my own. – ollierexx

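loginform is just a library, completely decoupled from Scrapy.

You have to write the code that plugs it into the spider you want, most likely in a callback method.

Here is an example of a structure that does this: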

import scrapy
from loginform import fill_login_form


class MySpiderWithLogin(scrapy.Spider):
    name = 'my-spider'

    start_urls = [
        'http://somewebsite.com/some-login-protected-page',
        'http://somewebsite.com/another-protected-page',
    ]

    login_url = 'http://somewebsite.com/login-page'

    login_user = 'your-username'
    login_password = 'secret-password-here'

    def start_requests(self):
        # let's start by sending a first request to the login page
        yield scrapy.Request(self.login_url, self.parse_login)

    def parse_login(self, response):
        # got the login page, let's fill the login form...
        data, url, method = fill_login_form(response.url, response.body,
                                            self.login_user, self.login_password)

        # ... and send a request with our login data
        return scrapy.FormRequest(url, formdata=dict(data),
                                  method=method, callback=self.start_crawl)

    def start_crawl(self, response):
        # OK, we're in, let's start crawling the protected pages
        for url in self.start_urls:
            yield scrapy.Request(url)

    def parse(self, response):
        # do stuff with the logged in response
        pass

2015-04-22 22:19:33 elias

The first complete example, and it works! –

I managed to get it working without the loginform library. My solution is below.

import scrapy


class Spider(scrapy.Spider):
    name = 'spider'

    start_urls = [
        'http://start.com',
    ]

    def start_requests(self):
        # Scrapy needs an absolute URL here; the host below is a placeholder.
        return [scrapy.FormRequest(
            'http://start.com/login.php',
            formdata={'username': 'user', 'password': 'pass'},
            callback=self.start_crawl,
        )]

    def start_crawl(self, response):
        # logged in, start crawling from here
        pass
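
One detail to watch with a hard-coded login target: Scrapy requests expect an absolute URL, so a relative path such as login.php has to be resolved against the page it came from first. A minimal sketch using only the standard library, where the page URL is an assumption made up for illustration:

```python
from urllib.parse import urljoin

# Hypothetical values: the page that holds the form and the form's
# relative action attribute.
login_page_url = "http://start.com/"
form_action = "login.php"

# Resolve the relative action against the page URL.
login_url = urljoin(login_page_url, form_action)
print(login_url)
```

Inside a Scrapy callback, response.urljoin(form_action) performs the same resolution against the URL of the response.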

2015-05-02 16:38:29 ollierexx

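I'm glad you managed to get it working. That said, this solution doesn't really connect with the question. It just starts the crawl from a POST request, which is not what loginform is for: its job is finding the form that sends the login POST request. – elias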
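
To make the point about scope concrete, here is a rough sketch of what "finding the form that sends the login POST request" can involve. It is not loginform's implementation. It is a simplified stand-in built on the standard library's html.parser, and the names LoginFormFinder and find_login_form, the sample HTML, and the heuristic (first form with a password input, username goes into the first text or email input) are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormFinder(HTMLParser):
    """Record every <form> on a page together with its named inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []       # completed forms, in document order
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "GET").upper(),
                "fields": [],
                "has_password": False,
            }
        elif tag == "input" and self._current is not None:
            input_type = (attrs.get("type") or "text").lower()
            if input_type == "password":
                self._current["has_password"] = True
            if attrs.get("name"):
                self._current["fields"].append(
                    (attrs["name"], input_type, attrs.get("value") or "")
                )

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def find_login_form(page_url, html, username, password):
    """Return (data, url, method) for the first form with a password field."""
    finder = LoginFormFinder()
    finder.feed(html)
    form = next((f for f in finder.forms if f["has_password"]), None)
    if form is None:
        raise ValueError("no form with a password field found")

    data = []
    user_filled = False
    for name, input_type, value in form["fields"]:
        if input_type == "password":
            value = password
        elif input_type in ("text", "email") and not user_filled:
            value = username
            user_filled = True
        data.append((name, value))  # hidden fields keep their preset value
    return data, urljoin(page_url, form["action"]), form["method"]


# Made-up page with a search form and a login form.
SAMPLE_HTML = """
<form action="/search"><input type="text" name="q"></form>
<form action="/session" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="login">
  <input type="password" name="pw">
</form>
"""

data, url, method = find_login_form(
    "http://somewebsite.com/login-page", SAMPLE_HTML,
    "your-username", "secret-password-here",
)
print(data, url, method)
```

The sketch skips the search form, keeps the hidden csrf field, and resolves the form's action against the page URL. A hard-coded POST handles none of these for you.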
