2017-05-31 417 views

Scrapy login fails

The website does have a hidden authentication token, but the [docs][1] seem to suggest I don't need to override the default here, and only need to pass the username and password. 

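Looking at the Network tab, I also noticed that, besides the authentication token being posted, there are numerous cookies. I'm not sure whether I have to do anything about those.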

My code, cobbled together from various other people's previous attempts:

import scrapy 
from scrapy.selector import Selector 
from scrapy import Spider 
from scrapy.contrib.spiders.init import InitSpider 
from scrapy.spider import BaseSpider 
from scrapy.http import Request, FormRequest 
from scrapy import log 
from scrapy.crawler import CrawlerProcess 

from dealinfo.items import DealinfoItem 

class DealinfoSpider(scrapy.Spider): 
    name = 'dealinfo' 
    allowed_domains = ['dealinfo.com'] 
    #login_page = 'https://dealinfo.com/users/sign_in' 
    start_urls = 'https://dealinfo.com/organizations/xxxx/member_landing' 

    def start_requests(self):
        return [Request(url='https://dealinfo.com/users/sign_in', callback=self.login)]

    def login(self, response):
        return FormRequest(
            'https://dealinfo.com/users/sign_in',
            formdata={
                'user[email]':'xxxxx',
                'user[password]':'xxxxx'
            },
            callback=self.after_login)

    def after_login(self, response):
        if "authentication failed":
            self.log("Login failed", level=log.ERROR)
            return

        self.log('Login Successful. Parsing all other URLs')
        for url in self.start_urls:
            yield self.make_requests_from_url(url)

    def parse(self, response):
        deal_list = Selector(response).xpath('//table[@id="deal_list"]/tbody[@class="deal-list__row"]/tr[@class="deal"]')

        for deal_row in deal_list:
            item = DealinfoItem()
            item['capital_seeking'] = deal_row.xpath('td[2]/text()').extract()
            yield item

You should try FormRequest.from_response() – Verz1Lka

Answers

Your login request is missing some form data:

[image: formdata]

You can find the authenticity_token in the page source of the login page:

[image: token]

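That doesn't seem to be the problem... I did try passing the authenticity token. And the Scrapy docs say one doesn't have to override anything other than the username and password. –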

I think it is required, especially if the authentication token is created externally. It may work or it may not, but either way it's worth a try. –