2017-03-18 115 views
-1

Logging in to Instagram with Scrapy

I am trying to log in to Instagram with Scrapy. I use FormRequest to post the username and password, and I have set COOKIES_ENABLED = True.

My Scrapy code:

import scrapy
from scrapy.http import Request, FormRequest


class InsSpider(scrapy.Spider):
    name = 'InsVideo'
    allowed_domains = ['instagram.com']

    url = 'https://www.instagram.com/'
    url_login = 'https://www.instagram.com/accounts/login/ajax/'

    def start_requests(self):
        return [Request(self.url_login, callback=self.login)]

    def login(self, response):
        login_post = {'username': 'username',
                      'password': 'password'}
        return [FormRequest.from_response(response,
                                          formdata=login_post,
                                          # callback=self.start_requests,
                                          dont_filter=True
                                          )]

I run scrapy crawl InsVideo, and it returns this error output:

2017-03-18 12:15:49 [scrapy.core.engine] INFO: Spider opened 
2017-03-18 12:15:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2017-03-18 12:15:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023 
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <200 https://www.instagram.com/robots.txt> 
Set-Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; expires=Fri, 13-Mar-2037 04:15:51 GMT; Max-Age=630720000; Path=/ 

Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:51 GMT; Max-Age=31449600; Path=/; Secure 

2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.instagram.com/robots.txt> (referer: None) 
2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://www.instagram.com/accounts/login/ajax/> 
Cookie: mid=WMy0dwALAAGACJPXOYvoxHfHO00m; csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi 

2017-03-18 12:15:52 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <405 https://www.instagram.com/accounts/login/ajax/> 
Set-Cookie: csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi; expires=Sat, 17-Mar-2018 04:15:52 GMT; Max-Age=31449600; Path=/; Secure 

2017-03-18 12:15:52 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.instagram.com/accounts/login/ajax/> (referer: None) 
2017-03-18 12:15:52 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.instagram.com/accounts/login/ajax/>: HTTP status code is not handled or not allowed 
2017-03-18 12:15:52 [scrapy.core.engine] INFO: Closing spider (finished) 

I don't know what is wrong with the code. Thanks.

Answers

0

Your url_login is wrong; it should be https://www.instagram.com/accounts/login/

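In any case, the Instagram login page builds its login form with JavaScript. You can check this with your browser's "View page source" feature: the HTML that the server returns contains no <form> tag. That is exactly what Scrapy sees, so FormRequest.from_response has no form to fill in. You need something that runs the JavaScript code, perhaps a headless browser.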
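One quick way to confirm this before reaching for FormRequest.from_response is to check whether the raw HTML contains a <form> tag at all. The following is only a minimal sketch using Python's standard library; the names FormFinder and has_form and the two sample pages are made up for illustration and are not taken from Instagram's actual markup.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Records whether any <form> start tag appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.found_form = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "form":
            self.found_form = True


def has_form(html_text):
    """Return True if the static HTML contains at least one <form> tag."""
    finder = FormFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found_form


# Hypothetical sample pages: one with a server-rendered form,
# one whose form would only exist after JavaScript runs.
static_page = (
    "<html><body><form action='/login'>"
    "<input name='username'></form></body></html>"
)
js_page = (
    "<html><body><div id='root'></div>"
    "<script>/* login form is built at runtime */</script></body></html>"
)

print(has_form(static_page))
print(has_form(js_page))
```

If has_form returns False for the text of the response you receive, the form is most likely generated on the client side, and a form-filling request built from that response cannot work.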


+1

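Hi, I can now log in to Instagram in two ways: by setting cookies in Scrapy, and by using the requests library with headers and cookies. FormRequest turned out not to be necessary. Thanks for your answer.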
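The commenter did not post their code, so the following is only a guess at what the cookie-based approach might involve. It is a small standard-library sketch that turns a Cookie header string, like the one shown in the log above, into a dict. The helper name cookie_header_to_dict is hypothetical. scrapy.Request accepts such a dict through its cookies argument.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header_value):
    """Parse a 'name=value; name2=value2' Cookie header into a plain dict."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


# Cookie header copied from the crawl log in the question.
header = "mid=WMy0dwALAAGACJPXOYvoxHfHO00m; csrftoken=5JrsDnF569QLnmIzg4h0VOBRJ8gHQZZi"
cookies = cookie_header_to_dict(header)
print(cookies)

# Assumed usage inside a spider (not run here):
#     yield scrapy.Request(self.url, cookies=cookies, callback=self.parse)
```

For an actual login, the cookies would presumably have to come from an authenticated browser session; the values from the log above belong to an anonymous visit.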