Re-login to a website when resuming a Scrapy job

Is there any way to have a Scrapy spider log in to a website when resuming a previously paused scrape job?

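EDIT: To clarify, my question is really about Scrapy spiders, not about cookies in general. Perhaps a better question would be: is there any method that gets called when a Scrapy spider is resumed after having been frozen in a job directory?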

2012-05-09 kevin

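Yes, you can.

You should be clearer about the exact workflow of your scraper.

Anyway, I assume you log in the first time you scrape and want to use the same cookie when you resume scraping.

You can use the httplib2 library to do something like this. Here is a code sample from their examples page, with comments added for clarity.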

from urllib.parse import urlencode
import httplib2

http = httplib2.Http()

url = 'http://www.example.com/login'
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}

# Submit the form data to log in to the website
response, content = http.request(url, 'POST', headers=headers, body=urlencode(body))

# The 'response' object now contains the cookie the website sent,
# which can be used for visiting the website again

# Set the cookie in the new 'headers_2'
headers_2 = {'Cookie': response['set-cookie']}

url = 'http://www.example.com/home'

# Use 'headers_2' to visit the website with the login cookie attached
response, content = http.request(url, 'GET', headers=headers_2)
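
The example above passes the raw Set-Cookie value straight back as the Cookie header, attributes and all. A slightly more careful variant keeps only the name=value pairs. The following is a minimal sketch using the standard library's http.cookies module; the helper name cookie_header_from_set_cookie and the sample header value are made up for illustration, and it assumes a single Set-Cookie value rather than several joined together.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value):
    """Build a Cookie request header value from one Set-Cookie response value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Keep only name=value pairs; attributes such as Path or HttpOnly are dropped.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical header value, made up for illustration.
raw = "sessionid=abc123; Path=/; HttpOnly"
headers_2 = {"Cookie": cookie_header_from_set_cookie(raw)}
print(headers_2)
```

In the httplib2 example, response['set-cookie'] would take the place of raw.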

If you are not clear on how cookies work, do a search. In short, cookies are a client-side technique that helps servers maintain sessions.
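
To reuse the same cookie when a paused scrape resumes, the cookie has to outlive the process. One possible approach, sketched here with the standard library's http.cookiejar, is to save the cookie jar to a file at the end of a run and reload it at the start of the next. The file name, the cookie name and value, and the helper functions are all hypothetical; the login itself is simulated by creating the cookie by hand. If the server has expired the session in the meantime, a fresh login would still be needed.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_session_cookie(name, value, domain):
    """Create a cookie by hand; in a real run the server's response would set it."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_cookies(jar, path):
    # ignore_discard=True keeps session cookies, which have no expiry date.
    jar.save(path, ignore_discard=True)


def load_cookies(path):
    jar = LWPCookieJar()
    if os.path.exists(path):
        jar.load(path, ignore_discard=True)
    return jar


# Hypothetical location for the saved cookies.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# First run: log in (simulated here) and store the session cookie.
first_run = LWPCookieJar()
first_run.set_cookie(make_session_cookie("sessionid", "abc123", "www.example.com"))
save_cookies(first_run, cookie_file)

# Resumed run: reload the cookie instead of logging in again.
resumed = load_cookies(cookie_file)
print([(c.name, c.value) for c in resumed])
```

The reloaded jar can then be attached to requests, for example through urllib.request.build_opener with an HTTPCookieProcessor wrapping the jar.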


2012-05-09 14:46:27 pcx

Oops, I missed the 'Scrapy spider' part. This answer applies to a simple scraping script. – pcx

Thanks for the tip, superxor! As you said, my question is really about Scrapy. I'll edit the original to make that clear. – kevin
