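I am new to the Scrapy framework and am trying to crawl a website with a Spider. On this site, when I navigate from page 1 -> page 2, an intermediate page with a Meta Refresh redirects to page 2. However, I often get a 302 error during the redirect. To fix this meta refresh problem in Scrapy, I tried the following: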
- Setting the user agent to "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
- Setting DOWNLOAD_DELAY = 15
- Setting REDIRECT_MAX_METAREFRESH_DELAY = 100
None of this worked. Since I'm new to Scrapy, I would appreciate any pointers on how to solve this.
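For reference, the settings from the list above would look roughly like this in a project's `settings.py`. This is a sketch assuming a standard Scrapy 1.x project. As far as I know, `REDIRECT_MAX_METAREFRESH_DELAY` is the older, deprecated name of `METAREFRESH_MAXDELAY`, and 100 seconds is already its default, so that change alone would not alter behavior; the log also shows the meta refresh *is* being followed. `COOKIES_DEBUG` and `DUPEFILTER_DEBUG` are additions that were not in the original list. They only make Scrapy log cookies and every filtered duplicate, which may help diagnose the `jsessionid` session handling.

```python
# settings.py (sketch): the options tried in the question, plus two
# debugging switches that were NOT part of the original attempt.

# Browser-like user agent string, as tried in the question.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
)

# Wait about 15 seconds between consecutive requests to the same site
# (Scrapy randomizes the actual delay by default).
DOWNLOAD_DELAY = 15

# Follow <meta http-equiv="refresh"> redirects whose delay is below this
# many seconds. 100 is the default; REDIRECT_MAX_METAREFRESH_DELAY is
# assumed here to be the older name of this same setting.
METAREFRESH_MAXDELAY = 100

# Keep cookies enabled so the server's session can persist across
# redirects (True is already the default).
COOKIES_ENABLED = True

# Added for debugging: log cookies sent/received and every filtered duplicate.
COOKIES_DEBUG = True
DUPEFILTER_DEBUG = True
```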
Here is the log for each request:
2017-02-17 21:02:43 [scrapy.core.engine] INFO: Spider opened
2017-02-17 21:02:43 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-02-17 21:02:43 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-02-17 21:02:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://xxxx.website.com/search-cases.htm> (referer: None)
2017-02-17 21:02:44 [quotes] INFO: http://www.xxxx.website2.com/eservices/home.page
2017-02-17 21:02:46 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (meta refresh) to <GET http://www.xxxx.website2.com/eservices/;jsessionid=D724B51CE14CFB9A06AB5A1C2BADC7BA?x=pQSPWmZkMdOltOc6jey5Pzm2g*gqQrsim1X*85dDjm1K*VwIS*xP-fdT9lRZBHHOA41kK1OaAco2dC8Un6N*uJtWnK50mGmm> from <GET http://www.courtrecords.alaska.gov/eservices/home.page>
2017-02-17 21:02:55 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.xxxx.website2.com/eservices/home.page> from <GET http://www.xxxx.website2.com/eservices/;jsessionid=D724B51CE14CFB9A06AB5A1C2BADC7BA?x=pQSPWmZkMdOltOc6jey5Pzm2g*gqQrsim1X*85dDjm1K*VwIS*xP-fdT9lRZBHHOA41kK1OaAco2dC8Un6N*uJtWnK50mGmm>
2017-02-17 21:02:55 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET http://www.xxxx.website2.com/eservices/home.page> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2017-02-17 21:02:55 [scrapy.core.engine] INFO: Closing spider (finished)
**Please note that I have changed the website name.**
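Reading the log, the chain seems to be: `home.page` → (meta refresh) → the `;jsessionid=...` URL → (302) → `home.page` again, and that last request is dropped by `scrapy.dupefilters` because `home.page` had already been requested. The sketch below is a toy model of that reasoning, not Scrapy's code: it uses a plain set of URLs, whereas Scrapy compares request fingerprints built from the method, the canonicalized URL and the body. `crawl_chain` and the placeholder session URL are hypothetical names for illustration only. In Scrapy itself, a `Request` created with `dont_filter=True` bypasses the duplicate filter, and, as far as I know, the redirect middleware copies that flag onto each redirected request, so starting the chain with `scrapy.Request(url, callback=..., dont_filter=True)` may let the final `home.page` request through.

```python
"""Toy model of why the last request in the log is dropped.

A plain set of exact URLs stands in for a duplicate filter. This is NOT
Scrapy's implementation (Scrapy compares request fingerprints); it only
illustrates the reasoning.
"""

from typing import List, Set


def crawl_chain(chain: List[str], dont_filter: bool = False) -> List[str]:
    """Walk a redirect chain and return the URLs that would be fetched.

    A URL already in `seen` is dropped unless `dont_filter` is True,
    mirroring how a Scrapy request can opt out of duplicate filtering.
    """
    seen: Set[str] = set()
    fetched: List[str] = []
    for url in chain:
        if not dont_filter and url in seen:
            print(f"Filtered duplicate request: {url}")
            # Nothing follows a dropped redirect, so the chain ends here.
            break
        seen.add(url)
        fetched.append(url)
    return fetched


# Hypothetical placeholders for the URLs in the log.
HOME = "http://www.xxxx.website2.com/eservices/home.page"
SESSION = "http://www.xxxx.website2.com/eservices/;jsessionid=EXAMPLE"

# home.page -> (meta refresh) -> jsessionid URL -> (302) -> home.page again
chain = [HOME, SESSION, HOME]

print(crawl_chain(chain))                    # the second HOME is filtered
print(crawl_chain(chain, dont_filter=True))  # all three URLs are fetched
```

A set keyed on the exact URL is the simplest stand-in here; it is enough to show why the third step of the chain disappears.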
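Please share the error log; it shouldn't be caused by hitting the refresh. – eLRuLL
Please post your Scrapy log so that we can help. – Umair
@eLRuLL I have shared the log and changed the actual website name. –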