How to solve a 403 error in Scrapy
I'm new to Scrapy, and I created a Scrapy project to scrape data.
I am trying to scrape data from a website, but I get the following error log:
2016-08-29 14:07:57 [scrapy] INFO: Enabled item pipelines:
[]
2016-08-29 13:55:03 [scrapy] INFO: Spider opened
2016-08-29 13:55:03 [scrapy] INFO: Crawled 0 pages (at 0 pages/min),scraped 0 items (at 0 items/min)
2016-08-29 13:55:04 [scrapy] DEBUG: Crawled (403) <GET http://www.justdial.com/robots.txt> (referer: None)
2016-08-29 13:55:04 [scrapy] DEBUG: Crawled (403) <GET http://www.justdial.com/Mumbai/small-business> (referer: None)
2016-08-29 13:55:04 [scrapy] DEBUG: Ignoring response <403 http://www.justdial.com/Mumbai/small-business>: HTTP status code is not handled or not allowed
2016-08-29 13:55:04 [scrapy] INFO: Closing spider (finished)
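I tried the commands below in the browser's web console on that site and got results, but when I use the same XPath in my Python script, I get the error shown above.
Commands in the web console: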
$x('//div[@class="col-sm-5 col-xs-8 store-details sp-detail paddingR0"]/h4/span/a/text()')
$x('//div[@class="col-sm-5 col-xs-8 store-details sp-detail paddingR0"]/p[@class="contact-info"]/span/a/text()')
Please help me, and mention me in the comments.
Thanks
Maybe you need to add headers to 'scrapy' so that it behaves like a browser? –
Hi Avihoo, can you tell me how to add the headers and where to add them? Thanks – JT28
You need to add this line to your 'request': 'request.headers = Headers({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2687.0 Safari/537.36'})' –