嗨,我再次在C10計劃,並試圖刮亞馬遜網站;爬過(200),但沒有被刮 - Crawlera
我有這個問題,有時日誌說一個網站被抓取,但它不會刮我想要的數據,並按照我的指示跳到下一頁。從一些頁面它會刮一些它不會我不明白。就像我檢查代碼和網址的html一樣,還有一些項目需要在爬行的網站上抓取,但沒有被刮掉。任何人都可以幫助我理解最新的情況嗎?我在想,也許網站返回一個驗證碼,但即使如此,我認爲crawlera會自動重試它獲取驗證碼的請求。
下面是日誌:
'time': '2017-02-12',
'title': u'Basic GIS Coordinates, Second Edition',
'url': u'https://www.amazon.com/Basic-GIS-Coordinates-Second-Sickle/dp/1420092316/ref=sr_1_64?s=tradein-aps&srs=9187220011&ie=UTF8&qid=1486932384&sr=1-64'}
2017-02-12 14:46:31 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/s//s/ref=sr_nr_n_3/153-6246827-9833634?srs=9187220011&fst=as%3Aoff&rh=n%3A283155%2Cn%3A%211000%2Cn%3A173507%2Cn%3A173515%2Cn%3A227541%2Cn%3A13735&bbn=227541&ie=UTF8&qid=1486860051&rnid=227541> (referer: None)
2017-02-12 14:46:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/s//s/ref=sr_nr_n_2/153-6246827-9833634?srs=9187220011&fst=as%3Aoff&rh=n%3A283155%2Cn%3A%211000%2Cn%3A173507%2Cn%3A173515%2Cn%3A227541%2Cn%3A52187011&bbn=227541&ie=UTF8&qid=1486860051&rnid=227541> (referer: None)
2017-02-12 14:46:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/s/ref=sr_pg_2/153-6246827-9833634?bbn=227541&fst=as%3Aoff&ie=UTF8&page=2&qid=1486932385&rh=n%3A283155%2Cn%3A%211000%2Cn%3A173507%2Cn%3A173515%2Cn%3A227541%2Cn%3A13735&srs=9187220011> (referer: https://www.amazon.com/s//s/ref=sr_nr_n_3/153-6246827-9833634?srs=9187220011&fst=as%3Aoff&rh=n%3A283155%2Cn%3A%211000%2Cn%3A173507%2Cn%3A173515%2Cn%3A227541%2Cn%3A13735&bbn=227541&ie=UTF8&qid=1486860051&rnid=227541)
2017-02-12 14:46:44 [scrapy.log] DEBUG: successfully added!
2017-02-12 14:46:44 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.amazon.com/s/ref=sr_pg_2/153-6246827-9833634?bbn=227541&fst=as%3Aoff&ie=UTF8&page=2&qid=1486932385&rh=n%3A283155%2Cn%3A%211000%2Cn%3A173507%2Cn%3A173515%2Cn%3A227541%2Cn%3A13735&srs=9187220011>
{'currency': u'$',
因爲你有一個crawlera計劃,我會建議在[他們的支持頁面](https://support.scrapinghub.com)詢問直接幫助 – eLRuLL