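How can I download all my Quora answers using Scrapy?
I'm trying to use Scrapy to download my Quora answers, but I can't even seem to download my page. Simply running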
scrapy shell 'http://it.quora.com/profile/Ferdinando-Randisi'
returns this error:
2017-10-05 22:16:52 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: quora)
2017-10-05 22:16:52 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'quora.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'SPIDER_MODULES': ['quora.spiders'], 'BOT_NAME': 'quora', 'LOGSTATS_INTERVAL': 0}
....
2017-10-05 22:16:53 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-10-05 22:16:53 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-10-05 22:16:53 [scrapy.core.engine] INFO: Spider opened
2017-10-05 22:16:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://it.quora.com/robots.txt> from <GET http://it.quora.com/robots.txt>
2017-10-05 22:16:55 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://it.quora.com/robots.txt> (referer: None)
2017-10-05 22:16:55 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://it.quora.com/profile/Ferdinando-Randisi> from <GET http://it.quora.com/profile/Ferdinando-Randisi>
2017-10-05 22:16:56 [scrapy.core.engine] DEBUG: Crawled (429) <GET https://it.quora.com/profile/Ferdinando-Randisi> (referer: None)
2017-10-05 22:16:58 [root] DEBUG: Using default logger
What's wrong? Error 429 is associated with too many requests, but I'm only making one request. Why would that be too many?
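Read the [`robots.txt`](https://www.quora.com/robots.txt). – tadman
I did, but I didn't see anything particularly relevant. They only describe how search engines should get in touch with them, and explain why they don't like people downloading everyone's content. I'm not doing any of that; I just want my own answers. –
Try it with `curl` and see what happens. – tadman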
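For anyone who would rather run the check suggested above from Python than from `curl`, here is a minimal sketch using only the standard library. It sends one request outside Scrapy and reports the status code plus any `Retry-After` header. The helper names `build_request` and `fetch_status` are made up for illustration, and the idea that the response might vary with the `User-Agent` header is an assumption to test, not something established in this thread.

```python
import urllib.error
import urllib.request

# URL taken from the question (the https form that Scrapy was redirected to).
URL = "https://it.quora.com/profile/Ferdinando-Randisi"


def build_request(url, user_agent=None):
    """Build a GET request, optionally with a custom User-Agent header.

    Hypothetical helper: with user_agent=None, urllib adds its own
    default User-Agent when the request is actually sent.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(request, timeout=10):
    """Send one request and return (status_code, Retry-After value or None)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Retry-After")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the exception still carries
        # the status code and the response headers.
        return err.code, err.headers.get("Retry-After")


# Example call (performs a real network request, so it is left commented out):
# print(fetch_status(build_request(URL)))
```

Calling `fetch_status(build_request(URL))` once with the default headers and once with a `user_agent` value of your choosing shows whether the 429 also appears outside Scrapy for a single request, which is the same thing the `curl` test would tell you.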