DEBUG: Retrying (failed 2 times): TCP connection timed out: 110: Connection timed out.
PS: My system is Ubuntu, and I can successfully run the following, so why does Scrapy keep telling me "TCP connection timed out"?
wget http://www.dmoz.org/Computers/Programming/Languages/Python/Book/
Spider code:
#!/usr/bin/python
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = ["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        for site in sites:
            title = site.select('a/text()').extract()
            link = site.select('a/@href').extract()
            desc = site.select('text()').extract()
            print title, link, desc
Can you post your spider code, Scrapy settings, and console output? –
Can you post your settings? –
Is the code you posted an excerpt from the real spider code? Either your 'start_urls' had a second URL stripped out, or you have a syntax error. Try 'start_urls = ["http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"]' –
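For what it's worth, the two retries in the DEBUG line match Scrapy's default of RETRY_TIMES = 2. If the target site is merely slow from your network rather than unreachable, the timeout and retry behaviour can be loosened via standard Scrapy settings; the values below are illustrative, not a confirmed fix for this particular timeout:

```python
# settings.py -- illustrative values only; Scrapy's defaults are
# DOWNLOAD_TIMEOUT = 180 (seconds) and RETRY_TIMES = 2.
DOWNLOAD_TIMEOUT = 300   # give slow connections more time before timing out
RETRY_ENABLED = True     # the retry middleware is enabled by default
RETRY_TIMES = 5          # retry failed requests a few more times
```

If wget succeeds but Scrapy still times out with these settings raised, the cause is more likely environmental (proxy configuration, DNS, or the site throttling the crawler's requests) than a spider-code bug.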