0
我寫了下面的代碼:無法運行Scrapy代碼
spiders.test.py代碼:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from wscraper.items import WscraperItem
class MySpider(BaseSpider):
name = "ExampleSpider"
allowed_domains = ["timeanddate.com"]
start_urls = ["https://www.timeanddate.com/worldclock/"]
def parse(self, response):
hxs = HtmlXPathSelector(response)
titles = hxs.select("/html/body/div[1]/div[8]/section[2]/div[1]/table/tbody").extract()
#for titles in titles:
#title = titles.select("a/text()").extract()
#link = titles.select("a/@href").extract()
print title
爲scraper.items的代碼是:從scrapy.item進口項目 ,現場
class WscraperItem(Item):
# define the fields for your item here like:
# name = scrapy.Field()
title = Field()
pass
我得到上運行命令 「scrapy爬行ExampleSpider」 以下錯誤:
[boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in
retry_url
r = opener.open(req, timeout=timeout)
File "/usr/lib/python2.7/urllib2.py", line 429, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 447, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 101] Network is unreachable>
[boto] ERROR: Unable to read instance data, giving up
[scrapy] ERROR: Error downloading <GET
https://www.timeanddate.com/worldclock/>
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45,
in mustbe_deferred
result = f(*args, **kw)
File "/usr/lib/python2.7/dist-
packages/scrapy/core/downloader/handlers/__init__.py", line 41, in
download_request
return handler(request, spider)
File "/usr/lib/python2.7/dist-
packages/scrapy/core/downloader/handlers/http11.py", line 44, in
download_request
return agent.download_request(request)
d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
File "/home/priyanka/.local/lib/python2.7/site-
packages/twisted/internet/base.py", line 276, in getHostByName
timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
[scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/exceptions.TypeError': 1,
'downloader/request_bytes': 228,
'log_count/DEBUG': 2,
'log_count/ERROR': 3,
'log_count/INFO': 7,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
請問您可以在帖子中包含「WebScraperItem」代碼,最有可能的問題在那裏。 – JkShaw