無法運行Scrapy代碼

spiders.test.py代碼：

from scrapy.spider import BaseSpider 
    from scrapy.selector import HtmlXPathSelector 
    from wscraper.items import WscraperItem 


    class MySpider(BaseSpider): 
    name = "ExampleSpider" 
    allowed_domains = ["timeanddate.com"] 
    start_urls = ["https://www.timeanddate.com/worldclock/"] 

    def parse(self, response): 
     hxs = HtmlXPathSelector(response) 
     titles = hxs.select("/html/body/div[1]/div[8]/section[2]/div[1]/table/tbody").extract() 
    #for titles in titles: 
     #title = titles.select("a/text()").extract() 
     #link = titles.select("a/@href").extract() 
     print title

爲scraper.items的代碼是：從scrapy.item進口項目，現場

 class WscraperItem(Item): 
     # define the fields for your item here like: 
     # name = scrapy.Field() 
     title = Field() 
     pass

我得到上運行命令「scrapy爬行ExampleSpider」以下錯誤：

 [boto] ERROR: Caught exception reading instance data 
     Traceback (most recent call last): 
     File "/usr/lib/python2.7/dist-packages/boto/utils.py", line 210, in 
    retry_url 
    r = opener.open(req, timeout=timeout) 
    File "/usr/lib/python2.7/urllib2.py", line 429, in open 
    response = self._open(req, data) 
    File "/usr/lib/python2.7/urllib2.py", line 447, in _open 
    '_open', req) 
    File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain 
    result = func(*args) 
    File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open 
    return self.do_open(httplib.HTTPConnection, req) 
    File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open 
    raise URLError(err) 
    URLError: <urlopen error [Errno 101] Network is unreachable> 
    [boto] ERROR: Unable to read instance data, giving up 
    [scrapy] ERROR: Error downloading <GET 
    https://www.timeanddate.com/worldclock/> 
    Traceback (most recent call last): 
    File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, 
    in mustbe_deferred 
     result = f(*args, **kw) 
     File "/usr/lib/python2.7/dist- 
    packages/scrapy/core/downloader/handlers/__init__.py", line 41, in 
    download_request 
     return handler(request, spider) 
     File "/usr/lib/python2.7/dist- 
     packages/scrapy/core/downloader/handlers/http11.py", line 44, in 
     download_request 
     return agent.download_request(request) 
     d = super(CachingThreadedResolver, self).getHostByName(name, timeout) 
     File "/home/priyanka/.local/lib/python2.7/site- 
     packages/twisted/internet/base.py", line 276, in getHostByName 
     timeoutDelay = sum(timeout) 
     TypeError: 'float' object is not iterable 
     [scrapy] INFO: Dumping Scrapy stats: 
     {'downloader/exception_count': 1, 
     'downloader/exception_type_count/exceptions.TypeError': 1, 
     'downloader/request_bytes': 228, 
     'log_count/DEBUG': 2, 
     'log_count/ERROR': 3, 
     'log_count/INFO': 7, 
     'scheduler/dequeued': 1, 
     'scheduler/dequeued/memory': 1,

來源

2017-05-05 Priyanka

請問您可以在帖子中包含「WebScraperItem」代碼，最有可能的問題在那裏。 – JkShaw

蜘蛛的名字必須是str，不list，所以：

class ExampleSpider(BaseSpider): 
    name = "timeandzone"

否則Scrapy蜘蛛裝載機無法加載它。

來源

2017-05-05 07:52:29 mizhgun

基本上，我有一個兼容性問題。所以，我安裝了Scrapy1.3.3，這解決了問題，是的，如上面的答案中提到蜘蛛名應該是一個字符串。

來源

2017-05-07 03:41:59 Priyanka

無法運行Scrapy代碼

回答

相關問題