2017-04-20
I am new to Scrapy. I followed the tutorial, and every step is the same as in the guide, but I cannot make it work. What is wrong, and how do I get Scrapy to run? My settings are:

ROBOTSTXT_OBEY = True 
DOWNLOAD_DELAY = 3 
HTTPCACHE_ENABLED = True 
HTTPCACHE_EXPIRATION_SECS = 0 
HTTPCACHE_DIR = 'httpcache' 
HTTPCACHE_IGNORE_HTTP_CODES = [] 
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

and the spider is written as follows:

import re 
import scrapy 
from bs4 import BeautifulSoup 
from scrapy.http import Request 
from adddelay.items import AdddelayItem 


class Myspider(scrapy.Spider):
    name = 'adddelay'
    allowed_domains = ['23us.com']
    bash_url = 'http://www.23us.com//class/'
    bashurl = '.html'

    def start_requests(self):
        for i in range(1, 11):
            url = self.bash_url + str(i) + '_1' + self.bashurl
            yield Request(url, self.parse)
        yield Request('http://www.23us.com/quanben/1', self.parse)

    def parse(self, response):
        print(response.text)
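For reference, the loop in `start_requests` builds the category URLs that appear in the log below (the double slash comes straight from `bash_url` as written). A quick standalone check:

```python
# Rebuild the URLs that start_requests yields, using the same class attributes.
bash_url = 'http://www.23us.com//class/'  # note the double slash, as in the spider
bashurl = '.html'

urls = [bash_url + str(i) + '_1' + bashurl for i in range(1, 11)]
print(urls[0])    # http://www.23us.com//class/1_1.html
print(len(urls))  # 10
```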

The error "TypeError: 'float' object is not iterable" repeats for every request; part of the output:

2017-04-20 16:16:58 [scrapy] INFO: Scrapy 1.1.1 started (bot: adddelay) 
2017-04-20 16:16:58 [scrapy] INFO: Overridden settings: {'SPIDER_MODULES': 
['adddelay.spiders'], 'BOT_NAME': 'adddelay', 'NEWSPIDER_MODULE': 
'adddelay.spiders', 'DOWNLOAD_DELAY': 3, 'HTTPCACHE_ENABLED': True} 
2017-04-20 16:16:58 [scrapy] INFO: Enabled extensions: 
['scrapy.extensions.logstats.LogStats', 
'scrapy.extensions.telnet.TelnetConsole', 
'scrapy.extensions.corestats.CoreStats'] 
2017-04-20 16:16:59 [scrapy] INFO: Enabled downloader middlewares: 
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 
'scrapy.downloadermiddlewares.retry.RetryMiddleware', 
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware', 
'scrapy.downloadermiddlewares.stats.DownloaderStats', 
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware'] 
2017-04-20 16:16:59 [scrapy] INFO: Enabled spider middlewares: 
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 
'scrapy.spidermiddlewares.referer.RefererMiddleware', 
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 
'scrapy.spidermiddlewares.depth.DepthMiddleware'] 
2017-04-20 16:16:59 [scrapy] INFO: Enabled item pipelines: 
[] 
2017-04-20 16:16:59 [scrapy] INFO: Spider opened 
2017-04-20 16:16:59 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2017-04-20 16:16:59 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023 
2017-04-20 16:16:59 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/1_1.html> 
Traceback (most recent call last): 
File "D:\Anaconda3\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks 
result = result.throwExceptionIntoGenerator(g) 
File "D:\Anaconda3\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator 
return g.throw(self.type, self.value, self.tb) 
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request 
defer.returnValue((yield download_func(request=request,spider=spider))) 
File "D:\Anaconda3\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred 
result = f(*args, **kw) 
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request 
return handler.download_request(request, spider) 
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request 
return agent.download_request(request) 
File "D:\Anaconda3\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request 
method, to_bytes(url, encoding='ascii'), headers, bodyproducer) 
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1631, in request 
parsedURI.originForm) 
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint 
d = self._pool.getConnection(key, endpoint) 
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1294, in getConnection 
return self._newConnection(key, endpoint) 
File "D:\Anaconda3\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection 
return endpoint.connect(factory) 
File "D:\Anaconda3\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect 
EndpointReceiver, self._hostText, portNumber=self._port 
File "D:\Anaconda3\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName 
onAddress = self._simpleResolver.getHostByName(hostName) 
File "D:\Anaconda3\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName 
d = super(CachingThreadedResolver, self).getHostByName(name, timeout) 
File "D:\Anaconda3\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName 
timeoutDelay = sum(timeout) 
TypeError: 'float' object is not iterable 
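The failing line, `timeoutDelay = sum(timeout)` inside Twisted, expects `timeout` to be an iterable of per-attempt DNS delays, but this Scrapy version passes a single float. A minimal reproduction of just that mismatch:

```python
# Twisted totals an iterable of DNS retry delays, e.g. (1, 3, 11, 45).
assert sum((1, 3, 11, 45)) == 60

# Scrapy 1.1.x handed over a bare float instead, which is exactly the
# TypeError shown in the traceback above.
try:
    sum(60.0)
except TypeError as e:
    print(e)  # 'float' object is not iterable
```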
2017-04-20 16:17:03 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/2_1.html>
TypeError: 'float' object is not iterable
2017-04-20 16:17:08 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/3_1.html>
TypeError: 'float' object is not iterable
2017-04-20 16:17:12 [scrapy] ERROR: Error downloading <GET http://www.23us.com//class/4_1.html>

I had also forgotten that, without an entry point, Scrapy cannot be debugged in PyCharm; the following script should be placed in the root directory of the Scrapy project:

# run.py: place next to scrapy.cfg so PyCharm can run and debug the crawl
from scrapy.cmdline import execute
execute(['scrapy', 'crawl', 'adddelay'])

That part I have already solved.

Answer

I figured out the problem: the console raised this error because my Scrapy version was 1.1.x while my Twisted version was 17.1.1. Installing Scrapy 1.3.3 solved it; Scrapy 1.3.3 works with Twisted 17.1.1.
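For context, the compatibility break is about the shape of the timeout argument: Twisted sums an iterable of per-attempt DNS delays, so a single float has to be wrapped in a tuple, which is what newer Scrapy releases do. A minimal sketch of that shape (the variable names here are illustrative, not Scrapy's actual code):

```python
# Illustrative: Twisted computes the total DNS timeout with sum(timeout).
dns_timeout = 60.0            # a single float, like Scrapy's DNS_TIMEOUT setting

# Broken shape: passing the float itself makes sum() raise TypeError.
# Fixed shape: wrap it in a one-element tuple before handing it to Twisted.
fixed_timeout = (dns_timeout,)
assert sum(fixed_timeout) == 60.0
```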