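I am running a Scrapy spider against a Google link that should return search results for "hello", but I get an error when downloading any URL in Python.
Code (spider code):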
import scrapy
import re


class LinsSpider(scrapy.Spider):
    name = "lins"
    allowed_domains = ["www.google.com"]
    start_urls = ('https://www.google.co.in/?gfe_rd=cr&ei=78uyWPjFH8WL8Qe7kKf4BA#q=hello&*',)

    def parse(self, response):
        pagestr = "[email protected]"
        yield {
            'asin': str(re.search("^[A-Za-z0-9\.\+_-][email protected][A-Za-z0-9\._-]+\.[a-zA-Z]*$", pagestr).group(1).strip()),
        }
And the error is:
2017-02-26 18:06:11 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-02-26 18:06:11 [scrapy] ERROR: Error downloading <GET http://www.google.com/>
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 41, in download_request
    return handler(request, spider)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 44, in download_request
    return agent.download_request(request)
  File "/usr/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 211, in download_request
    d = agent.request(method, url, headers, bodyproducer)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1631, in request
    parsedURI.originForm)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/endpoints.py", line 788, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "/usr/lib/python2.7/dist-packages/scrapy/resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-02-26 18:06:11 [scrapy] INFO: Closing spider (finished)
2017-02-26 18:06:11 [scrapy] INFO: Dumping Scrapy stats:
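The last frame of the traceback can be reproduced in isolation, which suggests the failure happens during DNS resolution rather than in the spider's `parse` method. This is only a minimal sketch: the helper names `total_delay` and `try_total_delay` are hypothetical, and the value `60.0` is an assumed example of a float timeout, not something taken from the log.

```python
def total_delay(timeout):
    """Mimic the failing line from the traceback: timeoutDelay = sum(timeout)."""
    return sum(timeout)


def try_total_delay(timeout):
    """Return the summed delay, or a description of the TypeError if one is raised."""
    try:
        return total_delay(timeout)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# A bare float (assumed value) cannot be summed, matching the error in the log.
print(try_total_delay(60.0))

# A sequence of timeouts, even with a single element, sums without error.
print(try_total_delay((60.0,)))
```

If the sketch reflects what happens here, the float is being handed to Twisted by `scrapy/resolver.py`, so the installed Scrapy and Twisted versions may simply disagree about the type of `timeout`.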
Please help me solve this problem. I am on Ubuntu 16.10.
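Please post the complete code. We cannot run the code you provided and get the same result.
I created the Scrapy project with 'startproject links' and the spider with 'genspider lins'; the code in the 'lins.py' file is what I posted in my question.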