2017-04-16

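Scrapy: "Error downloading" and TypeError: to_bytes must receive a unicode, str or bytes object, got NoneType

This is my first time using Scrapy with a proxy. When I test my code, an error occurs, but I can't find where my code goes wrong.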

PyCharm reports two things: "Error downloading <GET https://movie.douban.com/subject/25754848/reviews>" and "TypeError: to_bytes must receive a unicode, str or bytes object, got NoneType".

Here is the middleware code.

import requests
import lxml
from bs4 import BeautifulSoup
from scrapy import signals

class ProxyMiddleware(object):

    def process_request(self, request, spider):
        url = 'http://127.0.0.1:5000/get'
        r = requests.get(url)
        request.meta['proxy'] = BeautifulSoup(r.text, "lxml").get_text()

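About the code: I have a proxy pool. While it is running, I can get different proxy IPs and ports, such as "113.122.136.41:808", from the address http://127.0.0.1:5000/get.

Here are the error and the traceback.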

2017-04-16 10:20:06 [scrapy.core.scraper] ERROR: Error downloading <GET https://movie.douban.com/subject/25754848/reviews>
Traceback (most recent call last): 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks 
    result = result.throwExceptionIntoGenerator(g) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator 
    return g.throw(self.type, self.value, self.tb) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request 
    defer.returnValue((yield download_func(request=request,spider=spider))) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred 
    result = f(*args, **kw) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request 
    return handler.download_request(request, spider) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 61, in download_request 
    return agent.download_request(request) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 260, in download_request 
    agent = self._get_agent(request, timeout) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 240, in _get_agent 
    _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\webclient.py", line 37, in _parse 
    return _parsed_url_args(parsed) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\webclient.py", line 20, in _parsed_url_args 
    host = b(parsed.hostname) 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\core\downloader\webclient.py", line 17, in <lambda> 
    b = lambda s: to_bytes(s, encoding='ascii') 
    File "C:\Users\empra\AppData\Local\Programs\Python\Python36\lib\site-packages\scrapy\utils\python.py", line 117, in to_bytes 
    'object, got %s' % type(text).__name__)
TypeError: to_bytes must receive a unicode, str or bytes object, got NoneType 
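
The last frames of the traceback fail on `parsed.hostname` inside `_parse(proxy)`, which suggests the proxy value was parsed as a URL and no hostname was found. The sketch below uses only the standard library and assumes Scrapy's parsing behaves like `urllib.parse.urlparse` here. It shows that a proxy string without a scheme, in the form the proxy pool returns it, has no hostname, while the same string with `http://` in front does.

```python
from urllib.parse import urlparse

# Proxy string in the form the proxy pool returns it (no scheme).
bare = urlparse("113.122.136.41:808")

# The same proxy with an explicit scheme (the "http://" prefix is an assumption).
full = urlparse("http://113.122.136.41:808")

print(repr(bare.hostname))       # None
print(full.hostname, full.port)  # 113.122.136.41 808
```

If that assumption holds, passing `None` on to `to_bytes` would produce exactly the TypeError shown above.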

Answer

I can show you how to convert the stream read from a URL into unicode.

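# urllib2 exists only on Python 2; urllib.request is its Python 3 counterpart
# (the traceback in the question shows Python 3.6).
from urllib.request import urlopen
import lxml
from bs4 import BeautifulSoup
from scrapy import signals

class ProxyMiddleware(object):

    def process_request(self, request, spider):
        url = 'http://127.0.0.1:5000/get'
        # Read the raw bytes of the response and decode them to text.
        r = urlopen(url).read()
        data = r.decode("utf-8")
        request.meta['proxy'] = BeautifulSoup(data, "lxml").get_text()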
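
Decoding alone may not be enough if the pool returns a bare "IP:port" string such as "113.122.136.41:808", because the value stored in `request.meta['proxy']` would still lack a scheme. Below is a minimal sketch of one way to handle that, not a definitive fix. `normalize_proxy`, `POOL_URL` and the default `http` scheme are names and choices made up for this illustration, and it assumes the pool answers with plain text.

```python
from urllib.request import urlopen

POOL_URL = "http://127.0.0.1:5000/get"  # proxy pool endpoint from the question


def normalize_proxy(raw, default_scheme="http"):
    """Return a proxy URL that includes a scheme, or None if raw is empty.

    Hypothetical helper for this sketch, not part of Scrapy.
    """
    if raw is None:
        return None
    proxy = raw.strip()  # drop surrounding whitespace and newlines
    if not proxy:
        return None
    if "://" not in proxy:
        # Bare "IP:port" value: prepend the assumed scheme.
        proxy = "{}://{}".format(default_scheme, proxy)
    return proxy


class ProxyMiddleware(object):

    def process_request(self, request, spider):
        # Assumes the pool answers with plain text like "113.122.136.41:808".
        raw = urlopen(POOL_URL, timeout=10).read().decode("utf-8")
        proxy = normalize_proxy(raw)
        if proxy is not None:
            # Set the proxy only when the pool returned a usable value.
            request.meta["proxy"] = proxy
```

With this helper, `normalize_proxy("113.122.136.41:808\n")` gives "http://113.122.136.41:808", and a value that already carries a scheme is passed through unchanged. The fetch from the pool is still a blocking call, as in the question.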