2017-08-06

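I want to use coroutines to fetch and parse web pages, so I wrote a sample and tested it. The program runs fine with Python 3.5 on Ubuntu 16.04 and exits once all the work is done. The source code is below. Why is BeautifulSoup involved in the "Task exception was never retrieved" error?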

import aiohttp
import asyncio
from bs4 import BeautifulSoup

async def coro():
    coro_loop = asyncio.get_event_loop()
    url = u'https://www.python.org/'
    for _ in range(4):
        async with aiohttp.ClientSession(loop=coro_loop) as coro_session:
            with aiohttp.Timeout(30, loop=coro_session.loop):
                async with coro_session.get(url) as resp:
                    print('get response from url: %s' % url)
                    source_code = await resp.read()
                    soup = BeautifulSoup(source_code, 'lxml')

def main():
    loop = asyncio.get_event_loop()
    worker = loop.create_task(coro())
    try:
        loop.run_until_complete(worker)
    except KeyboardInterrupt:
        print('keyboard interrupt')
        worker.cancel()
    finally:
        loop.stop()
        loop.run_forever()
        loop.close()

if __name__ == '__main__':
    main()

While testing, I found that when I stop the program by pressing Ctrl+C, the error "Task exception was never retrieved" appears:

^Ckeyboard interrupt
Task exception was never retrieved
future: <Task finished coro=<coro() done, defined at ./test.py:8> exception=KeyboardInterrupt()>
Traceback (most recent call last):
  File "./test.py", line 23, in main
    loop.run_until_complete(worker)
  File "/usr/lib/python3.5/asyncio/base_events.py", line 375, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 345, in run_forever
    self._run_once()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 1312, in _run_once
    handle._run()
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/lib/python3.5/asyncio/tasks.py", line 307, in _wakeup
    self._step()
  File "/usr/lib/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "./test.py", line 17, in coro
    soup = BeautifulSoup(source_code, 'lxml')
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 215, in __init__
    self._feed()
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 239, in _feed
    self.builder.feed(self.markup)
  File "/usr/lib/python3/dist-packages/bs4/builder/_lxml.py", line 240, in feed
    self.parser.feed(markup)
  File "src/lxml/parser.pxi", line 1194, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:119773)
  File "src/lxml/parser.pxi", line 1316, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:119644)
  File "src/lxml/parsertarget.pxi", line 141, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:137264)
  File "src/lxml/parsertarget.pxi", line 135, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:137128)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:11090)
  File "src/lxml/saxparser.pxi", line 499, in lxml.etree._handleSaxData (src/lxml/lxml.etree.c:131013)
  File "src/lxml/parsertarget.pxi", line 88, in lxml.etree._PythonSaxParserTarget._handleSaxData (src/lxml/lxml.etree.c:136397)
  File "/usr/lib/python3/dist-packages/bs4/builder/_lxml.py", line 206, in data
    def data(self, content):
KeyboardInterrupt

I looked through the official Python docs but found no clue. I tried catching the KeyboardInterrupt in coro():

try: 
    soup = BeautifulSoup(source_code, 'lxml') 
except KeyboardInterrupt: 
    print ('capture exception') 
    raise 

Every time the error appears, the try/except catches the KeyboardInterrupt inside the BeautifulSoup() call. It looks like BeautifulSoup causes the error, but how do I fix it?


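This has nothing to do with BeautifulSoup. The warning appears when you never retrieve an exception raised inside a task. You need to add a call to worker.exception() somewhere. – dirn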
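The traceback most likely mentions BeautifulSoup only because that is where the interrupt landed: parsing is CPU-bound, so Ctrl+C is delivered while coro() itself is running, and the task finishes with KeyboardInterrupt stored on it. A minimal sketch of that situation, assuming Python 3.8+ and no network; parse() and interrupt_inside_task() are hypothetical names, and parse() raises KeyboardInterrupt directly as a stand-in for the interrupted BeautifulSoup(...) call:

```python
import asyncio


async def parse():
    # Hypothetical stand-in for the CPU-bound BeautifulSoup(...) call: the
    # interrupt is raised while this task's own code is executing.
    raise KeyboardInterrupt


def interrupt_inside_task():
    loop = asyncio.new_event_loop()
    worker = loop.create_task(parse())
    try:
        loop.run_until_complete(worker)
    except KeyboardInterrupt:
        pass  # the interrupt propagates out of run_until_complete() as well
    state = {
        "done": worker.done(),               # the task has already finished...
        "cancel_accepted": worker.cancel(),  # ...so cancel() has nothing to do
        # Retrieving the stored exception is what silences the warning.
        "stored": type(worker.exception()).__name__,
    }
    loop.close()
    return state


print(interrupt_inside_task())
# {'done': True, 'cancel_accepted': False, 'stored': 'KeyboardInterrupt'}
```

Because the exception is kept on the task, calling worker.exception() (or worker.result()) afterwards is what marks it as retrieved.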

Answer


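Calling task.cancel() does not actually cancel the task; it only marks the task for cancellation. The real cancellation happens when the task resumes execution: asyncio.CancelledError is then raised inside the task, forcing it to stop, and the task finishes with this exception.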
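A small sketch of this two-step behaviour, assuming Python 3.7+ and no network; long_job(), cancel_demo() and run_cancel_demo() are hypothetical names, with long_job() standing in for the scraping coroutine. It shows that cancel() returns while the task is still pending, and that CancelledError only appears inside the task once the loop runs it again:

```python
import asyncio


async def long_job(log):
    try:
        await asyncio.sleep(3600)  # hypothetical long-running work
    except asyncio.CancelledError:
        log.append("CancelledError raised inside the task")
        raise  # let the task finish as cancelled


async def cancel_demo():
    log = []
    task = asyncio.ensure_future(long_job(log))
    await asyncio.sleep(0)    # let the task start and suspend in sleep()
    accepted = task.cancel()  # only *requests* cancellation
    log.append("after cancel(): accepted=%s, done=%s" % (accepted, task.done()))
    try:
        await task            # the task must run again to actually be cancelled
    except asyncio.CancelledError:
        log.append("after await: cancelled=%s" % task.cancelled())
    return log


def run_cancel_demo():
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(cancel_demo())
    finally:
        loop.close()


for line in run_cancel_demo():
    print(line)
# after cancel(): accepted=True, done=False
# CancelledError raised inside the task
# after await: cancelled=True
```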

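On the other hand, asyncio emits a warning if one of your tasks silently finished with an exception, that is, if you never checked the result of the task's execution.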
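The warning is reported when a finished task whose exception was never looked at is garbage-collected; asyncio passes it to the loop's exception handler, which logs it by default. The sketch below, assuming CPython 3.7+, installs a custom handler to collect those reports instead of logging them, and shows that a single task.exception() call is enough to silence them. fail() and run_failing_task() are hypothetical names:

```python
import asyncio
import gc


def run_failing_task(retrieve):
    """Return the messages asyncio reports for a task that raised."""
    loop = asyncio.new_event_loop()
    reported = []
    # Collect what asyncio would otherwise log, instead of printing it.
    loop.set_exception_handler(lambda lp, ctx: reported.append(ctx["message"]))

    async def fail():
        raise ValueError("boom")

    task = loop.create_task(fail())
    # asyncio.wait() waits for completion without retrieving the exception.
    loop.run_until_complete(asyncio.wait({task}))
    if retrieve:
        task.exception()  # marks the exception as retrieved
    del task              # drop our reference to the finished task
    gc.collect()          # make sure it is finalized before the loop closes
    loop.close()
    return reported


print(run_failing_task(retrieve=False))  # ['Task exception was never retrieved']
print(run_failing_task(retrieve=True))   # []
```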

To avoid the problem, wait for the cancelled task to receive asyncio.CancelledError (and probably suppress it, since you don't need it afterwards):

import asyncio
from contextlib import suppress


async def coro():
    # ... the scraping code from the question goes here; a long sleep
    # stands in for it so that this snippet runs on its own.
    await asyncio.sleep(3600)


def main():
    loop = asyncio.get_event_loop()
    worker = asyncio.ensure_future(coro())
    try:
        loop.run_until_complete(worker)
    except KeyboardInterrupt:
        print('keyboard interrupt')

        worker.cancel()
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(worker)  # await task cancellation
    finally:
        loop.close()


if __name__ == '__main__':
    main()
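The cancel-then-wait step relies on the task still being pending when Ctrl+C arrives. If the interrupt is instead raised while the task's own code is running, as the traceback in the question shows (inside BeautifulSoup), the task has already finished with KeyboardInterrupt stored on it, and cancel() has no effect on a finished task. A hedged variant, assuming Python 3.8+: the hypothetical helper shutdown_worker() handles both cases, and in the second one simply retrieves the stored exception, as the comment on the question suggests. wait_forever(), interrupted() and demo() are hypothetical stand-ins used only to exercise it:

```python
import asyncio
from contextlib import suppress


def shutdown_worker(loop, worker):
    """Hypothetical helper: make sure `worker` ends with nothing left unretrieved."""
    if worker.done():
        # Interrupted while the task itself was running: its exception (for
        # example KeyboardInterrupt) is already stored; retrieving it here
        # prevents "Task exception was never retrieved".
        if worker.cancelled():
            return "cancelled"
        exc = worker.exception()
        return "finished" if exc is None else type(exc).__name__
    # Interrupted while the loop was waiting: request cancellation, then run
    # the loop until the task has actually processed CancelledError.
    worker.cancel()
    with suppress(asyncio.CancelledError):
        loop.run_until_complete(worker)
    return "cancelled" if worker.cancelled() else "finished"


async def wait_forever():
    await asyncio.sleep(3600)  # stand-in for a pending network request


async def interrupted():
    raise KeyboardInterrupt  # stand-in for Ctrl+C during parsing


def demo():
    loop = asyncio.new_event_loop()
    results = []

    pending = loop.create_task(wait_forever())
    loop.run_until_complete(asyncio.sleep(0))  # task is now suspended in sleep()
    results.append(shutdown_worker(loop, pending))

    failed = loop.create_task(interrupted())
    try:
        loop.run_until_complete(failed)
    except KeyboardInterrupt:
        pass  # the interrupt also propagates out of run_until_complete()
    results.append(shutdown_worker(loop, failed))

    loop.close()
    return results


print(demo())  # ['cancelled', 'KeyboardInterrupt']
```

In the answer's main(), the except KeyboardInterrupt branch would call shutdown_worker(loop, worker) in place of the cancel/suppress lines.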