2016-11-11

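Python asyncio with difflib is extremely slow

I have a script that asynchronously downloads multiple URLs and then continuously monitors them for changes with difflib: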

import asyncio 
import difflib 
import aiohttp 

urls = ['http://www.nytimes.com/', 
     'http://www.time.com/', 
     'http://www.economist.com/'] 

async def get_url(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            old = await resp.text()
            print('Initial -', url)
        while True:
            async with session.get(url) as resp1:
                new = await resp.text()
            print('Got -', url)
            diff = difflib.unified_diff(old, new)

            for line in diff:
                print(line)
            old = new

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    ops = []
    for url in urls:
        ops.append(get_url(url))
    loop.run_until_complete(asyncio.wait(ops))

When I run it with the following lines commented out:

for line in diff:
    print(line)

the script runs as expected, fetching each URL about 3 times per second.

When the lines are uncommented, the script slows to a crawl, far slower than when it only runs the fetches back to back.

I don't understand why this happens. Does it have something to do with difflib returning a generator?

Answer


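First, there is a bug in your code: instead of new = await resp.text(), it should be new = await resp1.text(). resp1 is the response you just requested inside the loop; resp is the first one, fetched before the loop started.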
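A minimal, network-free sketch of the corrected polling loop. It assumes a hypothetical fetch coroutine (built by the hypothetical make_fake_fetch helper) in place of session.get(url) followed by resp1.text(), so the page snapshots are made up. It also passes line lists to unified_diff as recommended below, and starts the tasks with asyncio.gather() and asyncio.run(), since passing bare coroutines to asyncio.wait() has been deprecated since Python 3.8 and is an error from Python 3.11.

```python
import asyncio
import difflib


def make_fake_fetch(snapshots):
    """Hypothetical stand-in for fetching a page: returns the next snapshot per URL.

    Each URL needs polls + 1 snapshots (one initial fetch plus one per poll).
    """
    iterators = {url: iter(pages) for url, pages in snapshots.items()}

    async def fetch(url):
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        return next(iterators[url])

    return fetch


async def watch(url, fetch, polls):
    """Poll `url` `polls` times and return the unified diff for each poll."""
    old = await fetch(url)
    diffs = []
    for _ in range(polls):
        new = await fetch(url)  # the *new* body (resp1.text() in the question)
        # Compare lists of lines, not raw strings.
        diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
        diffs.append(diff)
        old = new
    return diffs


async def main(snapshots, polls):
    fetch = make_fake_fetch(snapshots)
    # gather() accepts coroutines directly and returns results in input order.
    return await asyncio.gather(*(watch(url, fetch, polls) for url in snapshots))


if __name__ == "__main__":
    demo = {
        "http://www.nytimes.com/": ["a\nb\nc", "a\nB\nc"],
        "http://www.time.com/": ["x\ny", "x\ny"],
    }
    for url, diffs in zip(demo, asyncio.run(main(demo, polls=1))):
        print(url, diffs)
```

With a real aiohttp session, fetch would wrap session.get(url) and resp1.text(); the loop structure stays the same.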

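Second, unified_diff works on lists of strings, not on plain strings (right now every character of your long strings is treated as a line!). Because a str is itself a sequence, difflib accepts it without complaint and compares the pages character by character. You can quickly split the strings into lines with splitlines():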

diff = difflib.unified_diff(old.splitlines(), new.splitlines())
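A small, standard-library-only check of both points; the two strings are made-up stand-ins for downloaded pages. Calling unified_diff only creates a generator, so no comparison happens until a loop consumes it, which is likely why commenting out the for loop made the slowdown disappear. Once consumed, the diff runs synchronously inside the coroutine, so while it works through a full page character by character, the event loop cannot make progress on the other URLs.

```python
import difflib
import inspect

old = "line one\nline two\nline three\n"
new = "line one\nline 2\nline three\n"

# Calling unified_diff only creates a generator; no comparison has run yet.
lazy = difflib.unified_diff(old, new)
print(inspect.getgeneratorstate(lazy))  # GEN_CREATED

# Raw strings: difflib iterates over them, so every character is one "line".
char_diff = list(lazy)

# Lists of lines: each element is one text line, as unified_diff expects.
# lineterm="" avoids doubled newlines on the header lines when printing.
line_diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))

print(len(char_diff), "entries when diffing characters")
print(len(line_diff), "entries when diffing lines")
print("\n".join(line_diff))
```

On a real web page the gap is much larger than in this toy example, because the page has far more characters than lines.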