
Multithreading/multiprocessing with a Python loop

I have a script that loops through a series of URLs to pull item locations from the JSON data each one returns. However, the script takes 60 minutes to run, and 55 of those minutes (per cProfile) are spent waiting for the JSON data to load.

I would like to use multithreading to run several POST requests at once to speed this up, and initially I split the URL range in two to do so. Where I'm stuck is how to implement multithreading or asyncio.

Slimmed-down code:

import asyncio
import aiohttp

# Using globals is not recommended, but it keeps the example short
results = dict()
url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"

# Default URL opener taken from the aiohttp documentation
async def open_url(store, loop=None):
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.post(url, data={'searchQuery': query.format(store)}) as resp:
            return await resp.json(), store

async def processing(loop=None):
    # The 'global' keyword is needed to write to a global variable
    global results
    # One of the simplest ways to parallelize requests: schedule the coroutines
    # and save each result to the global dict as it completes
    tasks = [open_url(store, loop=loop) for store in range(0, 5)]
    for coro in asyncio.as_completed(tasks, loop=loop):
        try:
            data, store = await coro
            results[store] = data['searchResults']['results'][0]['location']['aisle']
        except (IndexError, KeyError):
            continue


if __name__ == '__main__':
    event_loop = asyncio.new_event_loop()
    event_loop.run_until_complete(processing(loop=event_loop))

    # Print results
    for store, data in results.items():
        print(store, data)

JSON:

{u'count': 1,
 u'results': [{u'department': {u'name': u'Home', u'storeDeptId': -1},
               u'location': {u'aisle': [u'A'], u'detailed': [u'A.536']},
               u'score': u'0.507073'}],
 u'totalCount': 1}

Answers


Even if you use multithreading or multiprocessing, each thread/process will still block until the JSON data has been retrieved. That could speed things up somewhat, but it is still not your best option.
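
For reference, a blocking thread-pool version of that kind of loop might look like the sketch below (an illustration only, not part of the original answer; it reuses the URL and query placeholders from the question). Each worker still waits on its own request; the pool just overlaps those waits:

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"

def fetch(store):
    # Each worker thread still blocks here until the JSON response arrives
    resp = requests.post(url, data={'searchQuery': query.format(store)})
    return store, resp.json()

results = {}
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(fetch, store) for store in range(0, 5)]
    for future in as_completed(futures):
        store, data = future.result()
        results[store] = data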

Since you are using requests, try grequests, which combines requests with gevent. It lets you define a set of HTTP requests that run asynchronously, and as a result you get a big speed-up. Usage is very simple: just create a list of requests (using grequests.get) and pass it to grequests.map.
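
A minimal sketch of that grequests approach, adapted to the POST request in the question (the URL and query string are the question's placeholders, not a verified endpoint, and grequests.post is assumed here because the script sends POST requests):

import grequests

url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"
stores = list(range(0, 5))

# Build the unsent requests, then send them all concurrently with grequests.map
reqs = [grequests.post(url, data={'searchQuery': query.format(store)})
        for store in stores]
responses = grequests.map(reqs)

for store, resp in zip(stores, responses):
    if resp is not None:
        print(store, resp.json())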

Hope this helps!


Thanks – getting this to run asynchronously would be ideal. I looked at the examples provided for grequests, but they define the list of URLs explicitly; I'm lost on how to apply that to the code above. Also, would I use grequests.post instead of grequests.get? –


In case you want to run your requests in parallel (I hope that's what you are asking about), this code will help. There is a request opener, and 2,000 POST requests are sent via aiohttp and asyncio. Python 3.5 was used.

import asyncio
import aiohttp

# Using globals is not recommended, but it keeps the example short
results = dict()
MAX_RETRIES = 5
MATCH_SLEEP_TIME = 3  # better moved to a separate module such as constants.py
url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=44159"

# Default URL opener taken from the aiohttp documentation
async def open_url(store, semaphore, loop=None):
    for _ in range(MAX_RETRIES):
        async with semaphore:
            try:
                async with aiohttp.ClientSession(loop=loop) as session:
                    async with session.post(url, data={'searchQuery': query.format(store)}) as resp:
                        return await resp.json(), store
            except ConnectionResetError:
                # More exceptions can be handled here; sleep before retrying
                await asyncio.sleep(MATCH_SLEEP_TIME, loop=loop)
                continue
    return None

async def processing(semaphore, loop=None):
    # The 'global' keyword is needed to write to a global variable
    global results
    # One of the simplest ways to parallelize requests: schedule the coroutines
    # and save each result to the global dict as it completes
    tasks = [open_url(store, semaphore, loop=loop) for store in range(0, 2000)]
    for coro in asyncio.as_completed(tasks, loop=loop):
        try:
            response = await coro
            if response is None:
                continue
            data, store = response
            results[store] = data['searchResults']['results'][0]['location']['aisle']
        except (IndexError, KeyError):
            continue


if __name__ == '__main__':
    event_loop = asyncio.new_event_loop()
    semaphore = asyncio.Semaphore(50, loop=event_loop)  # number of concurrent requests
    event_loop.run_until_complete(processing(semaphore, loop=event_loop))

When I try to run it I get "NameError: name 'store' is not defined". I posted some updates to the original question, and I'm not sure how query = "store={}&size=18&query=44159" and data = json.loads(r.json()['searchResults'])['results'][0] are handled in your revision. Thanks! –


@MiaElla Hey, I updated my answer, but I'm getting HTML back from the response rather than JSON. Are you sure the URL is correct? I suggest you also catch **json.decoder.JSONDecodeError** to avoid JSON decoding errors. In addition, I suggest you use setdefault or defaultdict –
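
A small sketch of those two suggestions, built around a hypothetical save_result helper rather than code from the original thread: catch json.decoder.JSONDecodeError when decoding, and accumulate aisles per store with collections.defaultdict.

import json
from collections import defaultdict

results = defaultdict(list)

def save_result(store, raw_text):
    try:
        data = json.loads(raw_text)
    except json.decoder.JSONDecodeError:
        # The endpoint returned HTML (or otherwise invalid JSON); skip it
        return
    try:
        aisle = data['searchResults']['results'][0]['location']['aisle']
    except (IndexError, KeyError):
        return
    results[store].append(aisle)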


You're right, the website name was changed for this test. I've corrected it but am still getting a lot of errors: File "C:/Users ... test.py", line 34, in <module>: event_loop.run_until_complete(processing(loop=event_loop)); line 387, in run_until_complete: return future.result(); line 274, in result: raise self._exception; line 239, in _step: result = coro.send(None); line 27, in processing: results.setdefault(store, []).append(data['searchResults']['results'][0]['location']['aisle']) –