
Multithreading/multiprocessing with a Python loop

I have a script that loops through a series of URLs to pull item locations from the JSON data each one returns. However, the script takes 60 minutes to run, and 55 of those minutes (per cProfile) are spent waiting for the JSON data to load.

I would like to use multithreading to run several POST requests at once to speed this up, and initially I split the URL range in two to do so. Where I'm stuck is how to implement multithreading or asyncio.

Slimmed-down code:

import asyncio
import aiohttp

# Using globals is not recommended, but it keeps the example short
results = dict()
url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"

# Default URL opener taken from the aiohttp documentation
async def open_url(store, loop=None):
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.post(url, data={'searchQuery': query.format(store)}) as resp:
            return await resp.json(), store

async def processing(loop=None):
    # The 'global' keyword is needed to write to a global variable
    global results
    # One of the simplest ways to parallelize requests: schedule the coroutines
    # and save each result to the global dict as it completes
    tasks = [open_url(store, loop=loop) for store in range(0, 5)]
    for coro in asyncio.as_completed(tasks, loop=loop):
        try:
            data, store = await coro
            results[store] = data['searchResults']['results'][0]['location']['aisle']
        except (IndexError, KeyError):
            continue


if __name__ == '__main__':
    event_loop = asyncio.new_event_loop()
    event_loop.run_until_complete(processing(loop=event_loop))

    # Print results
    for store, data in results.items():
        print(store, data)

JSON:

{u'count': 1,
 u'results': [{u'department': {u'name': u'Home', u'storeDeptId': -1},
               u'location': {u'aisle': [u'A'], u'detailed': [u'A.536']},
               u'score': u'0.507073'}],
 u'totalCount': 1}

Answers


Even if you use multithreading or multiprocessing, each thread/process will still block until the JSON data has been retrieved. That could speed things up somewhat, but it is still not your best option.
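
For reference, a blocking thread-pool version of that kind of loop might look like the sketch below (an illustration only, not part of the original answer; it reuses the URL and query placeholders from the question). Each worker still waits on its own request; the pool just overlaps those waits:

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"

def fetch(store):
    # Each worker thread still blocks here until the JSON response arrives
    resp = requests.post(url, data={'searchQuery': query.format(store)})
    return store, resp.json()

results = {}
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(fetch, store) for store in range(0, 5)]
    for future in as_completed(futures):
        store, data = future.result()
        results[store] = data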

Since you are using requests, try grequests, which combines requests with gevent. It lets you define a set of HTTP requests that run asynchronously, and as a result you get a big speed-up. Usage is very simple: just create a list of requests (using grequests.get) and pass it to grequests.map.
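
A minimal sketch of that grequests approach, adapted to the POST request in the question (the URL and query string are the question's placeholders, not a verified endpoint, and grequests.post is assumed here because the script sends POST requests):

import grequests

url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"
stores = list(range(0, 5))

# Build the unsent requests, then send them all concurrently with grequests.map
reqs = [grequests.post(url, data={'searchQuery': query.format(store)})
        for store in stores]
responses = grequests.map(reqs)

for store, resp in zip(stores, responses):
    if resp is not None:
        print(store, resp.json())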

Hope this helps!


Thanks – getting this to run asynchronously would be ideal. I looked at the examples provided for grequests, but they define the list of URLs explicitly; I'm lost on how to apply that to the code above. Also, would I use grequests.post instead of grequests.get? –


In case you want to run your requests in parallel (I hope that's what you are asking about), this code will help. There is a request opener, and 2,000 POST requests are sent via aiohttp and asyncio. Python 3.5 was used.

import asyncio
import aiohttp

# Using globals is not recommended, but it keeps the example short
results = dict()
MAX_RETRIES = 5
MATCH_SLEEP_TIME = 3  # better moved to a separate module such as constants.py
url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=44159"

# Default URL opener taken from the aiohttp documentation
async def open_url(store, semaphore, loop=None):
    for _ in range(MAX_RETRIES):
        async with semaphore:
            try:
                async with aiohttp.ClientSession(loop=loop) as session:
                    async with session.post(url, data={'searchQuery': query.format(store)}) as resp:
                        return await resp.json(), store
            except ConnectionResetError:
                # More exceptions can be handled here; sleep before retrying
                await asyncio.sleep(MATCH_SLEEP_TIME, loop=loop)
                continue
    return None

async def processing(semaphore, loop=None):
    # The 'global' keyword is needed to write to a global variable
    global results
    # One of the simplest ways to parallelize requests: schedule the coroutines
    # and save each result to the global dict as it completes
    tasks = [open_url(store, semaphore, loop=loop) for store in range(0, 2000)]
    for coro in asyncio.as_completed(tasks, loop=loop):
        try:
            response = await coro
            if response is None:
                continue
            data, store = response
            results[store] = data['searchResults']['results'][0]['location']['aisle']
        except (IndexError, KeyError):
            continue


if __name__ == '__main__':
    event_loop = asyncio.new_event_loop()
    semaphore = asyncio.Semaphore(50, loop=event_loop)  # number of concurrent requests
    event_loop.run_until_complete(processing(semaphore, loop=event_loop))

When I try to run it I get "NameError: name 'store' is not defined". I posted some updates to the original question, and I'm not sure how query = "store={}&size=18&query=44159" and data = json.loads(r.json()['searchResults'])['results'][0] are handled in your revision. Thanks! –


@MiaElla Hey, I updated my answer, but I'm getting HTML back from the response rather than JSON. Are you sure the URL is correct? I suggest you also catch **json.decoder.JSONDecodeError** to avoid JSON decoding errors. In addition, I suggest you use setdefault or defaultdict –
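
A small sketch of those two suggestions, built around a hypothetical save_result helper rather than code from the original thread: catch json.decoder.JSONDecodeError when decoding, and accumulate aisles per store with collections.defaultdict.

import json
from collections import defaultdict

results = defaultdict(list)

def save_result(store, raw_text):
    try:
        data = json.loads(raw_text)
    except json.decoder.JSONDecodeError:
        # The endpoint returned HTML (or otherwise invalid JSON); skip it
        return
    try:
        aisle = data['searchResults']['results'][0]['location']['aisle']
    except (IndexError, KeyError):
        return
    results[store].append(aisle)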


You're right, the website name was changed for this test. I've corrected it but am still getting a lot of errors: File "C:/Users ... test.py", line 34, in <module>: event_loop.run_until_complete(processing(loop=event_loop)); line 387, in run_until_complete: return future.result(); line 274, in result: raise self._exception; line 239, in _step: result = coro.send(None); line 27, in processing: results.setdefault(store, []).append(data['searchResults']['results'][0]['location']['aisle']) –