I have a script that loops over a series of URLs and extracts an item's location from the JSON data each one returns. However, the script takes 60 minutes to run, and 55 of those minutes (per cProfile) are spent waiting for the JSON data to load. Multithreading/multiprocessing with a Python loop
I'd like to run multiple POST requests at once via multithreading to speed this up, and initially split the URL range in two to do so. Where I'm stuck is how to actually implement the multithreading or asyncio.
Slimmed-down code:
import asyncio
import aiohttp

# using globals is not recommended
results = dict()
url = "https://www.website.com/store/ajax/search"
query = "store={}&size=18&query=17360031"

# this is the default request pattern taken from the aiohttp documentation
async def open_url(store, loop=None):
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.post(url, data={'searchQuery': query.format(store)}) as resp:
            return await resp.json(), store

async def processing(loop=None):
    # you need the 'global' keyword if you want to write to a global variable
    global results
    # one of the simplest ways to parallelize requests: create the tasks,
    # then save each result to the global dict as it completes
    tasks = [open_url(store, loop=loop) for store in range(0, 5)]
    for coro in asyncio.as_completed(tasks, loop=loop):
        try:
            data, store = await coro
            results[store] = data['searchResults']['results'][0]['location']['aisle']
        except (IndexError, KeyError):
            continue

if __name__ == '__main__':
    event_loop = asyncio.new_event_loop()
    event_loop.run_until_complete(processing(loop=event_loop))

    # print results
    for store, data in results.items():
        print(store, data)
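For reference, the same fan-out can be written more compactly on Python 3.7+ with asyncio.run and asyncio.gather, avoiding the manual event loop and the global dict. This is a sketch under the same assumptions as the code above: the URL and query string are the question's placeholders, and the response is assumed to have the searchResults shape shown below.

```python
import asyncio
import aiohttp

URL = "https://www.website.com/store/ajax/search"  # placeholder from the question
QUERY = "store={}&size=18&query=17360031"

def extract_aisle(data):
    """Pull the aisle list out of one response, or None if the shape is unexpected."""
    try:
        return data['searchResults']['results'][0]['location']['aisle']
    except (IndexError, KeyError):
        return None

async def fetch(session, store):
    # POST one store's query and return (store, parsed_json)
    async with session.post(URL, data={'searchQuery': QUERY.format(store)}) as resp:
        return store, await resp.json()

async def main(stores):
    # one shared session for all requests; gather runs them concurrently
    async with aiohttp.ClientSession() as session:
        pairs = await asyncio.gather(*(fetch(session, s) for s in stores))
    return {store: extract_aisle(data) for store, data in pairs}

if __name__ == '__main__':
    results = asyncio.run(main(range(5)))
    for store, aisle in results.items():
        print(store, aisle)
```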
JSON:
{u'count': 1,
 u'results': [{u'department': {u'name': u'Home', u'storeDeptId': -1},
               u'location': {u'aisle': [u'A'], u'detailed': [u'A.536']},
               u'score': u'0.507073'}],
 u'totalCount': 1}
Thanks, running this asynchronously would be ideal. I looked at the examples provided for grequests, but they define an explicit list of URLs, so I'm lost as to how to apply the code above. Also, would I use grequests.post instead of grequests.get?