Multiprocessing in a for loop

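Suppose I have a dictionary in which each element is a quadrilateral defined by a tuple of GPS coordinates, and also a tuple containing the origin and destination GPS coordinates of a series of trips: (((origin_latitude, origin_longitude), (dest_latitude, dest_longitude)), ((...), (...))). Here is an example with two quadrilaterals and two trips: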

dictionary={0:((0,0),(0,1),(1,1),(1,0)),1:((3,3),(3,4),(4,4),(4,3))} 
trips=(((0.5,0.5),(3.5,3.5)),((-1,-1),(-2,-2)))

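I want to classify each trip by its origin quadrilateral number, its destination quadrilateral number, and a combination number for the origin/destination pair (a trip reference number). Here is what I am doing: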

import matplotlib.path as mplPath

def is_in_zone(quadri, point):
    # Build a polygon from the quadrilateral's corners and test the point.
    bbPath = mplPath.Path(quadri)
    return bbPath.contains_point(point)

def get_zone_nbr(dictio, trip):
    # -1 means "not found in any quadrilateral".
    start_zone = -1
    end_zone = -1
    trip_ref = -1

    # items() works on both Python 2 and 3 (iteritems() is Python 2 only).
    for key, coordinates in dictio.items():
        if is_in_zone(coordinates, trip[0]):
            start_zone = key
        if is_in_zone(coordinates, trip[1]):
            end_zone = key
        if start_zone > -1 and end_zone > -1:
            # Unique number for each (origin, destination) pair.
            trip_ref = len(dictio) * start_zone + end_zone
            break
    return (start_zone, end_zone, trip_ref)

if __name__ == '__main__':

    dictionary = {0: ((0, 0), (0, 1), (1, 1), (1, 0)), 1: ((3, 3), (3, 4), (4, 4), (4, 3))}
    trips = (((0.5, 0.5), (3.5, 3.5)), ((-1, -1), (-2, -2)))

    for t in trips:
        get_zone_nbr(dictionary, t)

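My dictionary will have about 30 entries, so get_zone_nbr will be slow, and I have millions of trips. Do you see any obvious way to optimize get_zone_nbr()? Or anything else that would make the code run faster (for example multiprocessing, although I am not sure how to use it in a loop)?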

2015-08-14 gjy

A simple first step is to process your trips in parallel.

>>> import matplotlib.path as mplPath 
>>> def is_in_zone(quadri,point): 
...  bbPath = mplPath.Path(quadri) 
...  return bbPath.contains_point(point) 
... 
>>> def get_zone_nbr(dictio,trip): 
...  start_zone=-1 
...  end_zone=-1 
...  trip_ref=-1 
...  for key,coordinates in dictio.items(): 
...   if is_in_zone(coordinates,trip[0]): 
...    start_zone=key 
...   if is_in_zone(coordinates,trip[1]): 
...    end_zone=key 
...   if start_zone>-1 and end_zone>-1: 
...    trip_ref=len(dictio)*start_zone+end_zone 
...    break 
...  return (start_zone,end_zone,trip_ref) 
... 
>>> dictionary={0:((0,0),(0,1),(1,1),(1,0)),1:((3,3),(3,4),(4,4),(4,3))} 
>>> trips=(((0.5,0.5),(3.5,3.5)),((-1,-1),(-2,-2))) 
>>> 
>>> from pathos.pools import ThreadPool 
>>> pool = ThreadPool() 
>>> 
>>> results = pool.map(lambda x: get_zone_nbr(dictionary, x), trips) 
>>> results 
[(0, 1, 1), (-1, -1, -1)]

I'm using pathos, which is a fork of multiprocessing that provides better serialization, flexibility, and interactivity. (I'm also the author.)
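As background on the serialization point: the standard library's pickle, which multiprocessing uses to send functions to worker processes, cannot serialize a lambda, whereas pathos relies on dill, which can. This only matters once the thread pool is swapped for a process pool, since threads do not pickle anything. The sketch below uses only the standard library; `can_pickle` is a hypothetical helper, and `functools.partial` is shown as one common workaround, not as something the answer prescribes.

```python
import pickle
from functools import partial


def can_pickle(obj):
    """Return True if the standard-library pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Two ways to fix the first argument of max(): a lambda and a partial.
as_lambda = lambda x: max(0, x)
as_partial = partial(max, 0)

# Both behave the same when called directly.
print(as_lambda(-5), as_partial(-5))

# Only the partial survives pickling, so only it could be handed to a
# process pool from the standard multiprocessing module.
print(can_pickle(as_lambda), can_pickle(as_partial))
```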

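You could also apply the same approach to turn the for loop inside get_zone_nbr into a map call. pathos lets you use map calls with multiple arguments. Since you are processing dictionary items, and the items are naturally unordered, you could use an "unordered iterative map" (uimap in pathos, imap_unordered in multiprocessing).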
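A minimal sketch of that inner-loop idea, using the standard library's `multiprocessing.pool.ThreadPool.imap_unordered` rather than pathos. Several things here are assumptions made for illustration: `point_in_polygon` is a plain ray-casting stand-in for matplotlib's `Path.contains_point` (points exactly on an edge may be treated differently), `check_zone` is a hypothetical helper, and the zones are assumed not to overlap. Unlike the original loop, this version always tests every zone, so it may well be slower for only about 30 cheap tests per trip.

```python
from functools import partial
from multiprocessing.pool import ThreadPool


def point_in_polygon(polygon, point):
    """Ray-casting test; illustrative stand-in for Path.contains_point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_zone(trip, item):
    """Return (key, origin_in_zone, destination_in_zone) for one dict item."""
    key, coordinates = item
    return (key,
            point_in_polygon(coordinates, trip[0]),
            point_in_polygon(coordinates, trip[1]))


def get_zone_nbr(dictio, trip, pool):
    start_zone = end_zone = trip_ref = -1
    # Results arrive in completion order; each one carries its own key,
    # so the order does not matter.
    for key, has_start, has_end in pool.imap_unordered(
            partial(check_zone, trip), dictio.items()):
        if has_start:
            start_zone = key
        if has_end:
            end_zone = key
    if start_zone > -1 and end_zone > -1:
        trip_ref = len(dictio) * start_zone + end_zone
    return (start_zone, end_zone, trip_ref)


dictionary = {0: ((0, 0), (0, 1), (1, 1), (1, 0)),
              1: ((3, 3), (3, 4), (4, 4), (4, 3))}
trips = (((0.5, 0.5), (3.5, 3.5)), ((-1, -1), (-2, -2)))

with ThreadPool(4) as pool:
    results = [get_zone_nbr(dictionary, t, pool) for t in trips]

print(results)  # [(0, 1, 1), (-1, -1, -1)]
```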

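I'd also suggest you time your code to see which map call is faster. There are several different map calls, as well as several different parallel backends. I used a thread pool above, but you can also run in parallel across processes and across sockets (the latter will be too slow for your case). pathos provides a uniform API for all of these options, so you only have to write the code once and can then drop in any of the other pools/maps until you find the fastest one for your case.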
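One way to run such a comparison, sketched with the standard library only. `classify` is a hypothetical, cheap stand-in for get_zone_nbr, `time_map` is a hypothetical helper, and the trip data is synthetic; the numbers it prints will vary by machine and say nothing about the real workload.

```python
import time
from multiprocessing.pool import ThreadPool


def classify(trip):
    """Hypothetical stand-in for get_zone_nbr: cheap work per trip."""
    (ox, oy), (dx, dy) = trip
    return (int(ox), int(oy), int(dx), int(dy))


def time_map(map_func, func, items):
    """Run map_func(func, items), force evaluation, return (seconds, results)."""
    start = time.perf_counter()
    results = list(map_func(func, items))
    return time.perf_counter() - start, results


# Synthetic trips, used only to have something to iterate over.
trips = [((i * 0.5, i * 0.25), (i * 0.75, i * 1.5)) for i in range(2000)]

timings = {}
outputs = {}
with ThreadPool(4) as pool:
    candidates = {
        "serial map": map,
        "ThreadPool.map": pool.map,
        "ThreadPool.imap_unordered": pool.imap_unordered,
    }
    for name, map_func in candidates.items():
        seconds, results = time_map(map_func, classify, trips)
        timings[name] = seconds
        # Sort so that ordered and unordered maps can be compared.
        outputs[name] = sorted(results)

for name, seconds in timings.items():
    print("%-28s %.4f s" % (name, seconds))
```

Any other callable with the same `(func, iterable)` signature can be added to `candidates`, for instance the `map` method of a process pool, provided the mapped function can be pickled and the script is guarded by `if __name__ == '__main__':`.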

Get pathos here: https://github.com/uqfoundation

2015-08-15 15:10:59

Thanks for your help, Mike! I'll give it a try. – gjy
