2014-03-14 38 views

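Passing a class to IPython parallel

I want to pass a class to IPython parallel for execution. The code below does run, but it loads `Timezone` on every call. Each load of this class takes about 10 s, so the overhead is unacceptable unless it happens only once, or once per core. I am very new to parallelization, and I am now wondering about moving the import out of the function. At least I think that is the right approach.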

from IPython import parallel 
clients = parallel.Client() 
lview = clients.load_balanced_view() 

lview.block = True 

lats = [32.21, 34.98] 
lons = [109.45, -102.4] 
times = ['2014-03-12T16:20:44.000000000Z', '2014-03-12T15:48:52.000000000Z'] 

@lview.parallel() 
def f(lats, lons, times): 
    import sys,os 
    sys.path.append("../utils/") # For grabbing 'Timezone' 

    import Timezone as Timezone 
    tz = Timezone.Timezone() 

    # Use tz to compute local time 
    a = tz.compute_local_time(lats, lons, times) 

    return a 

%time f.map(lats, lons, times) 

Result (time: about 22 s):

in sync results <function __call__ at 0x105d2db18> 
CPU times: user 700 ms, sys: 232 ms, total: 932 ms 
Wall time: 11.6 s 
Out[15]: 
[('Asia/Chongqing', '2014-03-13 00:20:44'), 
('America/Chicago', '2014-03-12 10:48:52')] 

The run time doubles if I double the length of the input data. How can I pass `tz` in and have each core call the Timezone method?

Answer


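I figured it out. Here is how I did it.
First, I use a direct view and load the module on each engine, then use scatter/gather to split up the inputs, and finally use map to work through the array/list inputs.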

from IPython import parallel as p

rc = p.Client() 
rc[:].execute('import sys,os') 
rc[:].execute('sys.path.append("../utils/")') 
rc[:].execute('import Timezone as Timezone; tz = Timezone.Timezone()') 

dview = rc[:] # A DirectView of all engines 
dview.block = True 
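
The setup cell above pays the load cost once per engine, because `tz` is created in each engine's namespace and then reused. The same "build once, reuse afterwards" idea can be sketched in a single plain Python process with a cached instance. This is only an illustration: `SlowTimezone`, `get_tz` and `local_time` are hypothetical names, and the short sleep stands in for the real load time of about 10 s.

```python
import time


class SlowTimezone:
    """Hypothetical stand-in for the expensive Timezone class."""

    instances = 0  # counts how many times the class was constructed

    def __init__(self):
        time.sleep(0.01)  # stands in for the slow load
        SlowTimezone.instances += 1

    def compute_local_time(self, lat, lon, t):
        # Placeholder: the real class would look up the zone and convert t
        return (lat, lon, t)


_tz = None  # one cached instance per process


def get_tz():
    """Build the object on first use, then return the cached one."""
    global _tz
    if _tz is None:
        _tz = SlowTimezone()
    return _tz


def local_time(lat, lon, t):
    # Every call reuses the cached instance instead of reloading it
    return get_tz().compute_local_time(lat, lon, t)


lats = [32.21, 34.98]
lons = [109.45, -102.4]
times = ["t0", "t1"]  # placeholder time strings
results = [local_time(la, lo, t) for la, lo, t in zip(lats, lons, times)]
```

Here the constructor runs once no matter how many points are processed, which is the property the question asks for on each core.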

In the next cell:

def f(v, lats, lons, times): 
    v.scatter('lat', lats) 
    v.scatter('lon', lons) 
    v.scatter('time', times) 
    v.execute("D = list(map(tz.compute_local_time, lat, lon, time))")  # list() so D can also be gathered on Python 3
    return v.gather('D', block=True) 

lats = [32.21] 
lons = [109.45] 
times = ['2014-03-12T16:20:44.000000000Z'] 

%time r = f(dview, lats, lons, times) 

This gives me the output I want, and it is about twice as fast as just using:

map(tz.compute_local_time, lat, lon, time)
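
To make the scatter/gather step in the answer more concrete, here is a pure-Python sketch of the idea: split each input list into contiguous chunks, one per engine, map over each chunk, then concatenate the partial results in order. It does not use IPython and is not IPython's implementation; `scatter`, `gather` and `compute` are hypothetical helpers, and the near-equal contiguous split is an assumption made for illustration.

```python
def scatter(seq, n_engines):
    """Split seq into n_engines contiguous chunks of near-equal size."""
    base, extra = divmod(len(seq), n_engines)
    chunks, start = [], 0
    for i in range(n_engines):
        size = base + (1 if i < extra else 0)  # first chunks take the remainder
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def gather(chunks):
    """Concatenate per-engine results back into one list, in engine order."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out


def compute(lat, lon, t):
    # Hypothetical stand-in for tz.compute_local_time
    return (lat, lon, t)


lats = [32.21, 34.98, 40.0]
lons = [109.45, -102.4, -75.0]
times = ["t0", "t1", "t2"]  # placeholder time strings

n_engines = 2
lat_chunks = scatter(lats, n_engines)
lon_chunks = scatter(lons, n_engines)
time_chunks = scatter(times, n_engines)

# Each "engine" maps over its own chunk only
partial = [
    list(map(compute, la, lo, ti))
    for la, lo, ti in zip(lat_chunks, lon_chunks, time_chunks)
]
result = gather(partial)
```

Note that with a single-element input, as in the timing cell above, such a split leaves all but one chunk empty, so only one engine would have work to do.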