2017-07-07 70 views

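How do I parallelize a simple loop in Julia?

I have written serial code that solves Laplace's equation, but when I tried to write it in parallel in Julia, it took more time and memory than the serial version. I wrote a simple example below. How can I parallelize this code?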

There is a grid (array) named t1.

t2 is computed from it, and then t1 = t2.

@everywhere function left!(t1, t2, n, l_type, b_left, dx=1.0, k=50.0)
    if l_type == 1
        for i = 1:n
            t2[i,1] = (b_left*dx/k) + t1[i,2]
            t1[i,1] = t2[i,1]
        end
    else
        for i = 1:n
            t1[i,1] = b_left
        end
    end
    return t1
end

# parallel left does not work. 
@everywhere function pleft!(t1, t2, n, l_type, b_left, dx=1.0, k=50.0)
    if l_type == 1
        @parallel for i = 1:n
            t2[i,1] = (b_left*dx/k) + t1[i,2]
            t1[i,1] = t2[i,1]
        end
    else
        @parallel for i = 1:n
            t1[i,1] = b_left
        end
    end
    return t1
end

n = 10; 
t1 = SharedArray(Float64,(n,n)); 
t2=t1; 
typ = 0; 
value = 10; 
dx = 1; 
k=50; 

@time t3 = pleft!(t1,t2,n,typ,value,dx,k) 
@time t2 = left!(t1,t2,n,typ,value,dx,k) 

The output is:

0.000872 seconds (665 allocations: 21.328 KB) # for parallel one 
0.000004 seconds (4 allocations: 160 bytes) #for usual one 

How can I fix this?

After this step, I need to compute the following inside a while loop. This is the code I need to parallelize.

@everywhere function oneStepseri(t1, N)
    t2 = t1
    for j = 2:(N-1)
        for i = 2:(N-1)
            t2[i,j] = 0.25*(t1[i-1,j] + t1[i+1,j] + t1[i,j-1] + t1[i,j+1])
        end
    end
    return t2
end
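
For reference, here is a minimal sketch of how a sweep like the one above might be threaded. It rests on several assumptions that are not in the original post: a Jacobi-style update is intended (read only the old grid, write only the new one), Julia is started with several threads (for example via `JULIA_NUM_THREADS`), and the names `jacobi_step!` and `relax` are hypothetical. Note that `t2 = t1` in the code above does not copy the array; both names refer to the same memory, so the sketch uses two separate buffers instead.

```julia
# Sketch only: `jacobi_step!` and `relax` are hypothetical names, not from the post.
function jacobi_step!(t2::AbstractMatrix, t1::AbstractMatrix)
    nrows, ncols = size(t1)
    # Thread over columns: Julia arrays are column-major, so the inner loop
    # over rows walks contiguous memory.
    Threads.@threads for j in 2:(ncols - 1)
        for i in 2:(nrows - 1)
            # Read only from t1 and write only to t2, so iterations are independent.
            t2[i, j] = 0.25 * (t1[i-1, j] + t1[i+1, j] + t1[i, j-1] + t1[i, j+1])
        end
    end
    return t2
end

# Run `nsteps` sweeps, swapping two buffers instead of copying each step.
function relax(t0::AbstractMatrix, nsteps::Integer)
    a = copy(t0)   # real copies; `a = t0` would only alias the same array
    b = copy(t0)   # boundary values are present in both buffers
    for _ in 1:nsteps
        jacobi_step!(b, a)
        a, b = b, a
    end
    return a
end

t0 = zeros(5, 5)
t0[:, 1] .= 10.0           # fixed value on the left boundary
result = relax(t0, 100)
```

For very small grids, such as the 10×10 case timed above, the cost of scheduling the parallel work can easily exceed the cost of the loop itself, so any benefit would be expected only on much larger grids.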

Thanks...


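Did you "warm up" before timing? For example, instead of timing `@time rand(1000)` right away, you should first run `rand(1000)` three or four times so that the JIT compiles it, and only then use `@time`. – RedPointyJackson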
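
The warm-up pattern described in this comment can be sketched as follows. The function `work` is a hypothetical stand-in for the code being timed; the first call includes compilation, so only the later call reflects the run time itself.

```julia
# Hypothetical workload, used only to illustrate the timing pattern.
work(n) = sum(rand(n))

t_first  = @elapsed work(1000)   # first call: includes JIT compilation
t_second = @elapsed work(1000)   # second call: run time only

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The third-party BenchmarkTools.jl package offers `@btime`, which repeats the measurement and can be used for the same purpose.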


Yes, I did, even for @time itself. It is still too slow.

Answer


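I tried many things: @parallel, SharedArray, distributed arrays, domain decomposition, and @spawn. None of them gave a speedup. Recently, however, Julia added Threads. You can set the number of threads by running export JULIA_NUM_THREADS=4 in the shell before starting Julia, and then parallelize a loop with Threads.@threads. You can check the number of threads with Threads.nthreads(). Here is my code, which gave me a good speedup.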

# To enable threads, run this in the shell before starting Julia: export JULIA_NUM_THREADS=4

nth = Threads.nthreads(); #print number of threads 

println(nth); 

a = zeros(10); 

Threads.@threads for i = 1:10
    a[i] = Threads.threadid()
end

show(a) 

b = zeros(100000); 
c = zeros(100000); 
b[1] = b[end] = 1; 
c[1] = c[end] = 1; 

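function noth(A)
    B = A   # B is another name for A (no copy), so the update is done in place
    for i = 2:(length(A)-1)
        B[i] = (A[i-1] + A[i+1])*0.5
    end
    return B
end

function th(A)
    B = A   # B is another name for A (no copy), so the update is done in place
    Threads.@threads for i = 2:(length(A)-1)
        B[i] = (A[i-1] + A[i+1])*0.5
    end
    return B
end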


println("warmup noth , th") 
@time bb = noth(b) 
@time cc = th(c) 
println("end ") 
@time bb = noth(b) 
@time cc = th(c) 

@time bb = noth(b) 
@time cc = th(c) 

@time bb = noth(b) 
@time cc = th(c) 
@time bb = noth(b) 
@time cc = th(c) 
@time bb = noth(b) 
@time cc = th(c) 
@time bb = noth(b) 
@time cc = th(c) 
show(bb[10]) 
println("\nbb ------------------------------------------------------------------------------------------------------------------> cc") 
show(cc[10]) 

The output looks like this:

5                                          
[1.0,1.0,2.0,2.0,3.0,3.0,4.0,4.0,5.0,5.0]warmup noth , th                            
    0.008661 seconds (2.53 k allocations: 113.180 KB)                             
    0.020738 seconds (7.94 k allocations: 336.981 KB)                             
end                                         
    0.000446 seconds (4 allocations: 160 bytes)                               
    0.000122 seconds (6 allocations: 224 bytes)                               
    0.000437 seconds (4 allocations: 160 bytes)                               
    0.000135 seconds (6 allocations: 224 bytes)                               
    0.000435 seconds (4 allocations: 160 bytes)                               
    0.000115 seconds (6 allocations: 224 bytes)                               
    0.000447 seconds (4 allocations: 160 bytes)                               
    0.000112 seconds (6 allocations: 224 bytes)                               
    0.000440 seconds (4 allocations: 160 bytes)                               
    0.000109 seconds (6 allocations: 224 bytes)                               
    0.000439 seconds (4 allocations: 160 bytes)                               
    0.000116 seconds (6 allocations: 224 bytes)                               
0.052478790283203125                                     
bb ------------------------------------------------------------------------------------------------------------------> cc            

This was run with 5 threads and 100000 nodes.

Note that there is no speedup in the warm-up run, but there is a speedup afterwards:

0.000446 seconds (4 allocations: 160 bytes) # usual code run      
0.000122 seconds (6 allocations: 224 bytes) #parallel code run       
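
One caveat about the code above: because `B = A` does not copy, `noth` updates the array in place (each point sees its already-updated left neighbour), while in `th` different threads may read neighbours that another thread is overwriting at the same time, so the two functions are not guaranteed to give the same values. Below is a minimal sketch of a race-free variant, assuming a Jacobi-style update (read the old array, write a separate new one) is what is wanted. The name `smooth!` is hypothetical.

```julia
# Sketch only: `smooth!` is a hypothetical name, not part of the answer above.
function smooth!(B::AbstractVector, A::AbstractVector)
    # Carry the fixed end points over to the output array.
    B[1] = A[1]
    B[end] = A[end]
    # Each iteration reads only A and writes only its own B[i],
    # so the threads never touch the same element.
    Threads.@threads for i in 2:(length(A) - 1)
        B[i] = (A[i-1] + A[i+1]) * 0.5
    end
    return B
end

A = zeros(100_000)
A[1] = A[end] = 1.0
B = similar(A)     # separate output buffer with the same size and element type
smooth!(B, A)
```

For repeated sweeps, the two arrays can be swapped after each call instead of allocating a new one.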

If you know of a better approach, please let me know. It would be much appreciated. Thank you.