2017-09-30

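I'm trying to speed up a process that slows down my main thread by distributing it across at least two different cores. The code runs faster when queued with sync than with async. Shouldn't it be the opposite?

The reason I think I can pull this off is that each individual operation is independent, requiring only two points and a float.

However, my first stab at it runs significantly slower when using queue.async than when using queue.sync, and I have no idea why!

Here is the code running synchronously: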

var block = UnsafeMutablePointer<Datas>.allocate(capacity: 0) 
var outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: 0) 
func initialise() 
{ 
    outblock = UnsafeMutablePointer<Decimal>.allocate(capacity: testWith * 4 * 2) 

    block = UnsafeMutablePointer<Datas>.allocate(capacity: particles.count) 
} 

func update() 
{ 
    var i = 0 
    for part in particles 
    { 
     part.update() 

     let x1 = part.data.p1.x; let y1 = part.data.p1.y 
     let x2 = part.data.p2.x; let y2 = part.data.p2.y 

     let w = part.data.size * rectScale 
     let w2 = part.data.size * rectScale 

     let dy = y2 - y1; let dx = x2 - x1 
     let length = sqrt(dy * dy + dx * dx) 
     let calcx = (-(y2 - y1)/length) 
     let calcy = ((x2 - x1)/length) 
     let calcx1 = calcx * w 
     let calcy1 = calcy * w 
     let calcx2 = calcx * w2 
     let calcy2 = calcy * w2 
     outblock[i] = x1 + calcx1 
     outblock[i+1] = y1 + calcy1 

     outblock[i+2] = x1 - calcx1 
     outblock[i+3] = y1 - calcy1 

     outblock[i+4] = x2 + calcx2 
     outblock[i+5] = y2 + calcy2 

     outblock[i+6] = x2 - calcx2 
     outblock[i+7] = y2 - calcy2 

     i += 8 
    } 
} 

Below is my attempt to distribute the work across multiple cores:

let queue = DispatchQueue(label: "construction_worker_1", attributes: .concurrent) 

let blocky = block 
let oblocky = outblock 
for i in 0..<particles.count 
{ 
    particles[i].update() 
    block[i] = particles[i].data // Copy the raw data into a thread-safe format 
    queue.async { 
     let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y 
     let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y 

     let w = blocky[i].size * rectScale 
     let w2 = blocky[i].size * rectScale 

     let dy = y2 - y1; let dx = x2 - x1 
     let length = sqrt(dy * dy + dx * dx) 
     let calcx = (-(y2 - y1)/length) 
     let calcy = ((x2 - x1)/length) 
     let calcx1 = calcx * w 
     let calcy1 = calcy * w 
     let calcx2 = calcx * w2 
     let calcy2 = calcy * w2 

     let writeIndex = i * 8 
     oblocky[writeIndex] = x1 + calcx1 
     oblocky[writeIndex+1] = y1 + calcy1 

     oblocky[writeIndex+2] = x1 - calcx1 
     oblocky[writeIndex+3] = y1 - calcy1 

     oblocky[writeIndex+4] = x2 + calcx2 
     oblocky[writeIndex+5] = y2 + calcy2 

     oblocky[writeIndex+6] = x2 - calcx2 
     oblocky[writeIndex+7] = y2 - calcy2 
    } 
} 

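I really have no idea why this slowdown is happening! I'm using UnsafeMutablePointer so that the data is thread safe, and I'm making sure that no variable can ever be read or written by multiple threads at the same time.

What is going on here?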


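Minor point, but I don't follow the intent of 'w' and 'w2'. Also, 'sqrt(dy * dy + dx * dx)' can be replaced with 'hypot(dy, dx)'. In addition, I'm not sure I buy your argument for 'UnsafeMutablePointer' over something Swiftier, such as 'Array'. Arrays suffer a performance penalty in debug builds (but offer safety and improved memory management that unsafe pointers obviously can't), and in optimized release builds the performance is fine. – Rob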
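The two suggestions in this comment can be sketched as follows. This is only an illustration under stated assumptions: Point and quadCorners are hypothetical names invented for the example, Double stands in for the question's Decimal type (whose definition is not shown), and a single half-width w replaces the identical w and w2.

```swift
import Foundation

// Hypothetical stand-in for the question's point type.
struct Point {
    var x: Double
    var y: Double
}

/// Returns the eight corner coordinates of a quad of half-width `w`
/// around the segment p1-p2, in the same order the question writes them.
/// A zero-length segment divides by zero and yields NaN; that case is not handled here.
func quadCorners(p1: Point, p2: Point, halfWidth w: Double) -> [Double] {
    let dx = p2.x - p1.x
    let dy = p2.y - p1.y
    let length = hypot(dx, dy)   // same value as sqrt(dx * dx + dy * dy)

    // Unit normal to the segment, scaled by the half-width.
    let nx = -dy / length * w
    let ny = dx / length * w

    // The result is an ordinary Array rather than an unsafe pointer.
    return [p1.x + nx, p1.y + ny,
            p1.x - nx, p1.y - ny,
            p2.x + nx, p2.y + ny,
            p2.x - nx, p2.y - ny]
}
```

For a horizontal segment from (0, 0) to (2, 0) with a half-width of 1, the normal points straight up, so the corners sit one unit above and below each endpoint.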

Answers

2

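As described in Performing Loop Iterations Concurrently, there is overhead in dispatching each block to a background queue. So you will want to "stride" through your array, letting each iteration process several data points rather than just one.

Also, the function called concurrentPerform in Swift 3 and later is designed to perform loops in parallel and is optimized for the cores of the particular device. Combined with striding, you should see some performance benefit: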

DispatchQueue.global(qos: .userInitiated).async {
    let stride = 100
    // Round up so that a final, partial chunk is not skipped
    let iterations = (particles.count + stride - 1) / stride
    DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
        let start = iteration * stride
        let end = min(start + stride, particles.count)
        for i in start ..< end {
            particles[i].update()
            block[i] = particles[i].data // Copy the raw data into a thread-safe format

            // No per-particle queue.async here: concurrentPerform already runs the chunks in parallel
            let x1 = blocky[i].p1.x; let y1 = blocky[i].p1.y
            let x2 = blocky[i].p2.x; let y2 = blocky[i].p2.y

            let w = blocky[i].size * rectScale
            let w2 = blocky[i].size * rectScale

            let dy = y2 - y1; let dx = x2 - x1
            let length = hypot(dy, dx)
            let calcx = -dy/length
            let calcy = dx/length
            let calcx1 = calcx * w
            let calcy1 = calcy * w
            let calcx2 = calcx * w2
            let calcy2 = calcy * w2

            let writeIndex = i * 8
            oblocky[writeIndex] = x1 + calcx1
            oblocky[writeIndex+1] = y1 + calcy1

            oblocky[writeIndex+2] = x1 - calcx1
            oblocky[writeIndex+3] = y1 - calcy1

            oblocky[writeIndex+4] = x2 + calcx2
            oblocky[writeIndex+5] = y2 + calcy2

            oblocky[writeIndex+6] = x2 - calcx2
            oblocky[writeIndex+7] = y2 - calcy2
        }
    }
}

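You should experiment with different stride values and see how the performance changes.

I can't run this code (I don't have sample data, I don't have the definition of Datas, etc.), so I apologize if I introduced any issues. But don't focus on the code itself. Focus instead on the broader points: use concurrentPerform to run loops concurrently, and make sure each thread has enough work that the threading overhead doesn't outweigh the benefit of running in parallel.

For a broader discussion of the issues here, see https://stackoverflow.com/a/22850936/1271826.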
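One way to try different stride values, as the answer above suggests, is a small timing harness like the sketch below. It is not the answerer's code: timeRun and its synthetic workload (one hypot call per element) are assumptions made up for illustration, and real particle data will behave differently. The checksum exists only to confirm that every stride value produces the same output.

```swift
import Foundation
import Dispatch

/// Hypothetical helper: fills `count` slots in parallel, `stride` elements per chunk,
/// and reports the elapsed wall-clock time plus a checksum of the output.
func timeRun(count: Int, stride: Int) -> (seconds: Double, checksum: Double) {
    precondition(stride > 0, "stride must be positive")
    var output = [Double](repeating: 0, count: count)
    let chunks = (count + stride - 1) / stride   // round up so the tail is included

    let start = DispatchTime.now()
    output.withUnsafeMutableBufferPointer { buffer in
        let out = buffer   // local copy of the pointer for the worker closure to capture
        DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
            let lower = chunk * stride
            let upper = min(lower + stride, count)
            for i in lower ..< upper {
                out[i] = hypot(Double(i), 1)   // each index is written by exactly one chunk
            }
        }
    }
    let end = DispatchTime.now()

    let seconds = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9
    return (seconds, output.reduce(0, +))
}

// Compare a few stride values; the timings depend on the machine and the build settings.
for stride in [1, 10, 100, 1_000, 10_000] {
    let result = timeRun(count: 200_000, stride: stride)
    print("stride \(stride): \(result.seconds) s")
}
```

Measure an optimized release build, since debug builds add overhead that can hide the effect of the stride.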

2

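Your expectations may be wrong. Your goal was to free up the main thread, and you did that. That is what is faster now: the main thread!

But async on a background thread means "please do this whenever you get around to it, and let it pause so that other code can run in the middle of it". It does not mean "do it fast", not at all. And I don't see any qos specification in your code, so it's not as if you're asking for special attention or anything.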
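To make the point about qos concrete, here is a minimal sketch of a concurrent queue created with an explicit quality of service, plus a DispatchGroup so the caller can tell when the asynchronous blocks have actually finished. The function name squaresOnBackgroundQueue, the queue label, and the toy workload are hypothetical. It dispatches one block per element only to keep the example short, which is the very pattern the first answer advises against for real workloads.

```swift
import Dispatch

/// Hypothetical example: computes i * i for each slot on a concurrent queue
/// that requests .userInitiated priority, then waits for all blocks to finish.
func squaresOnBackgroundQueue(count: Int) -> [Int] {
    let workQueue = DispatchQueue(label: "example.worker",
                                  qos: .userInitiated,
                                  attributes: .concurrent)
    let group = DispatchGroup()

    let results = UnsafeMutablePointer<Int>.allocate(capacity: count)
    results.initialize(repeating: 0, count: count)
    defer { results.deallocate() }

    for slot in 0 ..< count {
        // async returns immediately; the block runs whenever the system schedules it.
        workQueue.async(group: group) {
            results[slot] = slot * slot   // each block writes only its own slot
        }
    }

    // Block the calling thread until every dispatched block has completed.
    group.wait()
    return (0 ..< count).map { results[$0] }
}

print(squaresOnBackgroundQueue(count: 4))   // prints [0, 1, 4, 9]
```

In an app, calling group.wait() on the main thread would block it again, so group.notify(queue: .main) is usually the better fit there. Any timing comparison should also stop the clock only after the group has finished, not when the async calls return.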