Performance of multiple Metal kernel function calls

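I am using Apple Metal to build a rigid-body simulation for iPhone/iPad. To do this I need to make a lot of kernel function calls, and I see that this takes a long time compared with CUDA. I implemented the Metal kernel function calls the way the Apple tutorial shows: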

let commandQueue = device.newCommandQueue() 

var commandBuffers:[MTLCommandBuffer]=[] 
var gpuPrograms:[MTLFunction]=[] 
var computePipelineFilters:[MTLComputePipelineState]=[] 
var computeCommandEncoders:[MTLComputeCommandEncoder]=[] 

// Here I fill all the arrays for my command queue,
// and then I execute everything:

let threadsPerGroup = MTLSize(width:1,height:1,depth:1) 
let numThreadgroups = MTLSize(width:threadsAmount, height:1, depth:1) 

for computeCommandEncoder in computeCommandEncoders 
{ 
    computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup) 
} 

for computeCommandEncoder in computeCommandEncoders 
{ 
    computeCommandEncoder.endEncoding() 
} 

for commandBuffer in commandBuffers 
{ 
    commandBuffer.enqueue() 
} 

for commandBuffer in commandBuffers 
{ 
    commandBuffer.commit() 
} 

for commandBuffer in commandBuffers 
{ 
    commandBuffer.waitUntilCompleted() 
}

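I run at most a few dozen Metal kernel functions per frame, and it works far too slowly. I tested it with empty kernel functions, which showed that the problem lies in the Swift part of the execution. What I mean is that in CUDA, when I want to execute a kernel function, I just call it like an ordinary function and it runs very fast. Here, however, I have to perform many actions for every execution of every function in every frame. Maybe there is something I don't know, but I would like to create all the other objects once and then just do something like

commandQueue.execute()

to execute all the kernel functions.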

Is it right to execute so many kernel functions this way, or is there another way to do it faster?

Source: 2015-10-18, Ivan

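I have a few projects that use multiple shaders in a single step. I create only one command buffer and one encoder, but multiple pipeline states: one per compute function.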
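A minimal sketch of that one-time setup, written with the current Swift method names (makeCommandQueue, makeComputePipelineState), which differ from the 2015-era names used elsewhere in this thread. The ComputeResources class and the kernelNames parameter are hypothetical; the kernel names would have to match functions in your own default Metal library.

```swift
import Metal

// Hypothetical container for the objects that are created once and reused every frame.
final class ComputeResources {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineStates: [MTLComputePipelineState]

    // `kernelNames` are placeholder names of kernel functions in the default library.
    init?(kernelNames: [String]) {
        guard let device = MTLCreateSystemDefaultDevice(),
              let queue = device.makeCommandQueue(),
              let library = device.makeDefaultLibrary() else { return nil }

        // Build one pipeline state per compute function, once, up front.
        var states: [MTLComputePipelineState] = []
        for name in kernelNames {
            guard let function = library.makeFunction(name: name),
                  let state = try? device.makeComputePipelineState(function: function)
            else { return nil }
            states.append(state)
        }

        self.device = device
        self.commandQueue = queue
        self.pipelineStates = states
    }
}
```

Building pipeline states is comparatively expensive, so doing it in an initializer like this keeps it out of the per-frame path.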

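Remember that MTLCommandQueue is persistent, so it only needs to be created once. The drawRect() function of my MetalKit view looks roughly like this (there are more shaders, and textures being passed between them, but you get the idea of the structure):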

let commandBuffer = commandQueue.commandBuffer() 
let commandEncoder = commandBuffer.computeCommandEncoder() 

commandEncoder.setComputePipelineState(advect_pipelineState) 
commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, 
    threadsPerThreadgroup: threadsPerThreadgroup) 

commandEncoder.setComputePipelineState(divergence_pipelineState) 
commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, 
    threadsPerThreadgroup: threadsPerThreadgroup) 

[...] 

commandEncoder.endEncoding() 
commandBuffer.commit()

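One of my projects actually iterates over one of the shaders twenty times and still runs pretty nippily, so if you reorganise your code to follow this structure, with a single buffer and a single encoder, and call endEncoding() and commit() only once per pass, you may see an improvement in performance.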

"May" being the operative word :)
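The same structure can be sketched as a loop over pipeline states that were created once at setup. This is only an illustration under assumptions: encodeFrame and its parameters are hypothetical names, the method names are the current Swift ones rather than the 2015-era ones above, and resource binding is left as a comment because it depends on your kernels.

```swift
import Metal

// Encodes every pass into one command buffer through one compute encoder.
// All parameters are hypothetical stand-ins for objects created once at setup.
func encodeFrame(queue: MTLCommandQueue,
                 pipelineStates: [MTLComputePipelineState],
                 threadgroupsPerGrid: MTLSize,
                 threadsPerThreadgroup: MTLSize) {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return }

    for state in pipelineStates {
        // Switch kernels on the same encoder instead of creating a new one.
        encoder.setComputePipelineState(state)
        // Buffers and textures for this pass would be bound here (setBuffer / setTexture).
        encoder.dispatchThreadgroups(threadgroupsPerGrid,
                                     threadsPerThreadgroup: threadsPerThreadgroup)
    }

    // One endEncoding() and one commit() for the whole frame.
    encoder.endEncoding()
    commandBuffer.commit()

    // Only needed if the CPU must read the results before continuing.
    commandBuffer.waitUntilCompleted()
}
```

Compared with the question's code, this issues one commit() and at most one wait per frame instead of one per kernel.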

Source: 2015-10-18 13:57:20
