Is writing to a render texture from a Compute Shader very slow?

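I am just learning Compute Shaders and trying them out in Unity (I am quite new to shaders). I am trying to do some simple raycasting and write to a render texture from a Compute Shader. Everything works perfectly and I get the result I want. The ray-triangle intersection runs very fast, in just under half a second. But now that I try to apply a new color to the render texture, performance drops: the time required jumps to 5 seconds. I cannot break out of the loop without making performance even worse. I cannot even use a boolean flag that is set to true inside the loop and then checked outside the loop to update the texture color.

Performance becomes really bad. How should I update the render texture colors?

Here is the shader code. Any help is appreciated.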

//-------------------------------------------------------------------- 
#pragma kernel MainCS 

//-------------------------------------------------------------------- 
struct Triangle 
{ 
    float3 v0; 
    float3 v1; 
    float3 v2; 
    float3 n; 
}; 

// Precomputed and set from C# script 
struct Pixel 
{ 
    float3 position; 
    float3 direction; 
    int  index; 
    float pixelColor; 
}; 

//----------------------------------------------------------------------------- 
#define blocksize 8 

// variables 
int imageSize; 

// buffers 
RWStructuredBuffer<Pixel>  pixels : register(u0); // UAV 
RWTexture2D<float4>    rendTex : register(u1); // UAV 
const StructuredBuffer<Triangle> tris  : register(t0); // SRV 


// This kernel writes a color to the current pixel if its ray intersects any of the triangles in the tris buffer. In general it works well, but it is slow. The intersection part without writing to the render texture is SUPER FAST. When I attempt to write to the texture, it gets SUPER SLOW. Random write is enabled on the render texture from the C# script.

[numthreads(blocksize,blocksize,1)] 
void MainCS (uint3 id : SV_DispatchThreadID, uint3 Gid : SV_GroupID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex) 
{ 
    // Get the current pixel ID - pixels is 1D array 
    int pixelID = (int)(id.y * imageSize + id.x); 

    // Ray 
    float3 rayO = pixels[pixelID].position; 
    float3 rayD = pixels[pixelID].direction; 

    // Intersection variables 
    float3 pt0, pt1, pt2, edge0, edge1, edge2, cross1, cross2, cross3, n; 
    float angle1, angle2, angle3; 
    float r, _a, b; 
    float3 w0, I; 

    bool bIntersect = false; 

    [loop][allow_uav_condition] 
    for (uint tr = 0; tr < tris.Length; tr++) 
    { 
     // Some calculations
     pt0 = tris[tr].v0; pt1 = tris[tr].v1; pt2 = tris[tr].v2; 
     edge0 = rayO - pt0; edge1 = rayO - pt1; edge2 = rayO - pt2; 

     // First check - is the ray intersecting the triangle 
     if (dot(rayD, cross(edge0, edge1)) >= 0.0 || 
      dot(rayD, cross(edge1, edge2)) >= 0.0 || 
      dot(rayD, cross(edge2, edge0)) >= 0.0) continue; 

     // Finding the intersection point
     n = normalize(cross(pt0 - pt1, pt0 - pt2)); 
     w0 = rayO - pt0; 
     _a = -dot(n, w0); 
     b = dot(n, rayD); 
     r = _a/b; 
     I = rayO + rayD * r; 

     // Second check - before validate the hitpoint 
     if (_a < 0.0) 
     { 
      // Here i would want to update texture colors 

      // ============================================== 
      // Variant 1 ======================================= 
      // Only update the texture without break; 
      // Gives proper result but is SLOW - 3 seconds 
      rendTex[id.xy] = float4(1.0, 0.0, 0.0, 1.0); 
      // if add break; - MUCH SLOWER 
      break; 

      // =============================================== 
      // Variant 2 - Part 1 ================================== 
      // rising flag to true - fast 
      if(!bIntersect) 
      { 
       bIntersect = true; 
      } 
     } 
    } 

    // Variant 2 - Part 2 - When using the flag, updating the render texture color is SUPER SLOW but accurate
    if(bIntersect) 
     rendTex[id.xy] = float4(1.0, 0.0, 0.0, 1.0); 
}

Source

2015-12-24 Venci Dimitrov

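Dynamic branching is very expensive when programming on the GPU.

This is because of the way GPUs are designed. A simplified view of how a CPU works: it fetches an instruction, decodes it, and executes it on an ALU. A GPU fetches an instruction, decodes it, and then executes it on a whole group of ALUs at the same time. It steps through each line on every thread in the group simultaneously, and it has to re-run the program for all of those pixels even if only one of the threads needs to execute a different instruction.

Basically, dynamic branching (if statements) should be avoided as much as possible. When you run a for loop with a conditional break, you create a lot of branching, which is the GPU's Achilles' heel. The flag is faster because the GPU is able to execute all instructions on every thread anyway. Try to have as many threads as possible execute the same lines of code as much as possible.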
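The advice above can be sketched against the kernel from the question. This is only an illustration under stated assumptions, not a confirmed fix: the two tests are folded into booleans so no thread uses `continue` or `break`, and the texture is written exactly once, unconditionally, after the loop. Writing black for a miss is an assumption (the original leaves missed pixels untouched), as is `imageSize` being a multiple of 8 so no thread falls outside the image. `GetDimensions` replaces `tris.Length` because it is the documented way to query a structured buffer's element count. Whether this is actually faster on a given GPU is not confirmed anywhere in this thread, so profile before relying on it.

```hlsl
// Sketch only: same data layout as the question, restructured so every
// thread in a group runs the same instructions and writes the texture once.
#pragma kernel MainCS

struct Triangle { float3 v0; float3 v1; float3 v2; float3 n; };
struct Pixel    { float3 position; float3 direction; int index; float pixelColor; };

int imageSize;                          // assumed to be a multiple of 8
RWStructuredBuffer<Pixel>  pixels;
StructuredBuffer<Triangle> tris;
RWTexture2D<float4>        rendTex;

[numthreads(8, 8, 1)]
void MainCS(uint3 id : SV_DispatchThreadID)
{
    uint pixelID = id.y * (uint)imageSize + id.x;
    float3 rayO = pixels[pixelID].position;
    float3 rayD = pixels[pixelID].direction;

    // Query the number of triangles in the buffer.
    uint triCount, stride;
    tris.GetDimensions(triCount, stride);

    bool hit = false;
    for (uint tr = 0; tr < triCount; tr++)
    {
        Triangle t = tris[tr];
        float3 edge0 = rayO - t.v0;
        float3 edge1 = rayO - t.v1;
        float3 edge2 = rayO - t.v2;

        // First check from the question, expressed as a boolean
        // instead of an early `continue`.
        bool inside = dot(rayD, cross(edge0, edge1)) < 0.0 &&
                      dot(rayD, cross(edge1, edge2)) < 0.0 &&
                      dot(rayD, cross(edge2, edge0)) < 0.0;

        // Second check from the question. Only the sign of the dot product
        // is used, so the normal is left unnormalized.
        float3 n = cross(t.v0 - t.v1, t.v0 - t.v2);
        float a = -dot(n, edge0);

        // Accumulate the result; no thread leaves the loop early.
        hit = hit || (inside && a < 0.0);
    }

    // Single unconditional write per thread.
    // Assumption: misses are painted black rather than left untouched.
    rendTex[id.xy] = hit ? float4(1.0, 0.0, 0.0, 1.0)
                         : float4(0.0, 0.0, 0.0, 1.0);
}
```

The trade-off is that every thread always walks the full triangle list, so this form only pays off if divergence, rather than the amount of arithmetic, is what dominates the run time.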

Source

2017-06-06 15:32:43 Alex

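I assume you are trying to make something like a painting tool that lets you draw on a surface. I have built one of these before, but it was done by drawing to the texture directly from Unity, not from a shader. Also, unless you are trying to capture another camera's render and then composite on top of it, you do not need a render texture for this.

Shaders are generally very fast because they can draw many pixels in parallel to the draw buffer. Writing to texture memory is much slower. Your performance problem is most likely caused by the shader constantly updating the texture for every pixel, every frame: a lot of very small write operations. Imagine writing a novel by opening a text file, updating a single character, closing it, and then repeating.

I suggest using Texture2D.SetPixels() to draw directly to the texture in Unity. It lets you write to texture memory in bulk by accepting an array of Unity Color objects, and the modified pixels are only sent when you call texture.Apply() on your texture.

Also, if you need the UV coordinates in texture space, there is RaycastHit.textureCoord.

Below is the example from the Unity documentation that draws on a texture based on where a raycast hits an object's surface.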

using UnityEngine; 
using System.Collections; 

public class ExampleClass : MonoBehaviour { 
    public Camera cam; 
    void Start() { 
     cam = GetComponent<Camera>(); 
    } 
    void Update() { 
     if (!Input.GetMouseButton(0)) 
      return; 

     RaycastHit hit; 
     if (!Physics.Raycast(cam.ScreenPointToRay(Input.mousePosition), out hit)) 
      return; 

     Renderer rend = hit.transform.GetComponent<Renderer>(); 
     MeshCollider meshCollider = hit.collider as MeshCollider; 
     if (rend == null || rend.sharedMaterial == null || rend.sharedMaterial.mainTexture == null || meshCollider == null) 
      return; 

     Texture2D tex = rend.material.mainTexture as Texture2D; 
     Vector2 pixelUV = hit.textureCoord; 
     pixelUV.x *= tex.width; 
     pixelUV.y *= tex.height; 
     tex.SetPixel((int)pixelUV.x, (int)pixelUV.y, Color.black); 
     tex.Apply(); 
    } 
}

Source

2015-12-24 09:20:02 Soviut

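Thanks for the advice. But I am actually trying to make a GPU ray tracer, at least at its most basic level: just shooting primary rays and detecting triangles. All the needed data is precomputed in a C# script and then uploaded to the Compute Shader using buffers and a RWTexture2D. I did some research and concluded that a render texture is best suited for this purpose, but it has to allow random read/write access, which I enable in the C# script before the render texture is uploaded to GPU memory. Isn't this the right way to write information to a texture with a Compute Shader?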
