What is the highest-performance way to set a subset of sequential items in a large array?

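I'm working with a camera stream. Each frame brings in 1,228,800 bytes, so efficiency is critical and nanoseconds per byte add up quickly. What is the highest-performance way to set a subset of sequential items in a large array?

I've come up with some example code that describes the problem as concisely as possible without seeming too contrived.

The code in this example has plenty of inefficiencies, such as defining variables inside the loop, or dividing the brightness value instead of using a composite value. Those are not what I'm asking about; they are only there to keep the example simple.

What I need advice on is the highest-performance way in C# to set 3 sequential values at some position in a very large array, as in the case below, where I set BGR to 255 while skipping the 4th byte.

Edit: To clarify, the issue I'm concerned about is that I re-index into Output for every index being set. If I already have the position of the previous item, it seems there should be some way to avoid traversing the whole array for each value.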

    // Colors are stored as 4 bytes: BGRX where X is always 0
    public byte[] Input = new byte[640 * 480 * 4];
    public byte[] Output = new byte[640 * 480 * 4];

    public int Threshold = 180;

    public void ProcessImage() {
        for (int i = 0; i < Input.Length; i += 4) {
            var brightness = (Input[i] + Input[i + 1] + Input[i + 2]) / 3; // some value under 255

            if (brightness > Threshold) {
                // What is the most efficient way possible to do this?
                // The (byte) casts are required: 255 - Input[i] is an int expression.
                Output[i] = (byte)(255 - Input[i]);
                Output[i + 1] = (byte)(255 - Input[i + 1]);
                Output[i + 2] = (byte)(255 - Input[i + 2]);
            }
            else {
                Output[i] = Input[i];
                Output[i + 1] = Input[i + 1];
                Output[i + 2] = Input[i + 2];
            }
        }
    }

Source

2016-06-17 Jim Yarbro

Figuring out how to remove the (hard-to-predict) branch in the loop (`if (brightness > Threshold)`) will probably give the biggest efficiency gain. – spender
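One way the branch could be removed, as a minimal sketch rather than a measured result: build a mask from the sign bit of `threshold - brightness` and XOR it into each channel, since `255 - x` equals `x ^ 0xFF` for a byte. The class and method names here are hypothetical, and whether this beats the branching version depends on the data and the JIT, so it would need benchmarking.

```csharp
using System;

public static class BranchlessInvert
{
    // Hypothetical helper, not from the original post.
    // Assumes BGRX layout (4 bytes per pixel) and equal-length buffers.
    public static void Process(byte[] input, byte[] output, int threshold)
    {
        for (int i = 0; i < input.Length; i += 4)
        {
            int brightness = (input[i] + input[i + 1] + input[i + 2]) / 3;

            // threshold - brightness is negative exactly when brightness > threshold.
            // An arithmetic shift by 31 turns a negative int into -1 (all bits set)
            // and a non-negative int into 0, so mask is 0xFF or 0x00.
            int mask = ((threshold - brightness) >> 31) & 0xFF;

            // x ^ 0xFF == 255 - x for a byte; x ^ 0 leaves x unchanged.
            output[i] = (byte)(input[i] ^ mask);
            output[i + 1] = (byte)(input[i + 1] ^ mask);
            output[i + 2] = (byte)(input[i + 2] ^ mask);
        }
    }
}
```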

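That's not how you calculate brightness. If you can't get the camera to spit out a bitmap so that you can use built-in .NET classes like ColorMatrix, then use an image-processing library like Emgu CV. –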

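As I said in my question, the code contains many inefficiencies in order to keep the example as simple as possible. Please answer the question rather than criticizing code I have already identified as contrived. –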

This (untested and unsafe) code should be faster, if speed is all you care about:

using System.Diagnostics;              // Debug.Assert
using System.Runtime.InteropServices;  // GCHandle

public void ProcessImage()
{
    int ilength = Input.Length;
    Debug.Assert(ilength == Output.Length);
    Debug.Assert(ilength % 4 == 0);
    unsafe
    {
        // Pin both arrays so the GC cannot move them while raw pointers are in use.
        GCHandle pinned1 = GCHandle.Alloc(Input, GCHandleType.Pinned);
        byte* input = (byte*)pinned1.AddrOfPinnedObject();
        GCHandle pinned2 = GCHandle.Alloc(Output, GCHandleType.Pinned); // Output, not Input
        byte* output = (byte*)pinned2.AddrOfPinnedObject();
        for (int i = 0; i < ilength; i += 4)
        {
            var brightness = (*(input) + *(input + 1) + *(input + 2)) / 3;
            if (brightness > Threshold)
            {
                // What is the most efficient way possible to do this?
                (*(output)) = (byte)(255 - *(input));
                (*(output + 1)) = (byte)(255 - *(input + 1));
                (*(output + 2)) = (byte)(255 - *(input + 2));
            }
            else
            {
                (*(output)) = *(input);
                (*(output + 1)) = *(input + 1);
                (*(output + 2)) = *(input + 2);
            }
            // Advance both pointers to the next 4-byte pixel.
            input += 4;
            output += 4;
        }

        pinned1.Free();
        pinned2.Free();
    }
}

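Note that I have put the assumptions the function relies on at the top. I suggest you always do this, but whether you prefer Debug.Assert or some other form of validation is up to you.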
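As an illustration of "some other form of validation": calls to Debug.Assert are compiled out unless the DEBUG symbol is defined, so a check that must also run in Release builds can throw instead. This is a minimal sketch; the class and method names are hypothetical.

```csharp
using System;

public static class FrameChecks
{
    // Hypothetical helper: checks the same two assumptions as the Debug.Assert
    // calls above (equal lengths, length divisible by 4), but stays active in
    // Release builds because it throws instead of asserting.
    public static void Validate(byte[] input, byte[] output)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (output == null) throw new ArgumentNullException(nameof(output));
        if (input.Length != output.Length)
            throw new ArgumentException("Input and output must have the same length.", nameof(output));
        if (input.Length % 4 != 0)
            throw new ArgumentException("Buffer length must be a multiple of 4 (BGRX).", nameof(input));
    }
}
```

Calling it once per frame before the loop costs a handful of comparisons, which is small next to processing over a million bytes.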

Source

2016-06-17 12:31:24

Thanks! There are some new concepts here for me to learn, so it will be a while before I mark this as the answer. –

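This flies. I benchmarked a few thousand iterations: the original code took 1142ms, this code 444ms, and this code combined with a block copy (so that the else clause can be skipped) 390ms. – spender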
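The comment does not show how the timings were taken. A comparison of this kind could be set up roughly as follows, using System.Diagnostics.Stopwatch; the `FrameBenchmark` class and its `Measure` method are hypothetical names, and this is only a sketch of one reasonable harness.

```csharp
using System;
using System.Diagnostics;

public static class FrameBenchmark
{
    // Hypothetical harness: runs `action` `iterations` times and returns the
    // elapsed wall-clock time in milliseconds.
    public static long Measure(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the timing

        var stopwatch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Each variant of ProcessImage would be passed as the `action`, with the same iteration count and a Release build, so the numbers are comparable.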

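I ended up using fixed() blocks instead of GCHandles, but if I understand correctly that is just syntactic sugar. Either way, your answer led directly to my current reduction of 3ms of latency per frame, and I haven't even applied it across the whole project yet. Thanks! –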
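The commenter's fixed-based version is not shown; the following is a sketch of what it might look like, with hypothetical names. One detail worth knowing: `fixed` is compiled to pinned local variables rather than to GCHandle calls, so it is usually the cheaper of the two, and the arrays are unpinned automatically when the block exits. The code needs unsafe code enabled for the project (AllowUnsafeBlocks).

```csharp
using System;

public static class FixedInvert
{
    // Hypothetical helper, not the commenter's actual code.
    // Assumes BGRX layout and equal-length buffers whose length is a multiple of 4.
    public static unsafe void Process(byte[] input, byte[] output, int threshold)
    {
        if (input.Length != output.Length || input.Length % 4 != 0)
            throw new ArgumentException("Buffers must have equal length, a multiple of 4.");

        // Both arrays stay pinned only for the duration of this block.
        fixed (byte* pIn = input, pOut = output)
        {
            byte* src = pIn;
            byte* dst = pOut;
            byte* end = pIn + input.Length;
            for (; src < end; src += 4, dst += 4)
            {
                int brightness = (src[0] + src[1] + src[2]) / 3;
                if (brightness > threshold)
                {
                    dst[0] = (byte)(255 - src[0]);
                    dst[1] = (byte)(255 - src[1]);
                    dst[2] = (byte)(255 - src[2]);
                }
                else
                {
                    dst[0] = src[0];
                    dst[1] = src[1];
                    dst[2] = src[2];
                }
            }
        }
    }
}
```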

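As far as the highest-performance way in C# to set a single value at multiple array indices goes, I think you are overthinking it. There is no non-looping way to set the same value at multiple indices. See How can I assign a value to multiple array indices at once without looping?

If it helps, there is no need for the else statement in which you set the 3 indices to 0. default(byte) is already zero, so every index in the Output[] array is initialized to 0.

As a side note, defining a variable inside the loop versus outside it has no effect on the resulting IL. See Is it better to declare a variable inside or outside a loop?

Edit: To add to the comment above, you can use unsafe methods. See https://stackoverflow.com/a/5375552/3290789 and http://www.gutgames.com/post/Using-Unsafe-Code-for-Faster-Image-Manipulation.aspx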

Source

2016-06-17 11:59:45 BigWheelRun

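Thanks, but assigning the same value to multiple indices isn't really what I'm concerned about. Perhaps the example is actually too contrived, since it seems to throw people off easily. I've edited the question to clarify, thanks. –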

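If you are happy for the 4th byte to be carried across, it will be faster to copy Input to Output first with a block copy, and then skip the else clause of the branch: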

    Buffer.BlockCopy(Input, 0, Output, 0, Input.Length);
    for (int i = 0; i < Input.Length; i += 4) {
        var brightness = (Input[i] + Input[i + 1] + Input[i + 2]) / 3;
        if (brightness > Threshold) {
            Output[i] = (byte)(255 - Input[i]);
            Output[i + 1] = (byte)(255 - Input[i + 1]);
            Output[i + 2] = (byte)(255 - Input[i + 2]);
        }
    }
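If carrying the 4th byte across is acceptable, the same idea can be pushed further by treating each pixel as one 32-bit word, so the three channel writes become a single store. This is a sketch that is not part of the original answer: it assumes .NET Core 2.1 or later (for Span<T> and MemoryMarshal), a little-endian CPU, and hypothetical class and method names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class WordInvert
{
    // Hypothetical helper. Assumes BGRX layout, a little-endian CPU, and
    // equal-length buffers whose length is a multiple of 4.
    public static void Process(byte[] input, byte[] output, int threshold)
    {
        if (input.Length != output.Length || input.Length % 4 != 0)
            throw new ArgumentException("Buffers must have equal length, a multiple of 4.");

        // Reinterpret the byte buffers as one uint per pixel (no copying).
        Span<uint> src = MemoryMarshal.Cast<byte, uint>(input.AsSpan());
        Span<uint> dst = MemoryMarshal.Cast<byte, uint>(output.AsSpan());

        for (int p = 0; p < src.Length; p++)
        {
            uint pixel = src[p];

            // On little-endian, B is the lowest byte, then G, then R; X is the highest.
            int b = (int)(pixel & 0xFF);
            int g = (int)((pixel >> 8) & 0xFF);
            int r = (int)((pixel >> 16) & 0xFF);
            int brightness = (b + g + r) / 3;

            // XOR with 0x00FFFFFF inverts B, G and R in one store;
            // the X byte is copied over unchanged.
            dst[p] = brightness > threshold ? pixel ^ 0x00FFFFFFu : pixel;
        }
    }
}
```

Because every pixel is written, no prior Buffer.BlockCopy is needed with this variant; whether it is actually faster than the block-copy version would have to be measured.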

Source

2016-06-17 12:42:06 spender
