Is it safe to convert bytes to a UTF-8 encoded string?

Today I saw a question with code like this:

var accumulator = ""; 
var buffer = new byte[8192]; 
while (true) 
{ 
    var readed = stream.Read(buffer, 0, buffer.Length); 
    accumulator += Encoding.UTF8.GetString(buffer, 0, readed); 
    if (readed < buffer.Length) 
        break; 
} 
var result = Encoding.UTF8.GetBytes(accumulator);

I know this code is inefficient, but is it actually safe? Is there some byte sequence that would corrupt the result?


2017-07-23 Aleks Andreev

Yes, any input in which a code point straddles the 8192-byte boundary will fail. Why decode as UTF-8 only to re-encode immediately? – Ryan

No, it is not safe. A better way is 'accumulator = new StreamReader(stream, Encoding.UTF8).ReadToEnd()' –

This code is clearly broken; if it was suggested as an answer, you should bring the bug to the author's attention.

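A UTF-8 sequence can obviously be more than one byte long. If a multi-byte sequence starts at the end of the current buffer and continues at the start of the next one, then converting each buffer to a string on its own will give the wrong result.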
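The failure is easy to reproduce with a tiny chunk size. The following is a minimal sketch, not code from the thread: the class and method names (SplitSequenceDemo, DecodePerChunk) are hypothetical, and the method simply decodes each chunk independently, the way the loop in the question does.

```csharp
using System;
using System.Text;

public static class SplitSequenceDemo
{
    // Hypothetical helper: decodes every chunk on its own with
    // Encoding.UTF8.GetString, mirroring the loop in the question.
    // Assumes chunkSize > 0.
    public static string DecodePerChunk(byte[] data, int chunkSize)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);

            // Each call starts from a clean state, so a sequence that is
            // cut by the chunk boundary is decoded as two invalid fragments.
            sb.Append(Encoding.UTF8.GetString(data, offset, count));
        }
        return sb.ToString();
    }
}
```

For example, "é" is encoded as the two bytes 0xC3 0xA9. Calling SplitSequenceDemo.DecodePerChunk(Encoding.UTF8.GetBytes("é"), 1) splits that pair and does not return "é", while a chunk size of 2 keeps the pair together and does.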


2017-07-23 20:25:20

"was suggested as an answer" - no, this code comes from a question. Your answer showed me one error this approach can have. Thanks –

The safe way to do this is to use a stateful UTF-8 decoder, which you can get from Encoding.UTF8.GetDecoder().

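A stateful decoder internally holds on to the bytes of an incomplete multi-byte sequence. The next time you give it more bytes, it completes the sequence and returns the characters decoded from it.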
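To see the carried-over state in isolation, here is a minimal sketch that feeds a byte array to one decoder in small chunks. The names (StatefulDecoderDemo, DecodeInChunks) are hypothetical and the in-memory array stands in for a stream; only Encoding.UTF8.GetDecoder(), Decoder.GetChars and Encoding.GetMaxCharCount are real .NET APIs.

```csharp
using System;
using System.Text;

public static class StatefulDecoderDemo
{
    // Hypothetical helper: decodes data chunk by chunk through ONE decoder,
    // so bytes of an unfinished sequence are kept until the next call.
    // Assumes chunkSize > 0.
    public static string DecodeInChunks(byte[] data, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();

        // Large enough for the worst-case output of one chunk.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        var sb = new StringBuilder();

        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);

            // Flush only on the final chunk, so that a truncated trailing
            // sequence is reported instead of being silently held back.
            bool flush = offset + count >= data.Length;

            int charCount = decoder.GetChars(data, offset, count, chars, 0, flush);
            sb.Append(chars, 0, charCount);
        }
        return sb.ToString();
    }
}
```

With this approach, StatefulDecoderDemo.DecodeInChunks(Encoding.UTF8.GetBytes("é"), 1) returns "é": the first call consumes 0xC3 and produces no characters, and the second call completes the sequence.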

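Here is an example of how to use it to read an entire stream. In my implementation, I use a char[] buffer that is sized large enough to be guaranteed to hold the complete conversion of one full byte buffer. That way, we only allocate two buffers to read the entire stream.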

public static string ReadStringFromStream(Stream stream) 
{ 
    // --- Byte-oriented state --- 
    // A nice big buffer for us to use to read from the stream. 
    byte[] byteBuffer = new byte[8192]; 

    // --- Char-oriented state --- 
    // Gets a stateful UTF8 decoder that holds onto unused bytes when multi-byte sequences 
    // are split across multiple byte buffers. 
    var decoder = Encoding.UTF8.GetDecoder(); 

    // Initialize a char buffer, and make it large enough that it will be able to fit 
    // a full reads-worth of data from the byte buffer without needing to be resized. 
    char[] charBuffer = new char[Encoding.UTF8.GetMaxCharCount(byteBuffer.Length)]; 

    // --- Output --- 
    StringBuilder stringBuilder = new StringBuilder(); 

    // --- Working state --- 
    int bytesRead; 
    int charsConverted; 
    bool lastRead = false; 

    do 
    { 
        // Read a chunk of bytes from our stream. 
        bytesRead = stream.Read(byteBuffer, 0, byteBuffer.Length); 

        // If we read 0 bytes, we hit the end of stream. 
        // We're going to tell the converter to flush, and then we're going to stop. 
        lastRead = (bytesRead == 0); 

        // Convert the bytes into characters, flushing if this is our last conversion. 
        charsConverted = decoder.GetChars( 
            byteBuffer, 
            0, 
            bytesRead, 
            charBuffer, 
            0, 
            lastRead 
        ); 

        // Build up a string in a character buffer. 
        stringBuilder.Append(charBuffer, 0, charsConverted); 
    } 
    while (!lastRead); 

    return stringBuilder.ToString(); 
}


2017-07-23 20:48:45 antiduh

There is no need to reinvent the wheel (assuming it works); see the comment from "LB" – EZI

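@EZI - Sure, but this shows how to do it yourself, which gives you something you can adapt to your situation if you don't want to read to the end of the stream or have other requirements. There's nothing wrong with pulling back the curtain a bit every once in a while. – antiduh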
