The safe way to do this is to use a stateful UTF-8 decoder, which you can get from Encoding.UTF8.GetDecoder().
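A stateful decoder internally holds on to any bytes that belong to an incomplete multi-byte sequence. The next time you give it more bytes, it completes the sequence and returns the characters decoded from it.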
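To make that behaviour concrete, here is a minimal sketch that feeds one decoder a multi-byte character split across two chunks. The class name ChunkedDecodeDemo and the helper DecodeInChunks are made up for this illustration; only Encoding.UTF8.GetDecoder(), Decoder.GetChars and Encoding.GetMaxCharCount come from the framework.

```csharp
using System;
using System.Text;

public static class ChunkedDecodeDemo
{
    // Hypothetical helper: decodes the chunks in order with a single stateful decoder.
    public static string DecodeInChunks(params byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var result = new StringBuilder();

        for (int i = 0; i < chunks.Length; i++)
        {
            byte[] chunk = chunks[i];

            // Large enough for this chunk plus any bytes the decoder carried over.
            char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunk.Length)];

            // Flush only on the final chunk; before that, the decoder keeps
            // the bytes of an unfinished sequence for the next call.
            bool flush = (i == chunks.Length - 1);
            int charCount = decoder.GetChars(chunk, 0, chunk.Length, chars, 0, flush);

            result.Append(chars, 0, charCount);
        }

        return result.ToString();
    }

    public static void Main()
    {
        // "é" (U+00E9) is the two bytes 0xC3 0xA9 in UTF-8; split them across chunks.
        byte[] first = { 0x41, 0xC3 };   // 'A' and the first half of "é"
        byte[] second = { 0xA9, 0x42 };  // the second half of "é" and 'B'

        Console.WriteLine(DecodeInChunks(first, second));
    }
}
```

The first GetChars call returns only 'A' and keeps 0xC3 inside the decoder; the second call pairs it with 0xA9 and emits "é" followed by 'B'.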
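Here is an example of how to use it. In my implementation I use a char[] buffer sized so that it is guaranteed to hold the full conversion of one read's worth of bytes. That way we only perform two buffer allocations to read the entire stream (the StringBuilder still grows as needed).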
// Requires: using System.IO; using System.Text;
public static string ReadStringFromStream(Stream stream)
{
    // --- Byte-oriented state ---
    // A nice big buffer for us to use to read from the stream.
    byte[] byteBuffer = new byte[8192];

    // --- Char-oriented state ---
    // Gets a stateful UTF8 decoder that holds onto unused bytes when multi-byte sequences
    // are split across multiple byte buffers.
    var decoder = Encoding.UTF8.GetDecoder();

    // Initialize a char buffer, and make it large enough that it will be able to fit
    // a full read's worth of data from the byte buffer without needing to be resized.
    char[] charBuffer = new char[Encoding.UTF8.GetMaxCharCount(byteBuffer.Length)];

    // --- Output ---
    StringBuilder stringBuilder = new StringBuilder();

    // --- Working state ---
    int bytesRead;
    int charsConverted;
    bool lastRead;

    do
    {
        // Read a chunk of bytes from our stream.
        bytesRead = stream.Read(byteBuffer, 0, byteBuffer.Length);

        // If we read 0 bytes, we hit the end of stream.
        // We're going to tell the converter to flush, and then we're going to stop.
        lastRead = (bytesRead == 0);

        // Convert the bytes into characters, flushing if this is our last conversion.
        charsConverted = decoder.GetChars(byteBuffer, 0, bytesRead, charBuffer, 0, lastRead);

        // Append the converted characters to the string being built.
        stringBuilder.Append(charBuffer, 0, charsConverted);
    }
    while (!lastRead);

    return stringBuilder.ToString();
}
Any code point split across an 8192-byte boundary will fail, yes. Why decode as UTF-8 only to immediately re-encode it? – Ryan
No, it isn't safe. A better way is 'accumulator = new StreamReader(stream, Encoding.UTF8).ReadToEnd()' –