You can add one extra method to StreamReader (for example, the Mono source code could be used for this purpose):
    private StringBuilder lineBuilder;

    public int RegexBufferSize
    {
        set { lastRegexMatchedLength = value; }
        get { return lastRegexMatchedLength; }
    }

    private int lastRegexMatchedLength = 0;

    public virtual string ReadRegex(Regex regex)
    {
        if (base_stream == null)
            throw new ObjectDisposedException("StreamReader", "Cannot read from a closed RegexStreamReader");

        if (pos >= decoded_count && ReadBuffer() == 0)
            return null; // EOF reached

        if (lineBuilder == null)
            lineBuilder = new StringBuilder();
        else
            lineBuilder.Length = 0;

        // Start with whatever is still unhandled in the decoded buffer.
        lineBuilder.Append(decoded_buffer, pos, decoded_count - pos);
        int bytesRead = ReadBuffer();
        bool dataTested = false;
        while (bytesRead > 0)
        {
            var lineBuilderStartLen = lineBuilder.Length;
            dataTested = false;
            lineBuilder.Append(decoded_buffer, 0, bytesRead);
            // Only run the regex once enough input has accumulated.
            if (lineBuilder.Length >= lastRegexMatchedLength)
            {
                var currentBuf = lineBuilder.ToString();
                var match = regex.Match(currentBuf, 0, currentBuf.Length);
                if (match.Success)
                {
                    // Put everything after the match back into the decoded buffer.
                    var offset = match.Index + match.Length;
                    pos = 0;
                    decoded_count = lineBuilder.Length - offset;
                    ensureMinDecodedBufLen(decoded_count);
                    lineBuilder.CopyTo(offset, decoded_buffer, 0, decoded_count);
                    var matchedString = currentBuf.Substring(match.Index, match.Length);
                    return matchedString;
                }
                else
                {
                    // Grow the threshold by about 10% to allow for more input before the next attempt.
                    lastRegexMatchedLength = (int)(lastRegexMatchedLength * 1.1);
                    dataTested = true;
                }
            }
            bytesRead = ReadBuffer();
        }

        // EOF reached: test the remaining data once if the last chunk was not tested.
        if (!dataTested)
        {
            var currentBuf = lineBuilder.ToString();
            var match = regex.Match(currentBuf, 0, currentBuf.Length);
            if (match.Success)
            {
                var offset = match.Index + match.Length;
                pos = 0;
                decoded_count = lineBuilder.Length - offset;
                ensureMinDecodedBufLen(decoded_count);
                lineBuilder.CopyTo(offset, decoded_buffer, 0, decoded_count);
                var matchedString = currentBuf.Substring(match.Index, match.Length);
                return matchedString;
            }
        }
        pos = decoded_count;
        return null;
    }
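The method above uses the following variables:
- decoded_buffer: the char buffer that contains (or will contain) the data read
- pos: the offset within that array of the data not yet handled
- decoded_count: the end of the read data in the buffer, i.e. the index just past the last element that contains read data
- RegexBufferSize: the minimum size the regex input must reach before any matching is attempted.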
The method ReadBuffer() needs to read data from the stream. The method ensureMinDecodedBufLen() needs to make sure that decoded_buffer is large enough.
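ensureMinDecodedBufLen() is not shown in the answer. As a rough sketch only, it could look something like the helper below; `BufferHelper` and `EnsureMinLength` are hypothetical names chosen for illustration, and inside the reader the same check would be applied to the decoded_buffer field directly.

```csharp
using System;

// Hypothetical stand-in for ensureMinDecodedBufLen(); not taken from Mono or .NET.
static class BufferHelper
{
    // Grows the buffer to at least minLength chars; never shrinks it.
    public static void EnsureMinLength(ref char[] buffer, int minLength)
    {
        if (buffer == null || buffer.Length < minLength)
            Array.Resize(ref buffer, minLength); // allocates a new array and copies existing chars
    }
}
```

Array.Resize keeps the old contents, which ReadRegex does not strictly need because it overwrites the buffer from index 0 right afterwards; allocating a fresh array would work there as well.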
When calling the method, pass in the Regex that the input should be matched against.
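If patching StreamReader is not an option, the same idea can be tried on top of any TextReader. The class below is only a minimal sketch under that assumption: `RegexTextScanner` and its `chunkSize` parameter are hypothetical names, it leaves out the RegexBufferSize threshold, and it re-runs the regex over the whole pending text after every chunk.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of .NET or Mono): buffers text from a
// TextReader until the regex matches, then keeps the remainder for the next call.
public sealed class RegexTextScanner
{
    private readonly TextReader reader;
    private readonly StringBuilder pending = new StringBuilder(); // text not yet consumed
    private readonly char[] chunk;

    public RegexTextScanner(TextReader reader, int chunkSize = 4096)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        this.reader = reader;
        this.chunk = new char[chunkSize];
    }

    // Returns the next match, or null if the input ends without one.
    public string ReadRegex(Regex regex)
    {
        while (true)
        {
            if (pending.Length > 0)
            {
                Match match = regex.Match(pending.ToString());
                if (match.Success)
                {
                    // Drop everything up to the end of the match; keep the rest.
                    pending.Remove(0, match.Index + match.Length);
                    return match.Value;
                }
            }

            int read = reader.Read(chunk, 0, chunk.Length);
            if (read == 0)
            {
                // End of input: discard unmatched text, as the method above does.
                pending.Length = 0;
                return null;
            }
            pending.Append(chunk, 0, read);
        }
    }
}
```

For example, with `new RegexTextScanner(new StringReader("id=12; id=345; tail"), 4)` and the pattern `id=\d+;`, successive calls return "id=12;", then "id=345;", then null. As with the method above, a greedy pattern without a terminator (such as `\d+`) can match a value that is cut off at a chunk boundary, and a pattern that can match the empty string will never consume input, so patterns should be written with that in mind.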
So why can't you wait until you have received all the data? – ChaosPandion 2009-12-25 23:33:05
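In my experience it is usually the regex that hinders performance, not the conversion and GC'ing of strings. Unless your matching is very complex, I would recommend writing your own stream scanner for the matching instead of using a regex. However, you should benchmark it against the regex version to make sure you are on the right track. – 2009-12-26 00:04:41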
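@ChaosPandion: If the stream is a big file, I don't want to load all of its gigabytes into memory, especially not as UTF-16 (the in-memory string encoding in .NET). If the stream comes from the internet, I want to be able to scan it before all of the data has been received (e.g. an HTML parser that displays the downloaded part before the rest of the page has been downloaded). – DxCK 2009-12-26 00:09:09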