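Finding unescaped characters when replacing with a regular expression
I have a requirement that is basically this. If I have a string of text such as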
"There once was an 'ugly' duckling but it could
never have been \'Scarlett\' Johansen"
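then I want to match the quotes that have not already been escaped. Those would be the ones around 'ugly', not the ones around 'Scarlett'.
I have spent quite a while on this, using a small C# console application to test things, and came up with the following solution.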
private static void RegexFunAndGames() {
    // Requires: using System; using System.Text.RegularExpressions;
    string result;
    string sampleText = @"Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
    string rePattern = @"\\'";   // matches a quote that is already escaped: \'
    string replaceWith = "'";
    Console.WriteLine(sampleText);
    Regex regEx = new Regex(rePattern);
    // Step 1: undo the existing escapes so every quote is bare.
    result = regEx.Replace(sampleText, replaceWith);
    // Step 2: escape every quote.
    result = result.Replace("'", @"\'");
    Console.WriteLine(result);
}
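Basically, what I am doing is a two-step process: find the characters that are already escaped, undo the escaping, and then escape everything again. This feels a bit clumsy, and I suspect there is a better way.
Test information
I received two really good answers, so I thought it was worth running a test to see which one performs better. I have these two functions: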
private static string RegexReplace(string sampleText) {
    Regex regEx = new Regex("(?<!\\\\)'");
    return regEx.Replace(sampleText, "\\'");
}
private static string ReplaceTest(string sampleText) {
    return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
}
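Before timing them, it may be worth confirming that the two functions agree. The following is a minimal sketch of such a check; the class name `EscapeCheck` and the shortened input string are my own choices for illustration, not part of the original test.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeCheck
{
    // Same logic as RegexReplace above: escape only quotes that are
    // not directly preceded by a backslash.
    public static string RegexReplace(string sampleText)
    {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");
    }

    // Same logic as ReplaceTest above: unescape everything, then escape everything.
    public static string ReplaceTest(string sampleText)
    {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }

    static void Main()
    {
        // Shortened input (an assumption for this sketch): one escaped quote, three bare ones.
        string input = @"\'To Catch A Thief' and 'Stardust'";
        string viaRegex = RegexReplace(input);
        string viaReplace = ReplaceTest(input);

        Console.WriteLine(viaRegex);               // \'To Catch A Thief\' and \'Stardust\'
        Console.WriteLine(viaRegex == viaReplace); // True
    }
}
```

On this input both approaches leave the already-escaped quote alone and escape the other three, so the timing comparison is between two functions that produce the same result here.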
I call them from the Main method of a console application:
static void Main(string[] args) {
    string sampleText = @"Mr. Grant and Ms. Kelly starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then.";
    string testReplace = string.Empty;
    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    // Time one million calls to the plain string.Replace version.
    sw.Start();
    for (int i = 1000000; i > 0; i--) {
        testReplace = ReplaceTest(sampleText);
    }
    sw.Stop();
    Console.WriteLine("ReplaceTest took '" + sw.ElapsedMilliseconds.ToString() + "' ms");
    sw.Reset();
    // Time one million calls to the regex version.
    sw.Start();
    for (int i = 1000000; i > 0; i--) {
        testReplace = RegexReplace(sampleText);
    }
    sw.Stop();
    Console.WriteLine("RegexReplace took '" + sw.ElapsedMilliseconds.ToString() + "' ms");
}
The ReplaceTest method takes 2068 milliseconds. The RegexReplace method takes 9372 milliseconds. I have run this test several times, and ReplaceTest always comes out faster.
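One caveat when reading these numbers: RegexReplace constructs a new Regex object on every call, so part of its measured time is spent building the pattern rather than matching. The sketch below builds the pattern once and reuses it. The names `CachedRegexDemo`, `UnescapedQuote` and `RegexReplaceCached` are hypothetical, and this variant was not part of the timings above, so whether it changes the ranking is untested.

```csharp
using System;
using System.Text.RegularExpressions;

static class CachedRegexDemo
{
    // Constructed once and reused across calls. RegexOptions.Compiled is
    // optional; it trades a slower start-up for faster repeated matching.
    private static readonly Regex UnescapedQuote =
        new Regex(@"(?<!\\)'", RegexOptions.Compiled);

    // Hypothetical variant of RegexReplace that reuses the cached pattern.
    public static string RegexReplaceCached(string sampleText)
    {
        return UnescapedQuote.Replace(sampleText, @"\'");
    }

    static void Main()
    {
        Console.WriteLine(RegexReplaceCached(@"\'To Catch A Thief' and 'Stardust'"));
    }
}
```

To compare fairly, this function would need to be dropped into the same loop as the other two and timed on the same machine.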
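I had started looking at 'negative lookbehind' but evidently did not look hard enough. I will wait and see whether there are any other replies, just in case, but I will probably mark this as my accepted answer. Thanks. –
Is there a simple way to make sure this does not break if the slash before the quote is itself escaped with a slash? – Rawling
@Rawling Sure - use new Regex("(?<!(?<!\\\\)\\\\)'") to say that the slash itself must not be preceded by a slash. – dasblinkenlight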
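To make the last comment concrete, here is a small sketch of the nested lookbehind applied to a few inputs. The class and method names are hypothetical. The second pattern, `EvenBackslashes`, is my own assumption and not from the thread: it relies on .NET's support for variable-length lookbehind to treat a quote as unescaped whenever it follows an even number of backslashes, since the nested version only looks two characters back.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapedSlashDemo
{
    // Pattern from the comment above: a quote is skipped only when the
    // backslash before it is not itself preceded by a backslash.
    private static readonly Regex NestedLookbehind =
        new Regex(@"(?<!(?<!\\)\\)'");

    // Assumption (not from the thread): match a quote that follows an even
    // number of backslashes, including zero, via a variable-length lookbehind.
    private static readonly Regex EvenBackslashes =
        new Regex(@"(?<=(?<!\\)(?:\\\\)*)'");

    public static string EscapeNested(string text)
    {
        return NestedLookbehind.Replace(text, @"\'");
    }

    public static string EscapeEven(string text)
    {
        return EvenBackslashes.Replace(text, @"\'");
    }

    static void Main()
    {
        Console.WriteLine(EscapeNested(@"a'b"));    // a\'b    plain quote gets escaped
        Console.WriteLine(EscapeNested(@"a\'b"));   // a\'b    already escaped, unchanged
        Console.WriteLine(EscapeNested(@"a\\'b"));  // a\\\'b  escaped backslash, quote gets escaped
        Console.WriteLine(EscapeEven(@"a\\\'b"));   // a\\\'b  three backslashes, unchanged
    }
}
```

With three backslashes before the quote, the nested lookbehind would escape the quote a second time, which is the case the even-count variant is meant to cover.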