2012-06-14 64 views
7

What is the simplest algorithm to escape a single character?

I want to write two functions, escape(text, delimiter) and unescape(text, delimiter), with the following properties:

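  1. The result of escape does not contain delimiter.

  2. unescape is the inverse of escape, i.e.

    unescape(escape(text, delimiter), delimiter) == text 
    

    for all values of text and delimiter.

It is acceptable to restrict the allowed values of delimiter.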


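Background: I want to create a delimiter-separated string of values. To be able to extract the same list from that string again, I must make sure that the individual strings do not contain the delimiter.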


What I have tried: I came up with a simple solution (pseudocode):

escape(text, delimiter): return text.Replace("\", "\\").Replace(delimiter, "\d") 
unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\\", "\") 

However, I found that property 2 fails for the test string "\d<delimiter>". Currently, I have the following working solution:

escape(text, delimiter): return text.Replace("\", "\b").Replace(delimiter, "\d") 
unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\b", "\") 

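This seems to work as long as delimiter is not one of \, b, or d (which is fine; I do not want to use those as delimiters anyway). However, since I have not formally proven its correctness, I am afraid I have missed a case in which one of the properties is violated. Since this is such a common problem, I assume there is already a "well-known, proven-correct" algorithm for it, hence my question (see title).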
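One way to gain some confidence short of a proof is to brute-force both properties over every short string built from the "dangerous" characters. The sketch below is a Python transliteration of the two pseudocode attempts above; the names escape_v1, escape_v2, counterexamples and the choice of test alphabet are illustrative assumptions, not part of the original post. An exhaustive check over short strings is evidence, not a proof.

```python
from itertools import product

def escape_v1(text, delimiter):
    # First attempt: double the backslashes, then encode the delimiter.
    return text.replace("\\", "\\\\").replace(delimiter, "\\d")

def unescape_v1(text, delimiter):
    # First attempt: two independent replace passes.
    return text.replace("\\d", delimiter).replace("\\\\", "\\")

def escape_v2(text, delimiter):
    # Second attempt: a backslash becomes "\b" instead of "\\".
    return text.replace("\\", "\\b").replace(delimiter, "\\d")

def unescape_v2(text, delimiter):
    return text.replace("\\d", delimiter).replace("\\b", "\\")

def counterexamples(escape, unescape, delimiter, alphabet, max_len):
    """Return every string over `alphabet` (length <= max_len) that breaks property 1 or 2."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            text = "".join(chars)
            escaped = escape(text, delimiter)
            if delimiter in escaped or unescape(escaped, delimiter) != text:
                bad.append(text)
    return bad

# Assumed test alphabet: the escape character, both code letters,
# the delimiter, and one ordinary character.
ALPHABET = "\\bd;x"
failures_v1 = counterexamples(escape_v1, unescape_v1, ";", ALPHABET, 5)
failures_v2 = counterexamples(escape_v2, unescape_v2, ";", ALPHABET, 5)
```

With ";" as the delimiter, failures_v1 contains the test string "\d;" mentioned above, while failures_v2 comes back empty for this alphabet and length bound.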

+2

Escaping? Reliable, simple, and fast? LOL. – Will

+1

What is wrong with reliable, simple, and fast escaping? –

Answers

3

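Your first algorithm is correct.

The bug is in the implementation of unescape(): you need to replace both \d by delimiter and \\ by \ in the same pass. You cannot do this with several successive calls to Replace().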
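To illustrate what "in the same pass" means, here is a minimal Python sketch of the first algorithm with a single-pass unescape. It assumes a single-character delimiter other than \ or d; using re.sub with a callback is just one way to get a single left-to-right scan, not something the original post prescribes.

```python
import re

def escape(text, delimiter):
    # Order matters: double the backslashes first, then encode the delimiter.
    return text.replace("\\", "\\\\").replace(delimiter, "\\d")

def unescape(text, delimiter):
    # One left-to-right scan: each match consumes a whole two-character
    # escape pair, so the second backslash of "\\" can never be re-read
    # as the start of a following "\d".
    def decode(match):
        return delimiter if match.group(1) == "d" else "\\"
    # Pattern: a backslash followed by either a backslash or the letter d.
    return re.sub(r"\\(\\|d)", decode, text)

original = "\\d;"                    # the string that broke the two-pass version
escaped = escape(original, ";")
restored = unescape(escaped, ";")
```

Here escaped contains no ";" and restored equals original. A backslash followed by any other character is left untouched by this sketch; a stricter version could reject such input.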

Here is some C# sample code that safely quotes strings so they can be joined with a separator:

    static string QuoteSeparator(string str, 
     char separator, char quoteChar, char otherChar) // "~" -> "~~"  ";" -> "~s" 
    { 
     var sb = new StringBuilder(str.Length); 
     foreach (char c in str) 
     { 
      if (c == quoteChar) 
      { 
       sb.Append(quoteChar); 
       sb.Append(quoteChar); 
      } 
      else if (c == separator) 
      { 
       sb.Append(quoteChar); 
       sb.Append(otherChar); 
      } 
      else 
      { 
       sb.Append(c); 
      } 
     } 
     return sb.ToString(); // no separator in the result -> Join/Split is safe 
    } 
    static string UnquoteSeparator(string str, 
     char separator, char quoteChar, char otherChar) // "~~" -> "~"  "~s" -> ";" 
    { 
     var sb = new StringBuilder(str.Length); 
     bool isQuoted = false; 
     foreach (char c in str) 
     { 
      if (isQuoted) 
      { 
       if (c == otherChar) 
        sb.Append(separator); 
       else 
        sb.Append(c); 
       isQuoted = false; 
      } 
      else 
      { 
       if (c == quoteChar) 
        isQuoted = true; 
       else 
        sb.Append(c); 
      } 
     } 
     if (isQuoted) 
      throw new ArgumentException("input string is not correctly quoted"); 
     return sb.ToString(); // ";" are restored 
    } 

    /// <summary> 
    /// Encodes the given strings as a single string. 
    /// </summary> 
    /// <param name="input">The strings.</param> 
    /// <param name="separator">The separator.</param> 
    /// <param name="quoteChar">The quote char.</param> 
    /// <param name="otherChar">The other char.</param> 
    /// <returns>A single string containing the quoted input strings joined by the separator.</returns> 
    public static string QuoteAndJoin(this IEnumerable<string> input, 
     char separator = ';', char quoteChar = '~', char otherChar = 's') 
    { 
     CommonHelper.CheckNullReference(input, "input"); 
     if (separator == quoteChar || quoteChar == otherChar || separator == otherChar) 
      throw new ArgumentException("cannot quote: ambiguous format"); 
     return string.Join(new string(separator, 1), (from str in input select QuoteSeparator(str, separator, quoteChar, otherChar)).ToArray()); 
    } 

    /// <summary> 
    /// Decodes the strings encoded in a single string. 
    /// </summary> 
    /// <param name="encoded">The encoded.</param> 
    /// <param name="separator">The separator.</param> 
    /// <param name="quoteChar">The quote char.</param> 
    /// <param name="otherChar">The other char.</param> 
    /// <returns>The decoded strings, in their original order.</returns> 
    public static IEnumerable<string> SplitAndUnquote(this string encoded, 
     char separator = ';', char quoteChar = '~', char otherChar = 's') 
    { 
     CommonHelper.CheckNullReference(encoded, "encoded"); 
     if (separator == quoteChar || quoteChar == otherChar || separator == otherChar) 
      throw new ArgumentException("cannot unquote: ambiguous format"); 
     return from s in encoded.Split(separator) select UnquoteSeparator(s, separator, quoteChar, otherChar); 
    } 
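
For readers who want to try the round trip without the CommonHelper dependency, the following is a rough Python transliteration of the same scheme. The function names and the sample values are illustrative assumptions; the defaults ';', '~', and 's' mirror the C# code above.

```python
def quote_separator(s, separator=";", quote_char="~", other_char="s"):
    # "~" -> "~~" and ";" -> "~s"; everything else is copied unchanged.
    out = []
    for c in s:
        if c == quote_char:
            out.append(quote_char + quote_char)
        elif c == separator:
            out.append(quote_char + other_char)
        else:
            out.append(c)
    return "".join(out)

def unquote_separator(s, separator=";", quote_char="~", other_char="s"):
    # Single pass with one bit of state: was the previous char the quote char?
    out = []
    quoted = False
    for c in s:
        if quoted:
            out.append(separator if c == other_char else c)
            quoted = False
        elif c == quote_char:
            quoted = True
        else:
            out.append(c)
    if quoted:
        raise ValueError("input string is not correctly quoted")
    return "".join(out)

def quote_and_join(items, separator=";"):
    return separator.join(quote_separator(s, separator) for s in items)

def split_and_unquote(encoded, separator=";"):
    return [unquote_separator(s, separator) for s in encoded.split(separator)]

values = ["a;b", "tilde~s", "", "plain"]
encoded = quote_and_join(values)
decoded = split_and_unquote(encoded)
```

For these values, encoded is "a~sb;tilde~~s;;plain" and decoded equals values. One caveat of the format: an empty list and a list holding a single empty string both encode to the empty string, so decoding returns a list with one empty string in both cases.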
0

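Perhaps you could fall back to an alternative replacement for the case where the delimiter is one of \, b, or d, and use the same alternative replacement in the unescape algorithm.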
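
One possible reading of this suggestion, sketched in Python: derive the escape character and the two code letters from the delimiter, so that both functions make the same choice and none of the three collides with the delimiter. The helper pick_codes and its candidate characters ("~", "x", "y") are hypothetical, chosen only for illustration, and the sketch assumes a single-character delimiter.

```python
def pick_codes(delimiter):
    """Pick an escape char and two code letters, none equal to the delimiter."""
    # Hypothetical fallbacks: "~" replaces "\" as the escape character,
    # and "x"/"y" stand in for "b"/"d" when one of those is the delimiter.
    esc = "\\" if delimiter != "\\" else "~"
    letters = [c for c in "bdxy" if c != delimiter and c != esc]
    return esc, letters[0], letters[1]

def escape(text, delimiter):
    esc, esc_code, delim_code = pick_codes(delimiter)
    # Encode the escape character first, then the delimiter.
    return text.replace(esc, esc + esc_code).replace(delimiter, esc + delim_code)

def unescape(text, delimiter):
    # Same deterministic choice as in escape(), applied in reverse order.
    esc, esc_code, delim_code = pick_codes(delimiter)
    return text.replace(esc + delim_code, delimiter).replace(esc + esc_code, esc)
```

Because the choice depends only on the delimiter, nothing extra has to be stored alongside the escaped text.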
