給定原始字符串和編碼字符串，如何誘導編碼？

假設我有一個原始字符串和一個編碼字符串，如下所示：給定原始字符串和編碼字符串，如何誘導編碼？

「abcd」 - >「0010111111001010」，那麼一種可能的解決方案是「a」匹配「0010」，「b」匹配用「1111」，「c」與「1100」匹配，「d」與「1010」匹配。

如何編寫一個程序，給定這兩個字符串，並找出可能的編碼規則？

我的第一個臨時看起來是這樣的：

fun partition(orgl, encode) = 
let 
    val part = size(orgl) 
    fun porpt(str, i, len) = 
     if i = len - 1 then 
      [substring(str, len * (len - 1), size(str) - (len - 1) * len)] 
     else 
      substring(str, len * i, len)::porpt(str, i + 1, len) 
in 
    porpt(encode, 0, part) 
end;

但顯然它不能檢查兩個字符串是否匹配相同的字符，並且有比按比例分割字符串之外的許多其他的可能性。

這個問題應該是什麼適當的算法？

P.S.只允許前綴代碼。

我學到的東西還沒有真正陷入嚴重的算法，但我做了一些搜索有關回溯，寫我的代碼的第二個版本：

fun partition(orgl, encode) = 
let 
    val part = size(orgl) 
    fun backtrack(str, s, len, count, code) = 
     let 
      val current = 
       if count = 1 then 
        [email protected][substring(str, s, size(str) - s)] 
       else 
        [email protected][substring(str, s, len)] 
     in 
      if len > size(str) - s then [] 
      else 
       if proper_prefix(0, orgl, code) then 
        if count = 1 then current 
        else 
        backtrack(str, s + len, len, count - 1, current) 
       else 
       backtrack(str, s, len + 1, count, code) 
     end 
in 
    backtrack(encode, 0, 1, part, []) 
end;

凡功能proper_prefix將檢查前綴代碼和獨特的映射。但是，此功能無法正常工作。

例如，當我輸入：

partition("abcd", "001111110101101");

返回的結果是：

uncaught exception Subscript

僅供參考，proper_prefix的身體看起來是這樣的：

fun proper_prefix(i, orgl, nil) = true 
    | proper_prefix(i, orgl, x::xs) = 
    let 
     fun check(j, str, nil) = true 
     | check(j, str, x::xs) = 
      if String.isPrefix str x then 
      if str = x andalso substring(orgl, i, 1) = substring(orgl, i + j + 1, 1) then 
       check(j + 1, str, xs) 
      else 
       false 
      else 
      check(j + 1, str, xs) 
    in 
     if check(0, x, xs) then proper_prefix(i + 1, orgl, xs) 
     else false 
    end;

來源

2015-11-16 Isabel Yang

你認爲這是[替代密碼]（https://en.wikipedia.org/wiki/Substitution_cipher）嗎？你認爲所有的比特串有相同的長度嗎？你認爲所有的字符映射到一個唯一的位串？這些假設極大地影響了這個問題，並因此影響了任何確定翻譯的算法。 –

是的。這是一個替代密碼，它可能不具有相同的長度 - 這就是爲什麼我說我的第一個劃痕是可怕的，因爲它假定角色具有相同的長度。是的，映射應該是內射的。 –

我會嘗試一種回溯方法：

從一個空的假設開始（即將所有編碼設置爲未知）。然後逐個字符地處理編碼的字符串。

在每個新代碼字符中，您有兩個選項：將代碼字符附加到當前源字符的編碼或轉到下一個源字符。如果遇到已經有編碼的源字符，請檢查它是否匹配並繼續。或者如果它不匹配，請返回並嘗試另一個選項。您也可以在遍歷期間檢查前綴屬性。

你的示例輸入可以如下進行處理：

Assume 'a' == '0' 
Go to next source character 
Assume 'b' == '0' 
Violation of prefix property, go back 
Assume 'a' == '00' 
Go to next source character 
Assume 'b' == '1' 
...

此探索所有可能的編碼的範圍內。您可以返回找到的第一個編碼或所有可能的編碼。

來源

2015-11-16 15:27:30

如果一個人天真地重複所有可能的翻譯abcd→0010111111001010，這可能導致爆炸。簡單的迭代似乎也導致大量無效的翻譯之一將不得不跳過：

(a, b, c, d) → (0, 0, 1, 0111111001010) is invalid because a = b 
(a, b, c, d) → (0, 0, 10, 111111001010) is invalid because a = b 
(a, b, c, d) → (0, 01, 0, 111111001010) is invalid because a = c 
(a, b, c, d) → (00, 1, 0, 111111001010) is one possibility 
(a, b, c, d) → (0, 0, 101, 11111001010) is invalid because a = b 
(a, b, c, d) → (0, 010, 1, 11111001010) is another possibility 
(a, b, c, d) → (001, 0, 1, 11111001010) is another possibility 
(a, b, c, d) → (0, 01, 01, 11111001010) is invalid because b = c 
(a, b, c, d) → (00, 1, 01, 11111001010) is another possibility 
(a, b, c, d) → (00, 10, 1, 11111001010) is another possibility 
...

如果所有字符串包含每個字符只出現一次，那麼結果這個吹起來就是答案。如果同一個角色出現不止一次，這進一步限制瞭解決方案。例如。匹配ABCA→111011可以產生

(a, b, c, a) → (1, 1, 1, 011) is invalid because a = b = c, a ≠ a 
(a, b, c, a) → (1, 1, 10, 11) is invalid because a = b, a ≠ a 
(a, b, c, a) → (1, 11, 0, 11) is invalid because a = b, a ≠ a 
(a, b, c, a) → (11, 1, 0, 11) is one possibility 
... (all remaining combinations would eventually prove invalid)

對於給定的假設，你可以選擇在其中驗證約束的順序。要麼

看看是否有任何映射重疊。（我認爲這是Nico所稱的前綴屬性。）
查看是否有任何字符在位字符串中的兩個位置都發生多次。

使用此搜索策略的算法必須找到檢查約束的順序，以儘快嘗試假設。我的直覺告訴我，如果位串β很長，並且它發生很多次，那麼約束條件a→β是值得研究的。

另一個策略是排除特定字符可以映射到特定長度/上/下的任何位串。例如，AAAB→1111110規則出一個映射到長度爲2以上的任意位串，並且abcab→1011101規則出一個映射到長度不同的任何位串大於2

爲編程部分，嘗試並思考如何表示假設。例如。

(* For the hypothesis (a, b, c, a) → (11, 1, 0, 11) *) 

(* Order signifies first occurrence *) 
val someHyp1 = ([(#"a", 2), (#"b", 1), (#"c", 1)], "abca", "111011") 

(* Somehow recurse over hypothesis and accumulate offsets for each character, e.g. *) 
val someHyp2 = ([(#"a", 2), (#"b", 1), (#"c", 1)], 
       [(#"a", 0), (#"b", 2), (#"c", 3), (#"a", 4)])

然後生成一個按某種順序生成新假設的函數，以及一個用於查找假設是否有效的函數。

fun nextHypothesis (hyp, origStr, encStr) = ... (* should probably return SOME/NONE *) 
fun validHypothesis (hyp, origStr, encStr) = 
    allStr (fn (i, c) => (* is bit string for c at its 
          accumulated offset in encStr? *)) origStr 

(* Helper function that checks whether a predicate is true for each 
    character in a string. The predicate function takes both the index 
    and the character as argument. *) 
and allStr p s = 
    let val len = size s 
     fun loop i = i >= len orelse p (i, String.sub (s, i)) andalso loop (i+1) 
    in loop 0 end

在這個框架內得以改善，將是改變在探索假設的順序，因爲一些搜索路徑可以排除更大量的無效映射比別人的。

來源

2015-11-18 00:20:37

給定原始字符串和編碼字符串，如何誘導編碼？

回答

相關問題