如何用變量和標記實現馬爾科夫算法？

我一直在試圖實現馬爾可夫的算法，但我只取得了部分成功。該算法非常簡單，可以找到here。如何用變量和標記實現馬爾科夫算法？

但是，我的項目有一個額外的困難，我必須使用包含標記和變量的規則。

一個變量表示字母表中的任何字母，而標記只是一個用作移動變量（它沒有真實值）的引用的字符。

此示例複製字符串中的每個字符：

字母：{A，B，C}

標記：{M}

變量：{X}

規則1：Mx - > xxM

規則2：xM - > x

規則3：X - >的Mx

輸入：ABC

ABC //我們應用規則3

MABC //我們應用規則1

aaMbc //我們應用規則1

aabbMc //我們應用規則1

aabbccM //我們應用規則2

爲aabbcc

這是我實現了一個馬爾科夫算法，只有例如字符串輸入工作遞歸函數：第1條：「蘋果」 - >「橙色」，輸入：「蘋果」。

public static String markov(String input, LinkedList<Rule> rules) { 
    for (Rule rule : rules) { 
     if (!input.equals(input.replace(rule.getFrom(), rule.getTo()))) { //If the rule matches a substring 
      if (rule.isTerminating()) { //If the rule is terminating 
       input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo()); 
       System.out.println(input); //Replace the first instance 
       return input; //return and end the cycle 
      } else { 
       input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo()); 
       System.out.println(input); 
       return markov(input, rules); //Start looking again for matching rules 
      } 
     } 
    } 
    return input; 
}

我無法弄清楚如何實現變量和標記這個邏輯，所以也許有人能教我在執行這一邏輯的最好方法？歡迎任何建議。

如果問題不符合SO準則，請讓我知道爲什麼在評論中，所以我不重複這個錯誤。

謝謝！

GitHub

來源

2015-10-04 Riccardo

我認爲這樣做最簡單的方法是使用Java正則表達式。一旦你讓你的頭圍繞這些，那麼下面的規則應該爲你的工作，例如：

Rule 1: "M([a-c])" -> "$1$1M" 
Rule 2: "([a-c])M" -> "$1" (terminating) 
Rule 3: "([a-c])" -> "M$1"

注意，你需要幾個調整到當前的方法來完成這項工作？

replace需要一個文本字符串，因爲它是第一個參數，而replaceFirst使用正則表達式，所以：

replace: if (!input.equals(input.replace(rule.getFrom(), rule.getTo()))) { 
with: if (!input.equals(input.replaceFirst(rule.getFrom(), rule.getTo()))) {

你引述rule.getFrom()字符串，它不會使用正則表達式的工作，所以：

replace: input = input.replaceFirst(Pattern.quote(rule.getFrom()), rule.getTo()); 
with: input = input.replaceFirst(rule.getFrom(), rule.getTo());

此時，在代碼中調用replaceFirst兩次的代碼有點重複，所以您可以在第一次將其保存在臨時變量中並重新使用它：

String next = input.replace(rule.getFrom(), rule.getTo()); 
if (!input.equals(next)) { 
    ... 
    input = next; 
    ... 
}

由於您目前正在引用整個rule.getFrom()字符串，我猜你在正式表達式特殊字符在這之前有問題。如果是這樣，您在創建規則時需要分別解決它們。我真的不想在這裏進入正則表達式，因爲它是一個巨大的領域，並且與馬爾可夫算法完全分離，所以如果你遇到這些問題，請在線進行一些研究（例如Regular Expressions和Capturing Groups），或者詢問這裏有一個單獨的問題，側重於正則表達式特定問題。

請注意，您仍然可以與正常的規則，以便將這些（改變標記字符從M到#允許M在字母表中使用），這些規則：

"A"    -> "apple" 
"B"    -> "bag" 
"S"    -> "shop" 
"T"    -> "the" 
"the shop"  -> "my brother" 
"#([a-zA-Z .])" -> "$1$1#" 
"([a-zA-Z .])#" -> "$1" (terminating) 
"([a-zA-Z .])" -> "#$1"

會轉換：

from: I bought a B of As from T S. 
to: II bboouugghhtt aa bbaagg ooff aapppplleess ffrroomm mmyy bbrrootthheerr..

希望這會有所幫助。

來源

2015-10-05 04:35:25 Barney

如何用變量和標記實現馬爾科夫算法？

回答

相關問題