在字符串中刪除的圖案重複

比方說，我有以下string：在字符串中刪除的圖案重複

aabccd

我想找到並刪除重複的所有模式。在這個例子中，一個a後面是另一個，這是一個重複。 cc。

最終字符串將是：

bd

又如：香蕉 =>BA（因爲有兩個一個或兩個吶）。

這裏的算法我想出來的：

分割成兩半串並採取最大的一個

比方說，它具有長度爲4;我看着另一半，如果它的存在

下一個循環的轉彎，我在我的字符串從一個字符（但仍然長度4），直到我循環轉移所有可能的模式

然後，我通過減少一半一個字符（所以長度爲3），並開始再次

//等

它會做這樣的事情：

個helloguys - >地獄 - >ELLO - >llog

而且一些代碼，不工作，因爲它應該：

std::string processWord(std::string word) 
{ 
    // Split word in the two halves 
    std::string firstHalf = word.substr(0, word.size()/2); 
    std::string secondHalf = word.substr(firstHalf.size()); 

    // Check if the tiniest half is present in the biggest one (firstHalf will be eitheir lower or equal to secondHalf) 
    if (secondHalf.find(firstHalf) != std::string::npos) 
    { 
     std::cout << firstHalf << " is found in " << secondHalf << std::endl; 
     // Remove firstHalf from secondHalf 
     word.replace(word.find(firstHalf), firstHalf.size(), ""); 
     std::cout << word << std::endl; 
     return word; 
    } 

    for (size_t i = 1; i < secondHalf.size(); ++i) 
    { 
     // Get secondHalf minus one character at each loop turn to check occurences 
     std::string occurence = secondHalf.substr(0, secondHalf.size() - i); 
     // Mark the first occurence 
     size_t startIndex = indexOf(word, occurence); 
     // Mark the last occurence 
     size_t lastIndex = startIndex; 
     int totalOccurences = 1; 

     // As long as there are occurences following each other, we continue 
     // Example: "anandgdfgd". We see "an" twice. But it would not work with "anshfsduihsuan" since they are not following each other 
     while (word.find(occurence, lastIndex + occurence.size()) != std::string::npos) 
     { 
      lastIndex += occurence.size(); 
      totalOccurences++; 
     } 

     std::ostringstream oss; 
     oss << "[" << totalOccurences << "*" << occurence << "]"; 

     word.replace(startIndex, lastIndex, oss.str()); 
    } 
    // Some code is missing here 
    return word; 
}

我敢肯定，這個問題已經得到解決，但我找不到任何東西。有小費嗎？

來源

2015-11-02 Grégoire Borel

如何處理'baaacd'，'baanan'和'baab'？第一個包含3個字符，在第二個刪除雙'a'不包括刪除'anan'，在第三個刪除'aa'導致'bb'對 - 它是否也應該刪除？ – MBo

@MBo：很好的問題。第一個會刪除三個'a'。第二個依賴於算法，它只會執行兩個解決方案之一。第三個答案相同。我認爲一掃就夠了。 –

沒有至少一些限制，這無疑是不切實際的。至少有一些限制，它幾乎要求LZ *壓縮。 –

清除規則尚未定義，但我認爲建築Suffix Array是一個很好的起點。您可以比較相鄰後綴，查找常見起始部分的長度並做出決定 - 哪些子字符串是最適合刪除的候選字符。

來源

2015-11-02 19:44:00 MBo

在字符串中刪除的圖案重複

回答

相關問題