來自兩個以上字符串的最長公共子字符串 - C++

我需要從C++中的一組文件名中計算出最長的公共子字符串。來自兩個以上字符串的最長公共子字符串 - C++

準確地說，我的std ::字符串的std ::列表（或相當於QT，也未嘗不可）

char const *x[] = {"FirstFileWord.xls", "SecondFileBlue.xls", "ThirdFileWhite.xls", "ForthFileGreen.xls"}; 
std::list<std::string> files(x, x + sizeof(x)/sizeof(*x));

我需要計算n的所有字符串的不同最長公共子，在這例如對於n = 2

"File" and ".xls"

如果我能計算出最長公共子，我可以把它剪它，並再次運行算法得到第二長的，所以實際上這歸結爲：

有用於計算std :: strings的std :: list的LCS的（引用？）實現？

這不是一個很好的答案，但一個骯髒的解決方案，我有 - 蠻力上的QList QUrls從中僅將部分最後一個「/」取。我很樂意用「適當的」代碼替換它。

（我發現http://www.icir.org/christian/libstree/ - ？這將有很大的幫助，但我不能得到它來編譯我的機器上有人用這個可能）

QString SubstringMatching::getMatchPattern(QList<QUrl> urls) 
    { 
    QString a; 

    int foundPosition = -1; 
    int foundLength = -1; 
    for (int i=urls.first().toString().lastIndexOf("/")+1; i<urls.first().toString().length(); i++) 
    { 
     bool hit=true; 
     int xj; 
     for (int j=0; j<urls.first().toString().length()-i+1; j++) // try to match from position i up to the end of the string :: test character at pos. (i+j) 
     { 
      if (!hit) break; 

      QString firstString = urls.first().toString().right(urls.first().toString().length()-i).left(j); // this needs to match all k strings 
      //qDebug() << "SEARCH " << firstString; 

      for (int k=1; k<urls.length(); k++) // test all other strings, k = test string number 
      { 
       if (!hit) break; 

       //qDebug() << " IN " << urls.at(k).toString().right(urls.at(k).toString().length() - urls.at(k).toString().lastIndexOf("/")+1); 
       //qDebug() << " RES " << urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1); 
       if (urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1)<0) { 
        xj = j; 
        //qDebug() << "HIT LENGTH " << xj-1 << " : " << firstString; 
        hit = false; 
       } 
      } 

     } 
     if (hit) xj = urls.first().toString().length()-i+1; // hit up to the end of the string 
     if ((xj-2)>foundLength) // have longer match than existing, j=1 is match length 
     { 
      foundPosition = i; // at the current position 
      foundLength = xj-1; 
      //qDebug() << "Found at " << i << " length " << foundLength; 
     } 
    } 

    a = urls.first().toString().right(urls.first().toString().length()-foundPosition).left(foundLength); 
    //qDebug() << a; 
    return a; 
}

來源

2013-12-10 user2707001

這可能有用。 http://stackoverflow.com/questions/2418504/algorithm-to-find-common-substring-across-n-strings –

我已經點擊了數百個類似的問題沒有找到答案，包括上面的和http：// stackoverflow.com/questions/10248728/how-to-find-longest-common-substring-using-c。我得到的最接近的是http://homepage.virgin.net/cdm.henderson/site/cpp/lcs/index.htm，但是這是爲了子序列，而不是子串。 – user2707001

這是一個不平凡的問題。詳盡的搜索是必要的。（如果我的第一眼看好） –

如果像你說的後綴樹太重量級或以其他方式不切實際，以下相當簡單的蠻力方法可能足以滿足您的應用需求。

我假設不同的子串不重疊，從從左到右選取。

即使有了這些假設，也不需要包含一組字符串的唯一集合，其包括「不同的最長公共子串」。無論ñ是，可能有超過ň不同的共同子串都是一樣的最大長度和ñ他們中間的任何選擇是任意的。相應地，該解決方案找到最長的不同共同子串的最多個子集*，其中所有具有相同長度的那些子集是一個集合。

的算法如下：

Q是長度的目標配額。
字符串是字符串的問題集。
結果是最初爲空的多重映射，一個長度映射到一組串， 結果[1]是設置的與長度升
Ñ，最初爲0，是中代表的不同長度的數量結果
如果Q是0或字符串是空的返回結果
查找的字符串任何最短構件;保留副本S並將其刪除從字符串。我們繼續用S與的這些字符串比較子字符串，因爲所有{字符串，小號}的共同子串必須小號的子。

迭代生成的所有子串小號，最長首先，使用由偏移量和長度控制明顯嵌套循環。對於每個子SS 小號的：

如果SS不是字符串一個共同子串，下一個。

遍歷結果[1]爲升> =直到結果或直到SS端SS的長度被發現是被檢查結果的子串。在後一種情況下，ss與已有的結果沒有區別，所以接下來。

ss是不同於任何已有手段的常見子字符串。遍歷 結果[1]爲升 < SS的長度，刪除每個結果即SS的串，因爲所有這些都短於SS和不鮮明從它。 ss現在是一個不同於任何已有的常見子字符串和所有其他手頭上的所有其他人都不同於ss。

對於升 = SS的長度，檢查是否結果[1]存在，即如果有在手任何結果相同的長度SS。如果沒有，請致電 a NewLength條件。

檢查還如果ň == Q，即我們已經達到了鮮明長度的目標配額。如果NewLength獲得並且還有N == Q，請撥打StickOrRise條件。

如果StickOrRaise獲得然後用升 =所述長度在手最短結果的比較SS的長度。如果ss短於l 那麼它對我們的配額太短了，所以接下來。如果SS比升更長 然後在手所有最短結果是有利於SS待驅逐，所以刪除 結果[1]和遞減Ñ。

插入ss轉換成結果按其長度鍵。

如果NewLength獲得增量N。

放棄過小號的子串內部循環有SS偏移的相同，但更短，因爲它們都不是從SS不同。

進展所述的Offset S用於通過SS，到下一個非重疊子串的開始的長度的外迭代。

返回結果。

這裏是一個實現中，並用演示了字符串的列表的程序：

#include <list> #include <map> #include <string> #include <iostream> #include <algorithm> using namespace std; // Get a non-const iterator to the shortest string in a list list<string>::iterator shortest_of(list<string> & strings) { auto where = strings.end(); size_t min_len = size_t(-1); for (auto i = strings.begin(); i != strings.end(); ++i) { if (i->size() < min_len) { where = i; min_len = i->size(); } } return where; } // Say whether a string is a common substring of a list of strings bool is_common_substring_of( string const & candidate, list<string> const & strings) { for (string const & s : strings) { if (s.find(candidate) == string::npos) { return false; } } return true; } /* Get a multimap whose keys are the at-most `quota` greatest lengths of common substrings of the list of strings `strings`, each key multi-mapped to the set of common substrings of that length. */ multimap<size_t,string> n_longest_common_substring_sets(list<string> & strings, unsigned quota) { size_t nlengths = 0; multimap<size_t,string> results; if (quota == 0) { return results; } auto shortest_i = shortest_of(strings); if (shortest_i == strings.end()) { return results; } string shortest = *shortest_i; strings.erase(shortest_i); for (size_t start = 0; start < shortest.size();) { size_t skip = 1; for (size_t len = shortest.size(); len > 0; --len) { string subs = shortest.substr(start,len); if (!is_common_substring_of(subs,strings)) { continue; } auto i = results.lower_bound(subs.size()); for ( ;i != results.end() && i->second.find(subs) == string::npos; ++i) {} if (i != results.end()) { continue; } for (i = results.begin(); i != results.end() && i->first < subs.size();) { if (subs.find(i->second) != string::npos) { i = results.erase(i); } else { ++i; } } auto hint = results.lower_bound(subs.size()); bool new_len = hint == results.end() || hint->first != subs.size(); if (new_len && nlengths == quota) { size_t min_len = results.begin()->first; if (min_len > subs.size()) { continue; } results.erase(min_len); --nlengths; } nlengths += new_len; results.emplace_hint(hint,subs.size(),subs); len = 1; skip = subs.size(); } start += skip; } return results; } // Testing ... int main() { list<string> strings{ "OfBitWordFirstFileWordZ.xls", "SecondZWordBitWordOfFileBlue.xls", "ThirdFileZBitWordWhiteOfWord.xls", "WordFourthWordFileBitGreenZOf.xls"}; auto results = n_longest_common_substring_sets(strings,4); for (auto const & val : results) { cout << "length: " << val.first << ", substring: " << val.second << endl; } return 0; }

輸出：

length: 1, substring: Z length: 2, substring: Of length: 3, substring: Bit length: 4, substring: .xls length: 4, substring: File length: 4, substring: Word

（內置用gcc 4.8.1）

來源

2013-12-30 17:47:33

來自兩個以上字符串的最長公共子字符串 - C++

回答

相關問題