Longest common substring excluding a list of strings

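I use this algorithm to find the longest common substring of 2 strings. Please help me do the same thing, but with an array of common substrings that my function should ignore.

My code in Java: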

public static String longestSubstring(String str1, String str2) {

    StringBuilder sb = new StringBuilder();
    if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
        return "";
    }

    // java initializes them already with 0
    int[][] num = new int[str1.length()][str2.length()];
    int maxlen = 0;
    int lastSubsBegin = 0;

    for (int i = 0; i < str1.length(); i++) {
        for (int j = 0; j < str2.length(); j++) {
            if (str1.charAt(i) == str2.charAt(j)) {
                if ((i == 0) || (j == 0)) {
                    num[i][j] = 1;
                } else {
                    num[i][j] = 1 + num[i - 1][j - 1];
                }

                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    // generate substring from str1 => i
                    int thisSubsBegin = i - num[i][j] + 1;
                    if (lastSubsBegin == thisSubsBegin) {
                        // if the current LCS is the same as the last time this block ran
                        sb.append(str1.charAt(i));
                    } else {
                        // this block resets the string builder if a different LCS is found
                        lastSubsBegin = thisSubsBegin;
                        sb = new StringBuilder();
                        sb.append(str1.substring(lastSubsBegin, i + 1));
                    }
                }
            }
        }
    }

    return sb.toString();
}

So my function should look like this:

public static String longestSubstring(String str1, String str2, String[] ignore)

Source

2013-09-24 Yuriy Mayorov

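What problems are you facing with your current solution? The code looks fine to me. – Meesh

There is nothing wrong with the code. Read the last statement. – Prateek

Side note: in many cases, the set of strings you ignore (stop words) is best stored in a hash table/dictionary data structure. If you have to iterate over the whole list every time, a large number of ignored words will cripple your algorithm. My suggestion is to build this HashMap up front and then, deep inside your loop, when you generate a substring, look the word up in the hash of ignored words and add it only if it is not there. – DRobinson

As I understand it, you want to ignore substrings that contain at least one string from the ignore array.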

if (str1.charAt(i) == str2.charAt(j)) {
    if ((i == 0) || (j == 0)) {
        num[i][j] = 1;
    } else {
        num[i][j] = 1 + num[i - 1][j - 1];
    }

    // we must rebuild the current substring on every step so that we can
    // compare it with `ignore`; taking it directly from str1 avoids mixing
    // up matches that end at the same i but at different j
    int thisSubsBegin = i - num[i][j] + 1;
    String current = str1.substring(thisSubsBegin, i + 1);

    // check whether the current substring contains any string from `ignore`,
    // and if it does, find the occurrence that starts furthest to the right
    int biggestIndex = -1;
    for (String s : ignore) {
        int startIndex = current.lastIndexOf(s);
        if (startIndex > biggestIndex) {
            biggestIndex = startIndex;
        }
    }

    // then current.substring(biggestIndex + 1) will not contain strings to be ignored
    current = current.substring(biggestIndex + 1);
    num[i][j] -= (biggestIndex + 1);

    if (num[i][j] > maxlen) {
        maxlen = num[i][j];
        // remember the best substring so far; `sb` is what the method returns
        sb = new StringBuilder(current);
    }
}
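
For reference, here is how the fragment above could fit into a complete method. This is only a sketch: the class name LcsIgnoreContaining is made up for the example, and skipping null or empty entries in ignore is my assumption (an empty string is "contained" in every substring, so it would otherwise exclude everything). It takes the candidate directly from str1 with substring instead of keeping a shared StringBuilder.

```java
final class LcsIgnoreContaining {

    /**
     * Longest common substring of str1 and str2 that contains none of the
     * strings in ignore. Returns "" if no such substring exists.
     */
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // num[i][j] = length of the longest common substring ending at
        // str1[i] and str2[j] that contains no ignored string
        int[][] num = new int[str1.length()][str2.length()];
        int maxlen = 0;
        String best = "";

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue; // num[i][j] stays 0
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];

                // common substring ending at (i, j), before filtering
                String current = str1.substring(i - num[i][j] + 1, i + 1);

                // rightmost start of any ignored string inside current
                int biggestIndex = -1;
                if (ignore != null) {
                    for (String s : ignore) {
                        if (s == null || s.isEmpty()) {
                            continue; // assumption: such entries are skipped
                        }
                        biggestIndex = Math.max(biggestIndex, current.lastIndexOf(s));
                    }
                }

                // cut everything up to and including that start position
                num[i][j] -= biggestIndex + 1;
                if (num[i][j] > maxlen) {
                    maxlen = num[i][j];
                    best = current.substring(biggestIndex + 1);
                }
            }
        }
        return best;
    }
}
```

For example, with str1 = str2 = "abcdefg" and ignore = {"bc"}, the two pieces that do not contain "bc" are "ab" and "cdefg", and the method returns the longer one. When several results have the same length, the first one found in str1 wins.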

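If you instead want to ignore only those substrings that are exactly equal to one of the strings in ignore, then whenever a longest common substring candidate is found, iterate over ignore and check whether the current substring is in it.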
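
A rough sketch of that exact-match variant, using a HashSet for the lookup as suggested in the comments on the question. The class name LcsIgnoreExact is made up for the example. One detail that is my own addition: when the full match ending at (i, j) is ignored, a shorter suffix of it may still be a valid result, so the loop tries shorter lengths as long as they would still beat the current best.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class LcsIgnoreExact {

    /**
     * Longest common substring of str1 and str2 that is not exactly equal
     * to any string in ignore. Returns "" if no such substring exists.
     */
    static String longestSubstring(String str1, String str2, String[] ignore) {
        if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty()) {
            return "";
        }

        // hash-based lookup instead of scanning the array for every candidate
        Set<String> ignored = new HashSet<>();
        if (ignore != null) {
            ignored.addAll(Arrays.asList(ignore));
        }

        // num[i][j] = length of the common substring ending at str1[i], str2[j]
        int[][] num = new int[str1.length()][str2.length()];
        String best = "";

        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                if (str1.charAt(i) != str2.charAt(j)) {
                    continue; // num[i][j] stays 0
                }
                num[i][j] = (i == 0 || j == 0) ? 1 : 1 + num[i - 1][j - 1];

                // try the full match first, then shorter suffixes of it,
                // but only lengths that would improve on the current best
                for (int len = num[i][j]; len > best.length(); len--) {
                    String candidate = str1.substring(i - len + 1, i + 1);
                    if (!ignored.contains(candidate)) {
                        best = candidate;
                        break;
                    }
                }
            }
        }
        return best;
    }
}
```

For example, for "xabc" and "yabc" with ignore = {"abc", "a", "ab"}, the full match "abc" is rejected and the method falls back to "bc".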

Source

2013-09-24 23:24:37

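Build a suffix tree for one of the strings, then run the second string through it to see which of its substrings can be found in the suffix tree.

Information about suffix trees: http://en.wikipedia.org/wiki/Suffixtree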
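
To give an idea of the approach, here is a minimal sketch. Note that it uses a plain, uncompressed suffix trie rather than a real suffix tree: it is much simpler to write but needs O(n^2) time and memory to build, so treat it as an illustration only. The names SuffixTrieLcs, Node and buildSuffixTrie are made up for the example, and the ignore list is not handled here; candidates would still have to be filtered as described in the other answer.

```java
import java.util.HashMap;
import java.util.Map;

final class SuffixTrieLcs {

    /** One trie node; each edge is labelled with a single character. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
    }

    /** Inserts every suffix of text, so every substring of text is a path from the root. */
    private static Node buildSuffixTrie(String text) {
        Node root = new Node();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int k = start; k < text.length(); k++) {
                node = node.children.computeIfAbsent(text.charAt(k), c -> new Node());
            }
        }
        return root;
    }

    /** Longest substring of str2 that also occurs in str1. */
    static String longestCommonSubstring(String str1, String str2) {
        if (str1 == null || str2 == null) {
            return "";
        }
        Node root = buildSuffixTrie(str1);
        String best = "";

        // from every start position in str2, walk down the trie as far as possible
        for (int start = 0; start < str2.length(); start++) {
            Node node = root;
            int end = start;
            while (end < str2.length()) {
                Node next = node.children.get(str2.charAt(end));
                if (next == null) {
                    break; // str2.substring(start, end + 1) does not occur in str1
                }
                node = next;
                end++;
            }
            if (end - start > best.length()) {
                best = str2.substring(start, end);
            }
        }
        return best;
    }
}
```

A real suffix tree compresses chains of single-child nodes into one edge and can be built in linear time, which is what you would want for long inputs.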

Source

2013-09-25 08:52:13
