查找最長的串並從所有元素的數組中刪除它

我有這樣的陣列，例如（大小是可變的）：查找最長的串並從所有元素的數組中刪除它

x = ["10111", "10122", "10250", "10113"]

我需要找到最長的字符串是每個數組元素的子字符串（本例中爲「10」）（它不需要是字符串的前綴）。我必須從所有的字符串中刪除它。這個例子的輸出是：

x=["111","222","250","113"] //common value = "10"

來源

2013-09-23 whysai

最常見的子串是否應該在開頭？另外，這功課呢？ –

到目前爲止你有嘗試過什麼嗎？ –

瀏覽http://www.geeksforgeeks.org上的帖子 –

這個擴展發現的最長最常見的字符串（S）。請注意，"1"也包含在每個字符串中，甚至比"10"更多。（C＃只）：

public static class StringExtensions 
{ 
    public static IEnumerable<string> GetMostCommonSubstrings(this IList<string> strings) 
    { 
     if (strings == null) 
      throw new ArgumentNullException("strings"); 
     if (!strings.Any() || strings.Any(s => string.IsNullOrEmpty(s))) 
      throw new ArgumentException("None string must be empty", "strings"); 

     var allSubstrings = new List<List<string>>(); 
     for (int i = 0; i < strings.Count; i++) 
     { 
      var substrings = new List<string>(); 
      string str = strings[i]; 
      for (int c = 0; c < str.Length - 1; c++) 
      { 
       for (int cc = 1; c + cc <= str.Length; cc++) 
       { 
        string substr = str.Substring(c, cc); 
        if (allSubstrings.Count < 1 || allSubstrings.Last().Contains(substr)) 
         substrings.Add(substr); 
       } 
      } 
      allSubstrings.Add(substrings); 
     } 
     if (allSubstrings.Last().Any()) 
     { 
      var mostCommon = allSubstrings.Last() 
       .GroupBy(str => str) 
       .OrderByDescending(g => g.Key.Length) 
       .ThenByDescending(g => g.Count()) 
       .Select(g => g.Key); 
      return mostCommon; 
     } 
     return Enumerable.Empty<string>(); 
    } 
}

現在很容易：

string[] x = new[] { "10111", "10122", "10250", "10113" }; 
string mostCommonSubstring = x.GetMostCommonSubstrings().FirstOrDefault(); 
if (mostCommonSubstring != null) 
{ 
    for (int i = 0; i < x.Length; i++) 
     x[i] = x[i].Replace(mostCommonSubstring, ""); 
} 
Console.Write(string.Join(", ", x));

輸出：

111, 122, 250, 113

DEMO

編輯：如果你只是想用HashSet<string>找到最長的公共子不考慮發生的頻率考慮您可以使用此optimzed辦法（O（n）的操作）：

public static string GetLongestCommonSubstring(this IList<string> strings) 
{ 
    if (strings == null) 
     throw new ArgumentNullException("strings"); 
    if (!strings.Any() || strings.Any(s => string.IsNullOrEmpty(s))) 
     throw new ArgumentException("None string must be empty", "strings"); 

    var commonSubstrings = new HashSet<string>(strings[0].GetSubstrings()); 
    foreach (string str in strings.Skip(1)) 
    { 
     commonSubstrings.IntersectWith(str.GetSubstrings()); 
     if (commonSubstrings.Count == 0) 
      return null; 
    } 
    return commonSubstrings.OrderByDescending(s => s.Length).First(); 
} 

public static IEnumerable<string> GetSubstrings(this string str) 
{ 
    if (string.IsNullOrEmpty(str)) 
     throw new ArgumentException("str must not be null or empty", "str"); 

    for (int c = 0; c < str.Length - 1; c++) 
    { 
     for (int cc = 1; c + cc <= str.Length; cc++) 
     { 
      yield return str.Substring(c, cc); 
     } 
    } 
}

使用它在這方式：

string[] x = new[] { "101133110", "1", "102533010", "101331310" }; 
string longestCommon = x.GetLongestCommonSubstring(); // "10"

來源

2013-09-23 07:31:53

這個完美的作品，我想這是一個基本的算法，我們找到了字符串的子字符串組合，並檢查其餘的子字符串（消除不必要的）。謝謝。讓我知道你是否可以改進算法。截至目前這對我有用。 – whysai

@ user1785050：總是有改進的餘地。然而，直到性能不是真正的問題，我會保持它，因爲不必要的微型優化使代碼更少的可讀性和可維護性。 –

什麼是'allSubstrings.Last（）。包含（SUBSTR）'的'if'聲明的目的是什麼？只有在它們是最近收集的字符串的子字符串時，纔會收集字符串以供考慮。「111」，「222」，「222」會發生什麼？即使它們更常見，「222」字符串是不會被收集的？ –

試試這個：（我想普通字符串應該是開頭）：

string[] x = {"10111","10222","10250","10113"}; 
string common = x[0]; 
foreach(var i in x){ 
    while(!i.StartsWith(common)){ 
    common = common.Substring(0,common.Length-1); 
    if(common == "") break; 
    } 
} 
x = x.Select(a=>a.Substring(common.Length)).ToArray();

來源

2013-09-23 07:12:29

如果他們開始每次用10有了這個序列，例如它不工作這隻有：{「11110」，「12210」，「11210」，「22110」}; – Regu

@Regu **我說清楚，'我想普通字符串應該是在beginning' **，那是因爲OP沒有說清楚 –

查找出現長度爲1的子串的最大次數。這是一個簡單的O（n^2）搜索。調用發生K.

的這個最大數量在您的例子，這是「1」，「0」，以及K = 5。

現在你知道長度爲2的所有子不能出現在高於K個輸入字符串的更多。而且，發生K次的任何子串必須由發生K次的長度爲1的子串組成。搜索長度爲2的子字符串的長度爲1的子字符串存在K次，這又是一個簡單的O（n^2）搜索。

重複爲更長的長度，直至在K個輸入中不存在更多的子串。

來源

2013-09-23 09:25:56 DanielV

查找最長的串並從所有元素的數組中刪除它

回答

相關問題