2011-06-21 60 views
3

How can I match duplicate words between two strings (paragraphs) using C# or jQuery?

// C# code 
string str1,str2; 

I want to check for five matching words within the first fifteen words of str1.

str1 = "Musician Ben Sollee on the Ravages of Coal and the Wonders of the Bicycle";
str2 = "There is Wonders of Musician Ben Sollee on the Ravages of Coal";

When comparing the strings above, I want to skip common words such as "of", "on", "the", etc., and only check the remaining words.

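Using the strings above, I want to compare str2 with str1, and if it contains five duplicate words, show a message saying that it contains some duplicate words.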

How can I compare them and check whether they contain duplicates? I'm happy with either a jQuery or a C# answer.

Answers

2

A one-line LINQ query, for bonus points?

using System.Linq; // needed for Join, Distinct, Where and Count

string[] str1Words = ...
string[] str2Words = ...
string[] dontCheck = {"of", "a", "the"};

var greaterThanFive = str1Words.Join(str2Words, s1 => s1, s2 => s2, (r1, r2) => r1)
           .Distinct()
           .Where(s => !dontCheck.Contains(s))
           .Count() > 5;
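The `Join(...).Distinct()` pair above can also be written with LINQ's set operators. A minimal sketch, assuming words are separated by spaces and that letter case should be ignored (the question does not say either way); the names `IntersectCheck` and `CountSharedWords` are hypothetical:

```csharp
using System;
using System.Linq;

static class IntersectCheck
{
    // Hypothetical helper: counts the distinct words that occur in both
    // strings, leaving out the words in dontCheck. Comparison ignores case.
    public static int CountSharedWords(string str1, string str2, string[] dontCheck)
    {
        char[] separators = { ' ' };
        string[] str1Words = str1.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        string[] str2Words = str2.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // Intersect yields each shared word once (set semantics), which does
        // the job of Join(...).Distinct(); Except then drops the stop words.
        return str1Words
            .Intersect(str2Words, StringComparer.OrdinalIgnoreCase)
            .Except(dontCheck, StringComparer.OrdinalIgnoreCase)
            .Count();
    }
}
```

With the question's strings and `{ "of", "a", "the" }`, this returns 7 (Musician, Ben, Sollee, on, Ravages, Coal, Wonders), so `CountSharedWords(str1, str2, dontCheck) > 5` is true.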
+0

Thanks a lot, @ColinE, for the great answer. –

+0

Any .NET 3.5 language will support this. Just include System.Linq. – ColinE

+0

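Hello @ColinE, if I set string str1Words = "Official Facebook iPad App Coming Soon to App Store Official Facebook iPad App Coming Soon to App Store #use"; and string[] str2Words = "App Coming Soon Official Facebook iPad Store App Official", then it returns a count greater than 5. –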

1

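You can try splitting both strings into lists of words by splitting on " ". Then just iterate over the words and check whether the second list contains each one.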

You should also keep a list or a file of words to ignore.

List<string> str1List = new List<string>(str1.Split(' '));
List<string> str2List = new List<string>(str2.Split(' '));

// Check every word of str1 against the words of str2
foreach (string word in str1List)
{
    if (str2List.Contains(word))
    {
        // word occurs in both strings: count it, collect it, etc.
    }
}
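The loop above calls `List<string>.Contains`, which scans the whole list for every word. A sketch of the same idea using `HashSet<string>` lookups plus the ignore list this answer recommends; `LoopCheck` and `FindDuplicates` are hypothetical names, and the ignored words could come from an array or, for example, from `File.ReadAllLines` on a stop-word file of your own:

```csharp
using System;
using System.Collections.Generic;

static class LoopCheck
{
    // Hypothetical helper: returns each word of str1 that also occurs in str2,
    // skipping ignored words. Each duplicate is reported once, in str1 order.
    public static List<string> FindDuplicates(string str1, string str2, IEnumerable<string> ignoredWords)
    {
        char[] separators = { ' ' };
        var ignored = new HashSet<string>(ignoredWords, StringComparer.OrdinalIgnoreCase);
        var str2Set = new HashSet<string>(
            str2.Split(separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<string>();

        foreach (string word in str1.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // HashSet.Contains is O(1) on average; seen.Add returns false
            // when the word was already recorded, which suppresses repeats.
            if (!ignored.Contains(word) && str2Set.Contains(word) && seen.Add(word))
            {
                duplicates.Add(word);
            }
        }
        return duplicates;
    }
}
```

With the question's strings and the ignore list `{ "of", "on", "the" }`, this returns Musician, Ben, Sollee, Ravages, Coal, Wonders, so `FindDuplicates(...).Count >= 5` would trigger the message.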
+0

Thanks @Mithir for the solution. –

+0

No problem... guess I'm a bit outdated... :( – Mithir

2

Get the words of each string:

string[] str1Words = str1.Split(' ');

string[] str2Words = str2.Split(' ');

Specify the words you don't want to check:

string[] dontCheck = {"of", "a", "the"}; // etc.. 

And see how many duplicates there are:

string[] duplicates = Array.FindAll(
    str1Words, str1Word =>
        Array.Exists(str2Words, str2Word => string.Equals(str1Word, str2Word))
        && !Array.Exists(dontCheck, dontCheckWord => string.Equals(dontCheckWord, str1Word))
);

if (duplicates.Length > 5)
{ 
    // Give message 
} 
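`Array.FindAll` keeps every matching element of `str1Words`, so a word repeated in `str1` is counted each time it appears, which can push the total past 5 on its own. A small sketch contrasting that occurrence count with a distinct count; `CountModes` and its two methods are hypothetical names:

```csharp
using System;
using System.Linq;

static class CountModes
{
    // Same filter as the Array.FindAll answer: every occurrence in
    // str1Words that also appears in str2Words and is not a stop word.
    public static int CountOccurrences(string[] str1Words, string[] str2Words, string[] dontCheck)
    {
        return Array.FindAll(
            str1Words,
            str1Word => Array.Exists(str2Words, str2Word => string.Equals(str1Word, str2Word))
                && !Array.Exists(dontCheck, dontCheckWord => string.Equals(dontCheckWord, str1Word))
        ).Length;
    }

    // Same filter, but each shared word is counted only once.
    public static int CountDistinct(string[] str1Words, string[] str2Words, string[] dontCheck)
    {
        return str1Words
            .Where(w => str2Words.Contains(w) && !dontCheck.Contains(w))
            .Distinct()
            .Count();
    }
}
```

For `{ "App", "App", "App", "Store" }` against `{ "App", "Store" }`, `CountOccurrences` returns 4 while `CountDistinct` returns 2.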
+0

Hello @lockstock, thanks. I wanted to try it, but Array.Contains doesn't exist. What could the problem be? –

+0

Thank you very much. :) –

1
string str1 = "Musician Ben Sollee on the Ravages of Coal and the Wonders of the Bicycle";
string str2 = "There is Wonders of Musician Ben Sollee on the Ravages of Coal";
string[] DontCheck = new string[] { "is", "of", "the" };

List<string> List1 = new List<string>(str1.Split(' '));
List<string> List2 = new List<string>(str2.Split(' '));

var Result = ((from s in List1
               where List2.Contains(s) && !DontCheck.Contains(s)
               select s).Count() > 5);

if (Result)
{
    // It contains some duplicate words
}

Updated code that checks for distinct duplicates:

string str1 = "Official Facebook iPad App Coming Soon to the App Store Official Facebook iPad";
string str2 = "App Coming Soon to Official Facebook iPad the Store App Official App App App App";

string[] DontCheck = new string[] { "is", "of", "the", "to" };

// HashSet drops repeated words, so each duplicate is counted once
HashSet<string> Set = new HashSet<string>(str1.Split(' '));
HashSet<string> Set2 = new HashSet<string>(str2.Split(' '));

var Result = ((from s in Set
               where Set2.Contains(s) && !DontCheck.Contains(s)
               select s).Count() > 5);

if (Result)
{
    // It contains more than 5 duplicate words
}
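None of the snippets above restricts the check to the first fifteen words of `str1`, which the question asks for. A minimal sketch that applies `Take` before counting; the name `CountSharedInPrefix`, the default of 15, and the case-insensitive comparison are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FirstWordsCheck
{
    // Hypothetical helper: counts the distinct words among the first
    // prefixLength words of str1 that also occur anywhere in str2,
    // skipping stop words. Comparisons ignore case.
    public static int CountSharedInPrefix(string str1, string str2, IEnumerable<string> dontCheck, int prefixLength = 15)
    {
        char[] separators = { ' ' };
        var stop = new HashSet<string>(dontCheck, StringComparer.OrdinalIgnoreCase);
        var str2Set = new HashSet<string>(
            str2.Split(separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

        return str1.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Take(prefixLength)                       // only the first prefixLength words of str1
            .Where(w => !stop.Contains(w) && str2Set.Contains(w))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count();
    }
}
```

The question asks for "five duplicate words", so `CountSharedInPrefix(str1, str2, dontCheck) >= 5` matches that wording; use `> 5` instead to match the answers above. With the question's strings and `{ "of", "on", "the" }`, the count is 6.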
+0

@Syeda, when I try it with the first 15 words of str1 and compare them against the whole of str2, it returns more than 5 every time. –

+0

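Hello @Syeda, if I set string str1 = "Official Facebook iPad App Coming Soon to App Store Official Facebook iPad App Coming Soon to App Store #use"; and str2 = "App Coming Soon Official Facebook iPad Store App Official", then it returns a count greater than 5. –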

+0

Sorry @Abhishek, I don't understand what you're trying to say. The string values you gave do contain more than 5 duplicates. – Syeda
