2010-08-31 46 views
1

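Count word frequency in a string (most important words), excluding keywords

I want to count the frequency of the words in a string (excluding certain keywords) and sort them in descending order. How can I do that?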

Given the following string...

This is stackoverflow. I repeat stackoverflow. 

where the excluded keywords are

ExKeywords() ={"i","is"} 

the output should look like

stackoverflow 
repeat   
this   

P.S. No! I am not reinventing Google! :)

Answers

4
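using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "This is stackoverflow. I repeat stackoverflow.";
string[] keywords = new[] {"i", "is"};
Regex regex = new Regex("\\w+");

// Lowercase each word, drop the excluded keywords, then order the groups
// by frequency (descending) and alphabetically within ties.
foreach (var group in regex.Matches(input)
    .OfType<Match>()
    .Select(c => c.Value.ToLowerInvariant())
    .Where(c => !keywords.Contains(c))
    .GroupBy(c => c)
    .OrderByDescending(c => c.Count())
    .ThenBy(c => c.Key))
{
    Console.WriteLine(group.Key);
}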
+0

Wow! Thanks a lot, Daniel! – OrElse 2010-08-31 10:11:18

+0

+1, you beat me to it. – Ani 2010-08-31 10:13:57

+0

If this were a very large string (say, 12,000 words), would Regex still be the right approach? – discorax 2010-12-01 19:00:34
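
On the question of larger inputs, here is a minimal sketch of a Regex-free alternative: one pass over the characters, counting into a Dictionary. The helper name `RankWords` and the variable `stopWords` are hypothetical, not taken from the answers in this thread. It also assumes a word is a run of letters or digits, which differs slightly from `\w` (that also matches underscores). No measurements were taken, so treat it as an option to benchmark rather than a guaranteed improvement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: returns the distinct words of `input`, most frequent first,
// skipping anything in `excluded`. Ties are broken alphabetically.
static List<string> RankWords(string input, ISet<string> excluded)
{
    var counts = new Dictionary<string, int>();
    var current = new StringBuilder();

    // Run one index past the end so the final word is flushed too.
    for (int i = 0; i <= input.Length; i++)
    {
        char ch = i < input.Length ? input[i] : ' ';
        if (char.IsLetterOrDigit(ch))
        {
            current.Append(char.ToLowerInvariant(ch));
            continue;
        }
        if (current.Length == 0) continue;

        string word = current.ToString();
        current.Clear();
        if (excluded.Contains(word)) continue;

        counts.TryGetValue(word, out int n); // n is 0 when the word is new
        counts[word] = n + 1;
    }

    return counts
        .OrderByDescending(p => p.Value)
        .ThenBy(p => p.Key, StringComparer.Ordinal)
        .Select(p => p.Key)
        .ToList();
}

var stopWords = new HashSet<string> { "i", "is" };
foreach (string word in RankWords("This is stackoverflow. I repeat stackoverflow.", stopWords))
    Console.WriteLine(word);
```

Keeping the excluded keywords in a HashSet avoids scanning an array for every word, which starts to matter once the keyword list grows.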

0
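using System;
using System.Linq;

string s = "This is stackoverflow. I repeat stackoverflow.";
string[] notRequired = {"i", "is"};

// Split on whitespace and punctuation so "stackoverflow." counts as "stackoverflow".
char[] separators = {' ', '.', ',', ';', '!', '?'};

var myData =
    from word in s.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries)
    where !notRequired.Contains(word)
    group word by word into g
    orderby g.Count() descending, g.Key
    select g.Key;

foreach (string item in myData)
    Console.WriteLine(item);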