2014-01-12

UPDATE: Find phrases in a string

Sorry, I only know a little English.

I want to count the phrases in a string.

My string is below:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla venenatis, lorem ipsum augue vel pellentesque sit amet, lorem ipsum dolor egestas lacus, et ipsum dolor nulla.

I want the following:

  • 3x lorem ipsum
  • 2x sit amet

I tried the function from this link: Find which phrases have been used multiple times in a string.

But my results are as follows:

  • Repeat = 10080x (does it include spaces?)
  • Repeat = 99x photoshop
  • Repeat = 52x dersleri
  • Repeat = 44x photoshop dersleri
  • Repeat = 36x photoshop ile

But I want the following:

  • Repeat = 44x photoshop dersleri
  • Repeat = 36x photoshop ile
  • Repeat = and others...

I use this function:

var splitBySpace = text2.Split(' '); 

var doubleWords = splitBySpace 
     .Select((x, i) => new { Value = x, Index = i }) 
     .Where(x => x.Index != splitBySpace.Length - 1) 
     .Select(x => x.Value + " " + splitBySpace.ElementAt(x.Index + 1)); 

var duplicates = doubleWords 
    .GroupBy(x => x) 
    .Where(x => x.Count() > 1) 
    .Select(x => new { x.Key, Count = x.Count() }) 
    .OrderByDescending(w => w.Count); 

foreach (var word in duplicates) 
    ensikkelimeler.Add(string.Format("{0}x {1}", word.Count, word.Key)); 

It reports "photoshop" as repeated although that does not exist in the original string? –


The word "photoshop" is used 99 times in my string. But I don't want single words. I need two words. Not "photoshop"; it should be "photoshop dersleri". – user3186216

Answers


I tweaked your code (which seems to be taken from this answer) a bit (I described the changes in the comments):

// all separators from sample text, add additional if necessary
// lower-case the whole text first so both words of each pair are compared case-insensitively
var splitBySpace = text2.ToLowerInvariant()
    .Split(new[] {' ', '.', ','}, StringSplitOptions.RemoveEmptyEntries);

var doubleWords = splitBySpace
    .Select((x, i) => new {Value = x, Index = i})
    .Where(x => x.Index != splitBySpace.Length - 1)
    .Select(x => x.Value + " " + splitBySpace.ElementAt(x.Index + 1));

var ensikkelimeler = doubleWords 
    .GroupBy(x => x) 
    .Where(x => x.Count() > 1) 
    .Select(x => new {x.Key, Count = x.Count()}) 
    .OrderByDescending(w => w.Count) 
    // do the formatting inside the LINQ expression
    .Select(word => string.Format("{0}x {1}", word.Count, word.Key)) 
    .ToList(); 

These are the results for your sample text:

3x lorem ipsum 
3x ipsum dolor 
2x sit amet 
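
The `10080x` entry and the single-word entries in the question most likely come from consecutive spaces in the input: `Split(' ')` keeps an empty string for every extra space, and those empty strings are then paired with their neighbours. The sketch below illustrates this on a made-up two-word input; `SplitDemo` and `Pairs` are hypothetical names used only for this illustration, and `Pairs` is meant to be equivalent to the pairing step in the question, not a copy of it.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: joins every word with the word that follows it.
    public static string[] Pairs(string[] words) =>
        words.Take(words.Length - 1)
             .Select((w, i) => w + " " + words[i + 1])
             .ToArray();

    public static void Main()
    {
        var text = "photoshop  dersleri"; // assumed input: two spaces between the words

        // Splitting on a single space keeps an empty entry for the extra space.
        var naive = text.Split(' ');

        // RemoveEmptyEntries drops the empty entry before pairing.
        var cleaned = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var p in Pairs(naive)) Console.WriteLine("[" + p + "]");
        foreach (var p in Pairs(cleaned)) Console.WriteLine("[" + p + "]");
    }
}
```

With the naive split, the pairs are `[photoshop ]` and `[ dersleri]`, which look like single words in the output; with `RemoveEmptyEntries`, the only pair is `[photoshop dersleri]`.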

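I also tried the accepted answer from the question you linked to. After I added a call to ToLowerInvariant(), it returned the same results for two-word phrases, but also included three-word phrases: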

2x lorem ipsum dolor 
3x lorem ipsum 
3x ipsum dolor 
2x sit amet 
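
The linked accepted answer is not reproduced here. As a rough illustration of how two-word and three-word phrases could be counted with one routine, here is a minimal sketch; `PhraseCounter` and `CountRepeated` are hypothetical names, and the separator list is an assumption that may need extending for other texts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseCounter
{
    // Hypothetical helper: counts every run of `size` consecutive words
    // that occurs more than once, ignoring case.
    public static List<KeyValuePair<string, int>> CountRepeated(string text, int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));

        // Assumed separators; add more punctuation if the text needs it.
        var words = text.ToLowerInvariant()
            .Split(new[] { ' ', '.', ',', '\r', '\n', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        return Enumerable.Range(0, Math.Max(words.Length - size + 1, 0))
            .Select(i => string.Join(" ", words, i, size)) // words[i .. i + size - 1]
            .GroupBy(p => p)
            .Where(g => g.Count() > 1)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(kv => kv.Value)
            .ToList();
    }

    public static void Main()
    {
        var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " +
                   "Nulla venenatis, lorem ipsum augue vel pellentesque sit amet, " +
                   "lorem ipsum dolor egestas lacus, et ipsum dolor nulla.";

        // Three-word phrases first, then two-word phrases.
        foreach (var size in new[] { 3, 2 })
            foreach (var kv in CountRepeated(text, size))
                Console.WriteLine("{0}x {1}", kv.Value, kv.Key);
    }
}
```

Calling `CountRepeated(text, 2)` alone gives only the two-word phrases, which is what the question asks for.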
0
var text = @"Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
      Nulla venenatis, lorem ipsum augue vel pellentesque sit amet, 
      lorem ipsum dolor egestas lacus, et ipsum dolor nulla."; 

var phrases = new string[] { "sit amet", "lorem ipsum" }; 

var q = phrases.Select(p => new { phrase = p, Count = CountPhraseInText(text, p) }) 
       .OrderBy(x => x.Count); 

The CountPhraseInText function:

int CountPhraseInText(string input, string phrase) 
{ 
    // escape the phrase so that regex metacharacters in it are matched literally
    return new Regex(Regex.Escape(phrase), RegexOptions.IgnoreCase).Matches(input).Count;
}
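
Put together, the approach above might look like the following self-contained sketch. It assumes the phrases to look for are known in advance, so unlike the grouping approach it does not discover repeated phrases by itself. The `PhraseLookup` class name, the `Regex.Escape` call, and the descending sort are additions made for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseLookup
{
    public static int CountPhraseInText(string input, string phrase)
    {
        // Regex.Escape makes characters such as '.' or '+' in the phrase match literally.
        return Regex.Matches(input, Regex.Escape(phrase), RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " +
                   "Nulla venenatis, lorem ipsum augue vel pellentesque sit amet, " +
                   "lorem ipsum dolor egestas lacus, et ipsum dolor nulla.";

        var phrases = new[] { "sit amet", "lorem ipsum" };

        // Most frequent phrase first.
        var q = phrases
            .Select(p => new { Phrase = p, Count = CountPhraseInText(text, p) })
            .OrderByDescending(x => x.Count);

        foreach (var x in q)
            Console.WriteLine("{0}x {1}", x.Count, x.Phrase);
    }
}
```

For this sample text the sketch prints `3x lorem ipsum` followed by `2x sit amet`. A phrase that is split across a line break or separated by more than one space in the input would not be matched.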