2011-02-05 208 views
1

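I want to search a string for a specific word entered by the user and then output the percentage of the text that this word makes up. I am just wondering what the best way to do this is, if you can help me. Searching a string for a specific word. C#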

+0

What exactly do you mean by percentage? – 2011-02-05 11:55:34

+1

I assume he means (number_of_times_word_to_find_occurs / total_number_of_words) * 100. – david 2011-02-05 12:13:05

Answers

0

My suggestion is a complete class.

using System;
using System.Text; // StringBuilder

class WordCount { 
    const string Symbols = ",;.:-()\t!¡¿?\"[]{}&<>+-*/=#'"; 

    public static string normalize(string str) 
    { 
     var toret = new StringBuilder(); 

     for(int i = 0; i < str.Length; ++i) { 
      if (Symbols.IndexOf(str[ i ]) > -1) { 
       toret.Append(' '); 
      } else { 
       toret.Append(char.ToLower(str[ i ])); 
      } 
     } 

     return toret.ToString(); 
    } 

    private string word; 
    public string Word { 
     get { return this.word; } 
     set { this.word = value; } 
    } 

    private string str; 
    public string Str { 
     get { return this.str; } 
    } 

    private string[] words = null; 
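    public string[] Words { 
     get { 
      // Split once and cache the result; drop the empty entries left by runs of spaces. 
      if (this.words == null) { 
       this.words = this.Str.Split(new[] { ' ' }, System.StringSplitOptions.RemoveEmptyEntries); 
      } 

      return this.words; 
     } 
    } 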

    public WordCount(string str, string w) 
    { 
     this.str = ' ' + normalize(str) + ' '; 
     this.word = w; 
    } 

    public int Times() 
    { 
     return this.Times(this.Word); 
    } 

    public int Times(string word) 
    { 
     int times = 0; 

     // Str is lower-cased by normalize(), so lower-case the search word too. 
     word = ' ' + word.ToLower() + ' '; 

     int wordLength = word.Length; 
     int pos = this.Str.IndexOf(word); 

     while(pos > -1) { 
      ++times; 

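      // Resume at the trailing space of the previous match so that adjacent 
      // occurrences, which share that space, are both counted. 
      pos = this.Str.IndexOf(word, pos + wordLength - 1); 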
     } 

     return times; 
    } 

    public double Percentage() 
    { 
     return this.Percentage(this.Word); 
    } 

    public double Percentage(string word) 
    { 
     // Floating-point division; integer division would truncate to 0. 
     return 100.0 * this.Times(word) / this.Words.Length; 
    } 
} 

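Advantages: the string split is cached, so there is no danger of performing it more than once. It is wrapped in a class, so it can easily be reused. There is no need for LINQ. Hope this helps.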

2

The simplest way is to use LINQ:

char[] separators = new char[] {' ', ',', '.', '?', '!', ':', ';'}; 
var count = 
    (from word in sentence.Split(separators)  // get all the words 
    where word.ToLower() == searchedWord.ToLower() // find the words that match 
    select word).Count();       // count them 

This only counts the number of times the word appears in the text. You can also count how many words there are in the text:

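// Drop empty entries so that adjacent separators such as ", " do not inflate the total 
var totalWords = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length; 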

and then simply compute the percentage:

var result = count * 100.0 / totalWords; // multiply by 100.0 first to avoid integer division 
+3

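There are so many corner cases that this will miss. If you search for "two" in the sentence "one, two, three", you won't get any match, because the split gives the element "two," (including the comma). This means you need to account for the various separators and strip them before splitting (unless the user is searching for them). – 2011-02-05 12:03:14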
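
The corner case raised in the comment above can be checked with a short sketch. `SplitDemo` and its members are hypothetical names introduced only for illustration, and the sketch assumes that punctuation never counts as part of a word.

```csharp
using System;
using System.Linq;

static class SplitDemo
{
    // Hypothetical separator set; extend it for the punctuation your text uses.
    static readonly char[] Separators = { ' ', ',', '.', '?', '!', ':', ';' };

    // Splits on every separator and drops the empty entries produced by
    // adjacent separators such as ", ".
    public static string[] Words(string sentence)
    {
        return sentence.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Percentage (0 to 100) of words equal to searchedWord, ignoring case.
    public static double Percentage(string sentence, string searchedWord)
    {
        string[] words = Words(sentence);
        if (words.Length == 0) return 0.0; // no words, nothing to divide by

        int count = words.Count(w =>
            w.Equals(searchedWord, StringComparison.OrdinalIgnoreCase));
        return 100.0 * count / words.Length;
    }
}
```

With this sketch, `SplitDemo.Percentage("one, two, three", "two")` finds one match among three words, so the commas no longer hide the word.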

3

I suggest using the String.Equals overload that takes a StringComparison, for better performance.

var separators = new [] { ' ', ',', '.', '?', '!', ';', ':', '\"' }; 
var words = sentence.Split (separators); 
var matches = words.Count (w => 
    w.Equals (searchedWord, StringComparison.OrdinalIgnoreCase)); 
var percentage = matches / (float) words.Length; // words is an array, so use Length 

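Note that percentage is a float, e.g. 0.5 for 50%. You can format it for display with a ToString overload:

var formatted = percentage.ToString ("P0"); // 0.1234 => 12 % 

You can also change the format specifier to show decimal places:

var formatted = percentage.ToString ("P2"); // 0.1234 => 12.34 % 

Keep in mind that this approach is inefficient for long strings, because it creates a string instance for every word found. You may want to use a StringReader and read word by word manually.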
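
The manual approach mentioned above might look like the following sketch. `StreamingCounter` and `Percentage` are hypothetical names, and the sketch assumes that a word is any run of letters or digits. It reads one character at a time and compares in place, so no string is allocated per word.

```csharp
using System;
using System.IO;

static class StreamingCounter
{
    // Returns the percentage (0 to 100) of words in the reader equal to target,
    // ignoring case. Assumption: a word is a run of letters or digits.
    public static double Percentage(TextReader reader, string target)
    {
        int total = 0, matches = 0;
        int length = 0;    // characters seen so far in the current word
        bool same = true;  // the current word still equals target so far

        while (true)
        {
            int c = reader.Read(); // -1 signals the end of the input
            bool inWord = c != -1 && char.IsLetterOrDigit((char)c);

            if (inWord)
            {
                if (same && (length >= target.Length ||
                    char.ToLowerInvariant((char)c) != char.ToLowerInvariant(target[length])))
                {
                    same = false;
                }
                ++length;
                continue;
            }

            // A separator or the end of input closes the current word, if any.
            if (length > 0)
            {
                ++total;
                if (same && length == target.Length) ++matches;
            }
            length = 0;
            same = true;
            if (c == -1) break;
        }

        return total == 0 ? 0.0 : 100.0 * matches / total;
    }
}
```

It can be called with `new StringReader(sentence)` for in-memory text, or with any other `TextReader` such as a `StreamReader` over a file.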

0
// Assumes: using System; using System.Linq; using System.Text.RegularExpressions; 
// The words you want to search for 
var words = new string[] { "this", "is" }; 

// Build a regular expression query 
var wordRegexQuery = new System.Text.StringBuilder(); 
wordRegexQuery.Append("\\b("); 
for (var wordIndex = 0; wordIndex < words.Length; wordIndex++) 
{ 
    wordRegexQuery.Append(words[wordIndex]); 
    if (wordIndex < words.Length - 1) 
    { 
        wordRegexQuery.Append('|'); 
    } 
} 
wordRegexQuery.Append(")\\b"); 

// Find matches and return them as a string[] 
var regex = new System.Text.RegularExpressions.Regex(wordRegexQuery.ToString(), RegexOptions.IgnoreCase); 
var someText = "This is some text which is quite a good test of which word is used most often. Thisis isthis athisisa."; 
var matches = (from Match m in regex.Matches(someText) select m.Value).ToArray(); 

// Display results 
foreach (var word in words) 
{ 
    var wordCount = (int)matches.Count(w => w.Equals(word, StringComparison.InvariantCultureIgnoreCase)); 
    Console.WriteLine("{0}: {1} ({2:f2}%)", word, wordCount, wordCount * 100f/matches.Length); 
}
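
Note that the loop above reports each word's share among the matched search words, not among all words in the text. A minimal sketch of the latter, using only `Regex.Matches` and `Regex.Escape`, could look like this. `RegexShare` is a hypothetical name, and the sketch assumes that a word is a run of `\w` characters and that the searched word begins and ends with such a character.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexShare
{
    // Percentage (0 to 100) of all words in text that equal word, ignoring case.
    public static double Percentage(string text, string word)
    {
        // Every run of word characters counts as one word.
        int total = Regex.Matches(text, @"\w+").Count;
        if (total == 0) return 0.0;

        // Escape the word so regex metacharacters in user input are taken literally.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        int hits = Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;

        return 100.0 * hits / total;
    }
}
```

Because of the `\b` anchors, embedded occurrences such as "Thisis" are not counted as matches for "is", although they still count toward the total number of words.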