How do I iterate over a string word by word in C#?

15

foreach (string word in "incidentno and fintype or unitno".Split(' ')) { 
    ... 
}

Source

2009-09-18 07:04:41 Guffa

+0

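I'm not sure about this, but I think it would cause a split on every iteration. You would be better off splitting the string into a local array first and then using the "in" operator. – synhershko 2009-09-18 07:27:07

+3

@synhershko: No, it only splits once. – Guffa 2009-09-18 07:34:54

+0

The only problem is punctuation: 'foreach (string word in "Now, the end is near".Split(' '))' – 2009-09-18 08:30:08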

3

Assuming the words are always separated by spaces, you can use String.Split() to get an array of the words.

Source

2009-09-18 07:05:53 bbohac

3

Use the String class:

string[] words = "incidentno and fintype or unitno".Split(' ');

The Split method splits on spaces, so "words" will contain [incidentno, and, fintype, or, unitno].

Source

2009-09-18 07:06:49

12

var regex = new Regex(@"\b[\s,\.-:;]*"); 
var phrase = "incidentno and fintype or unitno"; 
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

This works even if you have ".,; tabs and new lines" between your words.
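A minimal sketch of the same idea with the hyphen escaped inside the character class. Without the escape, `.-:` is read as a range that covers `/` and the digits rather than the hyphen itself. The class name `EscapedHyphenDemo`, the helper `SplitWords` and the sample phrase are made up for illustration; they are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedHyphenDemo
{
    // Hypothetical helper: splits at word boundaries and swallows any
    // whitespace or punctuation that follows the boundary.
    public static string[] SplitWords(string phrase)
    {
        // \- is a literal hyphen; an unescaped '-' between '.' and ':'
        // would define the character range '.' through ':' instead.
        var regex = new Regex(@"\b[\s,\.\-:;]*");

        // Regex.Split also returns empty strings for zero-length matches,
        // so they are filtered out here.
        return regex.Split(phrase)
                    .Where(x => !string.IsNullOrEmpty(x))
                    .ToArray();
    }

    public static void Main()
    {
        string[] words = SplitWords("incidentno, and fintype-or;unitno");
        Console.WriteLine(string.Join("|", words));
    }
}
```

Note that punctuation at the very start of the input is not preceded by a word boundary, so this pattern would leave it attached to the first piece.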

Source

2009-09-18 07:09:30

+1

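If you use the Split overload that takes StringSplitOptions.RemoveEmptyEntries, you don't need the ".Where". – 2009-09-18 08:01:22

+1

There is no such overload. I'm using Regex.Split, not String.Split. – 2009-09-18 10:23:37

+0

In my opinion this is the best answer, but it has a bug. In the punctuation character class you need to escape the hyphen, otherwise it defines a range. So the first line should be 'var regex = new Regex(@"\b[\s,\.\-:;]*");' – Anduril 2017-04-27 09:26:39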

11

A slight twist, I know, but you can define an iterator block as an extension method on string. For example:

/// <summary>
/// Sweep over text, yielding one space-separated word at a time
/// </summary>
/// <param name="Text">The string to sweep over</param>
/// <returns>The words, in order of appearance</returns>
public static IEnumerable<string> WordList(this string Text) // must be declared in a static class
{
    int cIndex = 0; // start of the current word
    int nIndex;     // position of the next space
    while ((nIndex = Text.IndexOf(' ', cIndex)) != -1)
    {
        yield return Text.Substring(cIndex, nIndex - cIndex);
        cIndex = nIndex + 1;
    }
    // Last word, or the whole string if it contains no spaces
    yield return Text.Substring(cIndex);
}

foreach (string word in "incidentno and fintype or unitno".WordList())
    System.Console.WriteLine("'" + word + "'");

The advantage is that it does not create a large array for long strings.
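A rough sketch of what that laziness means in practice: an iterator only produces as many words as the caller asks for. The names `LazyWordsDemo`, `LazyWords` and the `Produced` counter are hypothetical and exist only to make the effect visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyWordsDemo
{
    // Hypothetical counter: how many words the iterator has produced so far.
    public static int Produced = 0;

    // Space-separated words, produced one at a time on demand.
    public static IEnumerable<string> LazyWords(this string text)
    {
        int start = 0;
        int next;
        while ((next = text.IndexOf(' ', start)) != -1)
        {
            Produced++;
            yield return text.Substring(start, next - start);
            start = next + 1;
        }
        Produced++;
        yield return text.Substring(start);
    }

    public static void Main()
    {
        // First() stops after one MoveNext call, so the rest of the
        // string is never scanned and no array is allocated.
        string first = "incidentno and fintype or unitno".LazyWords().First();
        Console.WriteLine(first + " (words produced: " + Produced + ")");
    }
}
```

By contrast, `Split(' ')` scans the whole string and allocates the full array before the first word is available.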

Source

2009-09-18 07:20:25 JDunkerley

+2

I like this option; it is very useful for large amounts of data. You really deserve a +1! – jdehaan 2009-09-18 07:22:49

+0

Yes, +1 from me too! – Wayne 2009-09-18 08:17:51

1

When using Split, what about checking for empty entries?

string sentence = "incidentno and fintype or unitno";
string[] words = sentence.Split(new char[] { ' ', ',', ';', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
    // Process
}

Edit:

I can't comment, so I'm posting here, but this (posted above) works:

foreach (string word in "incidentno and fintype or unitno".Split(' ')) 
{ 
    ... 
}

My understanding of foreach is that it first calls GetEnumerator() and then calls .MoveNext() until it returns false. So .Split is not re-evaluated on each iteration.
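One way to check this is to count how often the split actually runs. The sketch below wraps Split in a hypothetical `CountingSplit` helper; the class name and counter are assumptions made for the demonstration.

```csharp
using System;

public static class SplitOnceDemo
{
    // Hypothetical counter: number of times the split was evaluated.
    public static int Calls = 0;

    // Thin wrapper around Split that records each call.
    public static string[] CountingSplit(string text)
    {
        Calls++;
        return text.Split(' ');
    }

    public static void Main()
    {
        int words = 0;

        // The expression after "in" is evaluated once, before the loop starts.
        foreach (string word in CountingSplit("incidentno and fintype or unitno"))
        {
            words++;
        }

        Console.WriteLine("words: " + words + ", split calls: " + Calls);
    }
}
```

The loop body runs five times, but the counter ends at 1, which matches the explanation above.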

Source

2009-09-18 07:49:34 ParmesanCodice

2

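There are several ways to do this. The two most convenient (in my opinion) are:

Use string.Split() to create an array. I would probably use this approach because it is the most obvious.

For example: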

string startingSentence = "incidentno and fintype or unitno"; 
string[] seperatedWords = startingSentence.Split(' ');

Alternatively, you can use (this is what I would use):

string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);

StringSplitOptions.RemoveEmptyEntries removes any empty entries from the array, which can otherwise appear because of extra whitespace and other minor issues.
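To make the difference concrete, here is a small sketch that splits the same sentence both ways. The sample sentence (with a doubled space and a trailing space) and the class name `RemoveEmptyDemo` are made up for illustration.

```csharp
using System;

public static class RemoveEmptyDemo
{
    // Plain split: consecutive or trailing spaces produce empty entries.
    public static string[] PlainSplit(string sentence)
    {
        return sentence.Split(' ');
    }

    // Split that drops the empty entries.
    public static string[] CleanSplit(string sentence)
    {
        return sentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string sentence = "incidentno  and fintype "; // double space, trailing space

        Console.WriteLine(PlainSplit(sentence).Length); // 5: two of the entries are ""
        Console.WriteLine(CleanSplit(sentence).Length); // 3: only the real words
    }
}
```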

Next, to process the words, you can use:

foreach (string word in seperatedWords) 
{ 
//Do something 
}

Alternatively, you can use a regular expression to solve this, as Darin demonstrated (copied below).

For example:

var regex = new Regex(@"\b[\s,\.-:;]*"); 
var phrase = "incidentno and fintype or unitno"; 
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

To process the words, you can use code similar to the first option.

foreach (string word in words) 
{ 
//Do something 
}

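Of course, there are many ways to solve this problem, but I think these two are the simplest to implement and maintain. I would go with the first option (string.Split()), because regular expressions can get quite confusing, while Split works fine in most cases.

Source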

2009-09-18 08:16:59

-1

I wrote a string processor class that you can use.

Example:

metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();

Class:

using System;
using System.Collections.Generic;
using System.Linq;

public static class StringProcessor
{
    private static List<String> PrepositionList;

    // Maps Persian Keheh (U+06A9) and Farsi Yeh (U+06CC) to Arabic Kaf (U+0643) and Yeh (U+064A)
    public static string ToNormalString(this string strText)
    {
        if (String.IsNullOrEmpty(strText)) return String.Empty;
        char chNormalKaf = (char)1603;
        char chNormalYah = (char)1610;
        char chNonNormalKaf = (char)1705;
        char chNonNormalYah = (char)1740;
        string result = strText.Replace(chNonNormalKaf, chNormalKaf);
        result = result.Replace(chNonNormalYah, chNormalYah);
        return result;
    }

    // Splits the text into words and counts how often each one occurs
    public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
        List<String> blackListWords = null,
        int minimumWordLength = 3,
        char splitor = ' ',
        bool perWordIsLowerCase = true)
    {
        string[] btArray = bodyText.ToNormalString().Split(splitor);
        long numberOfWords = btArray.LongLength;
        Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
        foreach (string word in btArray)
        {
            if (word != null)
            {
                string lowerWord = word;
                if (perWordIsLowerCase)
                    lowerWord = word.ToLower();
                // Strip punctuation and markup remnants from the word
                var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
                    .Replace("?", "").Replace("!", "").Replace(",", "")
                    .Replace("<br>", "").Replace(":", "").Replace(";", "")
                    .Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
                if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
                {
                    if (wordsDic.ContainsKey(normalWord))
                    {
                        var cnt = wordsDic[normalWord];
                        wordsDic[normalWord] = ++cnt;
                    }
                    else
                    {
                        wordsDic.Add(normalWord, 1);
                    }
                }
            }
        }
        List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
        return keywords;
    }

    // Sorts by frequency (default) or alphabetically by word, descending
    public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
    {
        List<KeyValuePair<String, Int32>> result = null;
        if (isBasedOnFrequency)
            result = list.OrderByDescending(q => q.Value).ToList();
        else
            result = list.OrderByDescending(q => q.Key).ToList();
        return result;
    }

    public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
    {
        List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
        return result;
    }

    public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
    {
        List<String> result = new List<String>();
        foreach (var item in list)
        {
            result.Add(item.Key);
        }
        return result;
    }

    public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
    {
        List<Int32> result = new List<Int32>();
        foreach (var item in list)
        {
            result.Add(item.Value);
        }
        return result;
    }

    public static String AsString<T>(this List<T> list, string seprator = ", ")
    {
        // string.Join avoids the trailing separator that appending in a loop leaves behind
        return string.Join(seprator, list);
    }

    private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
    {
        bool result = false;
        if (blackListWords == null) return false;
        foreach (var w in blackListWords)
        {
            if (w.ToNormalString().Equals(word))
            {
                result = true;
                break;
            }
        }
        return result;
    }
}

Source

2013-03-12 12:02:20 Jahan

0

public static string[] MyTest(string inword, string regstr)
{
    // Split the string that was passed in, not a hard-coded phrase
    var regex = new Regex(regstr);
    var words = regex.Split(inword);
    return words;
}

? MyTest("incidentno and .fintype-;or:unitno", @"[^\w]+")

[0]: "incidentno" 
[1]: "and" 
[2]: "fintype" 
[3]: "or" 
[4]: "unitno"

Source

2013-11-15 08:20:02

+0

Hmm.. is this an answer? – kleopatra 2013-11-15 08:40:49

0

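I would like to add some information to JDunkerley's answer.
You can easily make this method more flexible by passing the string or char to search for as a parameter.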

public static IEnumerable<string> WordList(this string Text, string Word)
{
    int cIndex = 0; // start of the current piece
    int nIndex;     // position of the next separator (Word must not be empty)
    while ((nIndex = Text.IndexOf(Word, cIndex)) != -1)
    {
        yield return Text.Substring(cIndex, nIndex - cIndex);
        cIndex = nIndex + Word.Length; // skip the whole separator, not just one character
    }
    yield return Text.Substring(cIndex);
}

public static IEnumerable<string> WordList(this string Text, char c)
{
    int cIndex = 0; // start of the current piece
    int nIndex;     // position of the next separator
    while ((nIndex = Text.IndexOf(c, cIndex)) != -1)
    {
        yield return Text.Substring(cIndex, nIndex - cIndex);
        cIndex = nIndex + 1;
    }
    yield return Text.Substring(cIndex);
}

Source

2013-12-10 13:04:14
