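Split a string into words by multiple delimiters

I have some text (meaningful text or an arithmetic expression) and I want to split it into words.
If I had a single delimiter, I'd use: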

std::stringstream stringStream(inputString); 
std::string word; 
while(std::getline(stringStream, word, delimiter)) 
{ 
    wordVector.push_back(word); 
}

How can I break the string into tokens with several delimiters?


2011-10-01 Ypsilon IV

Boost.StringAlgorithm or Boost.Tokenizer would help. –

Or you could get some ideas from this answer: http://stackoverflow.com/questions/4888879/elegant-ways-to-count-the-frequency-of-words-in-a-file – Nawaz

@K-BALLO: According to the question, you shouldn't use external libraries such as Boost. – deepmax

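Assuming one of the delimiters is a newline, the following reads the input line by line and then splits each line further by the remaining delimiters. For this example I've chosen space, apostrophe, and semicolon as the delimiters.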

std::stringstream stringStream(inputString); 
std::string line; 
while(std::getline(stringStream, line)) 
{ 
    std::size_t prev = 0, pos; 
    while ((pos = line.find_first_of(" ';", prev)) != std::string::npos) 
    { 
        if (pos > prev) 
            wordVector.push_back(line.substr(prev, pos-prev)); 
        prev = pos+1; 
    } 
    if (prev < line.length()) 
        wordVector.push_back(line.substr(prev, std::string::npos)); 
}


2011-10-01 17:30:43 SoapBox

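Too fast for me :p If a newline is not a delimiter, simply pick one of the "regular" delimiters for the outer std::getline call (and remove it from the inner loop). –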
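The variation described in this comment could look roughly like the sketch below. It assumes the same delimiter set as the answer (space, apostrophe, semicolon) and hands the semicolon to std::getline; the function name `splitChunks` is made up for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: std::getline splits on ';', and find_first_of
// handles the remaining delimiters (space and apostrophe).
// A newline is an ordinary character here, not a delimiter.
std::vector<std::string> splitChunks(const std::string& input)
{
    std::vector<std::string> words;
    std::stringstream stream(input);
    std::string chunk;
    while (std::getline(stream, chunk, ';'))
    {
        std::size_t prev = 0, pos;
        while ((pos = chunk.find_first_of(" '", prev)) != std::string::npos)
        {
            if (pos > prev)
                words.push_back(chunk.substr(prev, pos - prev));
            prev = pos + 1;
        }
        if (prev < chunk.length())
            words.push_back(chunk.substr(prev));
    }
    return words;
}
```

Empty chunks (two semicolons in a row) produce no words, the same way the answer's code skips empty tokens.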

If you have Boost, you could use:

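#include <boost/algorithm/string.hpp> 
#include <string> 
#include <vector> 
std::string inputString("One!Two,Three:Four"); 
// Note: '!' is not in the delimiter set, so "One!Two" stays one token 
std::string delimiters("|,:"); 
std::vector<std::string> parts; 
boost::split(parts, inputString, boost::is_any_of(delimiters));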


2013-06-03 04:02:46 MattSmith

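In case you're interested in how to do it yourself, without using Boost.

Assume the delimiter string may be very long, say of length M. Checking every character of the input against the delimiter string costs O(M) per character, so doing that in a loop over all characters of the original string, say of length N, is O(M*N).

I would use a dictionary instead (for example a map from "delimiter" to "boolean"; here I use a simple boolean array that holds true at index = ASCII value for each delimiter).

Now iterating over the string and checking whether a character is a delimiter is O(1) per character, which gives us O(N) overall.

Here is my sample code: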

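#include <iostream>
#include <string>
#include <vector>

using namespace std;

const int dictSize = 256;

vector<string> tokenizeMyString(const string &s, const string &del)
{
    // Lookup table: dict[ch] is true when ch is a delimiter.
    // Not static, so delimiters from an earlier call don't carry over.
    bool dict[dictSize] = { false };

    vector<string> res;
    for (size_t i = 0; i < del.size(); ++i) {
        // Cast to unsigned char so the index is never negative
        dict[static_cast<unsigned char>(del[i])] = true;
    }

    string token("");
    for (auto &i : s) {
        if (dict[static_cast<unsigned char>(i)]) {
            // A delimiter ends the current token, if there is one
            if (!token.empty()) {
                res.push_back(token);
                token.clear();
            }
        }
        else {
            token += i;
        }
    }
    if (!token.empty()) {
        res.push_back(token);
    }
    return res;
}


int main()
{
    string delString = "MyDog:Odie, MyCat:Garfield MyNumber:1001001";
    // the delimiters are " " (space) and "," (comma)
    vector<string> res = tokenizeMyString(delString, " ,");

    for (auto &i : res) {
        cout << "token: " << i << endl;
    }
    return 0;
}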

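Note: tokenizeMyString returns the vector by value after creating it on the stack, so here we are relying on the power of the compiler: RVO (return value optimization) :)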


2016-12-22 14:02:41 Kohn1001

I don't know why nobody has pointed out the manual way, but here it is:

const std::string delims(";,:. \n\t"); 
inline bool isDelim(char c) { 
    // Linear scan over the delimiter set 
    for (std::size_t i = 0; i < delims.size(); ++i) 
        if (delims[i] == c) 
            return true; 
    return false; 
}

And in the function:

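std::stringstream stringStream(inputString); 
std::string word; 
int c; // int, not char, so EOF can be told apart from real characters 

while (stringStream) { 
    word.clear(); 

    // Read word: stop at a delimiter or at end of input 
    while ((c = stringStream.get()) != EOF && !isDelim(static_cast<char>(c))) 
        word.push_back(static_cast<char>(c)); 
    if (c != EOF) 
        stringStream.unget(); 

    // Skip empty words (input that starts with delimiters) 
    if (!word.empty()) 
        wordVector.push_back(word); 

    // Read delims: skip the run of delimiters after the word 
    while ((c = stringStream.get()) != EOF && isDelim(static_cast<char>(c))); 
    if (c != EOF) 
        stringStream.unget(); 
}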

This way you can also do something useful with the delims if you want to.
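As one illustration of doing something useful with the delimiters, the sketch below keeps operator characters as tokens of their own, which suits the arithmetic expressions mentioned in the question. The function name `tokenizeExpression` and the operator set are assumptions made for this example; they are not part of the answer above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: whitespace is dropped, while operator characters
// are kept as single-character tokens.
std::vector<std::string> tokenizeExpression(const std::string& input)
{
    const std::string spaces(" \t\n");
    const std::string operators("+-*/()");   // assumed operator set
    std::vector<std::string> tokens;
    std::string word;

    for (char c : input)
    {
        const bool isSpace = spaces.find(c) != std::string::npos;
        const bool isOperator = operators.find(c) != std::string::npos;
        if (!isSpace && !isOperator)
        {
            word.push_back(c);               // still inside a word
            continue;
        }
        if (!word.empty())
        {
            tokens.push_back(word);          // a delimiter ends the word
            word.clear();
        }
        if (isOperator)
            tokens.push_back(std::string(1, c));  // keep the operator itself
    }
    if (!word.empty())
        tokens.push_back(word);
    return tokens;
}
```

For an input such as "12+ (x*3)" this yields the words and the operators in order, so the result can be fed to an expression parser.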


2017-04-04 11:27:33 forumulator

You could move std::string word; and char c; inside the loop to avoid using clear()... variables should be kept as local and short-lived as possible. – Mohan
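A rough sketch of that suggestion, wrapped in a function so it can be compiled on its own. It is a simplified adaptation rather than the answer's exact loop: `splitWords` is a hypothetical name, the delimiters are passed in as a parameter, and the character is held in an int so that end of input can be detected.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: both variables live inside the loop body,
// so each iteration starts with a fresh, empty word.
std::vector<std::string> splitWords(const std::string& input,
                                    const std::string& delims)
{
    std::vector<std::string> words;
    std::istringstream stream(input);
    while (stream)
    {
        std::string word;   // fresh per iteration, no clear() needed
        int c;              // int so the EOF value stays distinguishable
        // Consume characters up to and including the next delimiter
        while ((c = stream.get()) != std::char_traits<char>::eof()
               && delims.find(static_cast<char>(c)) == std::string::npos)
            word.push_back(static_cast<char>(c));
        if (!word.empty())
            words.push_back(word);
    }
    return words;
}
```

Because the delimiter that ends a word is consumed rather than put back, a run of delimiters just produces empty words that are skipped.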
