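Tokenizing a string, including the delimiters, in C++

I'm tokenizing with the function below, but I don't know how to include the delimiters along with the tokens.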

void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // Use size_type rather than int so comparisons against string::npos are well defined.
    string::size_type startpos = 0;
    string::size_type pos = str.find_first_of(delimiters, startpos);

    while (string::npos != pos || string::npos != startpos)
    {
        // Copy the word between startpos and the next delimiter; the delimiter itself is dropped.
        tokens.push_back(str.substr(startpos, pos - startpos));

        // Skip the run of delimiters, then find the end of the next word.
        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);
    }
}


2009-10-02 Jeremiah

The C++ String Toolkit Library (StrTk) has the following solution:

std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
             str,
             strtk::range_to_type_back_inserter(token_list),
             strtk::include_delimiters);

This should result in token_list containing the following elements:

 
Token₀ = "abc," 
Token₁ = "123 " 
Token₂ = "xyz"

More examples can be found here.


2009-10-17 21:59:15

I can't really follow your code. Could you post a working program?

Anyway, here is a simple tokenizer, written without testing edge cases:

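#include <iostream>
#include <string>
#include <vector>

using namespace std;

void tokenize(vector<string>& tokens, const string& text, const string& del)
{
    // An empty delimiter would match at every position, so treat the text as one token.
    if (del.empty())
    {
        tokens.push_back(text);
        return;
    }

    string::size_type startpos = 0;
    string::size_type currentpos = text.find(del, startpos);

    // Each token runs from startpos through the end of the delimiter that follows it.
    while (currentpos != string::npos)
    {
        tokens.push_back(text.substr(startpos, currentpos - startpos + del.size()));

        startpos = currentpos + del.size();
        currentpos = text.find(del, startpos);
    }

    // Whatever is left after the last delimiter is the final token.
    if (startpos < text.size())
        tokens.push_back(text.substr(startpos));
}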

Example input, with delimiter = $$:

Hello$$Stack$$Over$$$Flow$$$$!

Tokens:

Hello$$ 
Stack$$ 
Over$$ 
$Flow$$ 
$$ 
!

Note: I would never use a tokenizer that I wrote without testing it! Please use boost::tokenizer!


2009-10-02 18:38:19 AraK

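+1 for mentioning Boost.Tokenizer. –

I edited my post to include the whole function. I see what you did, but the delimiters will be passed as one string, and each character in that string will be a delimiter. With a delimiter string such as ",.! \n", a comma, a period, an exclamation mark, and a newline will be pushed into the vector as well, but a space will not. That way I can join the vector with a space between the items and rebuild the string. – Jeremiah

A comma, a period, an exclamation mark, and a newline, along with a space, will be the delimiters. Sorry, I should have been clearer. – Jeremiah

It depends on whether you want the preceding delimiters, the following delimiters, or both, and on what you want to do with the strings at the beginning and end of the input, which may have no delimiters before or after them.

I'm going to assume you want each word together with its preceding and following delimiters, but not any run of delimiters by itself (e.g., if there is a delimiter following the last string).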

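#include <string>

template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) {
    // size_type (not int) keeps the comparisons against npos well defined.
    std::string::size_type pos = 0;
    do {
        std::string::size_type beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos)
            break; // only delimiters remain
        std::string::size_type end_word = str.find_first_of(delims, beg_word);
        std::string::size_type beg_next_word = str.find_first_not_of(delims, end_word);
        // Leading delimiters, the word, and trailing delimiters; the count is clamped to the end of str.
        *out++ = std::string(str, pos, beg_next_word - pos);
        pos = end_word;
    } while (pos != std::string::npos);
}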

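For now, I've written it more like an STL algorithm, taking an iterator for its output rather than assuming it always pushes onto a collection. Since it depends (for the moment) on the input being a string, it doesn't use iterators for the input.
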

2009-10-02 19:04:06

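I want the string "Test string, on the web.\nTest line one." to become tokens like this. I want a space, a comma, a period, and \n as delimiters. Test string , on the web . \n Test line one . – Jeremiah

Sorry, it didn't post correctly. After each delimiter, everything should be on a new line. – Jeremiah

If the delimiters are characters, not strings, then you can use strtok.
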

2009-10-02 20:17:16

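Huh? What's wrong with strtok? –

Thanks... I had almost forgotten about this function :P – poorva

'strtok' consumes the delimiters, I believe. – Santa

It's a bit sloppy right now, but this is what I ended up with. I didn't want to use Boost because this is a school assignment, and my teacher wants me to use find_first_of to accomplish this.

Thanks for everyone's help.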

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
    vector<string> vS;
    string::size_type startpos = 0;

    while (startpos < strInput.size())
    {
        // Find the next delimiter; npos means the rest of the input is one word.
        string::size_type pos = strInput.find_first_of(strDelims, startpos);

        // Push the word before the delimiter, skipping empty words.
        if (pos != startpos)
            vS.push_back(strInput.substr(startpos, pos - startpos));

        // No delimiter left: stop before indexing with npos.
        if (pos == string::npos)
            break;

        // If the delimiter is a new line (\n), add the two characters "\n".
        if (strInput[pos] == '\n')
            vS.push_back("\\n");
        // Otherwise add the delimiter itself, unless it is a space.
        else if (strInput[pos] != ' ')
            vS.push_back(string(1, strInput[pos]));

        startpos = pos + 1;
    }

    return vS;
}


2009-10-03 15:50:42 Jeremiah
