How to create a generic data tokenizer in C++?

So, say I have a data file of regular data in a format like this:

[42,6,9,56,1337] 
[220,9001,15,22,35] 
[127,0,0,1,8080]

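I read each line as a string, and I have a tokenizer that takes an input string, a string of one or more delimiter characters, and a reference to a vector<string> in which to store the output.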

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// given a string with delimiters inside, parse it into
// individual tokens stored in a vector<string>
void tokenize(const string& str, vector<string>& tokens,
              const string& delimiters = " ") {
    auto last_pos = str.find_first_not_of(delimiters, 0);    // first token
    auto curr_pos = str.find_first_of(delimiters, last_pos); // next delim

    while (curr_pos != string::npos || last_pos != string::npos) {
        tokens.emplace_back(str.substr(last_pos, curr_pos - last_pos));
        last_pos = str.find_first_not_of(delimiters, curr_pos); // next token
        curr_pos = str.find_first_of(delimiters, last_pos);     // next delim
    }
}

int main() {
    ifstream fs{"data"};
    string tmp{""};
    const string delims{"[,]"};
    vector<string> tokens;
    //vector<int> tokens;
    //vector<double> tokens;

    while (getline(fs, tmp)) tokenize(tmp, tokens, delims);

    // std::vector has no built-in operator<<, so print the tokens one by one
    for (const auto& t : tokens) cout << t << ' ';
    cout << endl;
}

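So far so good. But then I wanted to store actual data types instead of strings, so I wrote a few numeric wrapper functions that convert the vector<string> into (say) a vector<int>. Then I realized these were basically duplicates of one another.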

// int wrapper
void tokenize(const string& str, vector<int>& tokens,
              const string& delimiters = " ") {
    vector<string> str_tokens;
    tokenize(str, str_tokens, delimiters);

    for (const auto& e : str_tokens)
        tokens.emplace_back(stoi(e)); // ints
}

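Then I tried to write a single generic wrapper instead, but got stuck on two problems: a) I don't know how to switch between the different standard library conversion functions (stoi, stod, ...), and b) templating it would also try to instantiate it with T = string, which was not the original idea.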

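After a bit more thought, I realized I was probably going about it the wrong way and should somehow use just one generic function. But I don't know how to do that.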

Here is the full program listing. The data is stored in a local file named "data". http://pastebin.com/dRAXRWa3


2015-10-12 alarmed

Is there any reason to write your own tokenizer? Why not use a library such as boost::spirit? – Rostislav

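You could try reading the source code of awk or of a CSV parser, since those are examples of generic data tokenizers written in C or C++. C is not the same as C++, but I'm sure the code would help. – djechlin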

@Rostislav Because I want to learn the details of writing my own C++ programs. – alarmed

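This is a classic case for templates. The only catch is that you need to call a different function to convert the string to each data type, but that can be solved with templates too: declare a generic conversion function and explicitly specialize it per type. Here is a working, commented example: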

#include <iostream> 
#include <iomanip> 

#include <vector>
#include <algorithm>
#include <iterator>
#include <string>

using namespace std; 

// Declare a generic conversion function... 
template<typename T> 
T stoT(const std::string& s); 

// ... and specialize it for the data types you need to convert 
// int specialization 
template<> 
int stoT(const std::string& s) 
{ 
    return stoi(s); 
} 

// double specialization 
template<> 
double stoT(const std::string& s) 
{ 
    return stod(s); 
} 

template<typename T> 
void tokenize(const string& str, vector<T>& tokens, 
       const string& delimiters = " ") { 
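    // Placeholder input: in real code, fill str_tokens from str
    // using the string tokenizer from the question
    vector<string> str_tokens = {"1", "2", "3"};

    // Prepare the output - clear and reserve the space to avoid multiple allocations
    tokens.clear();
    tokens.reserve(str_tokens.size());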

    // Transform the strings to your data types 
    std::transform(str_tokens.begin(), str_tokens.end(), std::back_inserter(tokens), stoT<T>); 

} 

int main() 
{ 
    std::vector<int> vi; 
    tokenize("", vi); 
    for (const auto& v : vi) { std::cout << v << " "; } 

    std::cout << "\n"; 

    std::vector<double> vd; 
    tokenize("", vd); 
    std::cout << std::fixed; 
    for (const auto& v : vd) { std::cout << std::setprecision(2) << v << " "; } 
}


2015-10-12 15:57:39 Rostislav

Thank you very much, this is great! I'll use your example to improve the somewhat similar solution I came up with myself. – alarmed

So a friend pointed me to this page on isocpp: template specialization, and I was able to come up with a workable approach of my own (although Rostislav's is clearly better).

I created a set of specializations of

template <typename T> T decode(const string& x);

then templated the tokenize() function and changed one line of code:

tokens.emplace_back(decode<T>(str.substr(last_pos, curr_pos - last_pos)));

It seems to work pretty much as I intended. Now I'll improve it based on your suggestions.

Thanks.

(Edit) Here is a corrected version. http://pastebin.com/reRMc2G3


2015-10-12 18:14:47 alarmed
