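C++: replace non-alpha/apostrophe characters in a string with spaces
I am reading a text file and parsing it into a map to count the occurrences of each word on each line. I need to ignore all non-alphabetic characters (punctuation, digits, whitespace, etc.) except apostrophes. I figured out how to delete all of these characters with the code below, but that produces incorrect words: "one-two" comes out as "onetwo" when it should be two words, "one" and "two".
Instead, I would now like to replace all of these characters with spaces rather than simply deleting them, but I can't work out how to do that. I think the replace_if algorithm would be a good fit, but I can't figure out the correct syntax for it. C++11 is fine. Any suggestions?
Sample output would be as follows: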
"first second" = "first" and "second"
"one-two" = "one" and "two"
"last.First" = "last" and "first"
"you're" = "you're"
"great! A" = "great" and "A"
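// What I initially used to delete the unwanted characters (this removes them rather than replacing them with spaces)
// Assumes <algorithm>, <cctype>, <sstream>, <string> and <map> are included with `using namespace std;`,
// and that text, line, word, linenum and tokens are declared elsewhere.

// True for characters to delete: anything that is neither a letter nor an apostrophe
bool my_predicate(char c){
    return c != '\'' && !isalpha(static_cast<unsigned char>(c));
}

// Read file one line at a time
while (getline(text, line)){
    istringstream iss(line);
    // Parse line on white space, storing values into tokens map
    while (iss >> word){
        word.erase(remove_if(word.begin(), word.end(), my_predicate), word.end());
        if (!word.empty()) // Skip tokens that had no letters, such as "123"
            ++tokens[word][linenum];
    }
    ++linenum;
}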
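One way to get the replace-instead-of-delete behaviour asked about is to run std::replace_if over the whole line before splitting it on whitespace. The following is only a minimal sketch under some assumptions: the names is_separator, split_words, count_words and TokenMap are made up for illustration, the map shape is inferred from the expression ++tokens[word][linenum], and line numbers are assumed to start at 1.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumed container shape: word -> (line number -> count), inferred from
// the expression ++tokens[word][linenum] in the question.
using TokenMap = std::map<std::string, std::map<int, int>>;

// Hypothetical helper: true for characters that should become a space,
// i.e. anything that is neither a letter nor an apostrophe.
bool is_separator(char c) {
    // Cast to unsigned char: passing a negative char to isalpha is undefined.
    return c != '\'' && !std::isalpha(static_cast<unsigned char>(c));
}

// Hypothetical helper: replaces separators with spaces, then splits on
// whitespace, so "one-two" becomes "one two" and yields two words.
std::vector<std::string> split_words(std::string text) {
    std::replace_if(text.begin(), text.end(), is_separator, ' ');
    std::istringstream iss(text);
    std::vector<std::string> words;
    std::string word;
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}

// Hypothetical function: counts each word per line of the input stream.
// Starting the line numbering at 1 is an assumption.
TokenMap count_words(std::istream& text) {
    TokenMap tokens;
    std::string line;
    int linenum = 1;
    while (std::getline(text, line)) {
        for (const std::string& word : split_words(line)) {
            ++tokens[word][linenum];
        }
        ++linenum;
    }
    return tokens;
}
```

Because the replacement happens on the full line, a token that contained only digits or punctuation simply disappears and never reaches the map. Note that this sketch does not change case, so "last.First" yields "last" and "First"; lowercasing would need a separate step. Apostrophes used as quotation marks also stay attached to the word.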