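I am reading a file into a buffer chunk by chunk and then splitting the text I read into strings, so that each piece of text ending in a newline becomes a new string.
Here is my code: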
    int ysize = 20000;
    char buffer2[ysize];
    int flag = 0;
    string temp_str;
    vector<string> temp;
    while(fread(buffer2, ysize, 1, fp2)>0){
        //printf("%s", buffer2);
        std::string str(buffer2);
        //push the data into the vector
        std::string::size_type pos = 0;
        std::string::size_type prev = 0;
        /*means the last read did not read a full sentence*/
        if (flag == 1) {
            if (buffer[0] == '\n') {
                //this means we have read the last sentence correctly, directly go to the next
            }
            else{
                if((pos = str.find("\n", prev)) != std::string::npos){
                    temp_str+=str.substr(prev, pos - prev);
                    temp.push_back(temp_str);
                    prev = pos + 1;
                }
                while ((pos = str.find("\n", prev)) != std::string::npos)
                {
                    temp.push_back(str.substr(prev, pos - prev));
                    prev = pos + 1;
                }
                // To get the last substring (or only, if delimiter is not found)
                temp.push_back(str.substr(prev));
                if (buffer2[19999] != '\n') {
                    //we did not finish reading that query
                    flag = 1;
                    temp_str = temp.back();
                    temp.pop_back();
                }
                else{
                    flag = 0;
                }
            }
        }
        else{
            while ((pos = str.find("\n", prev)) != std::string::npos)
            {
                temp.push_back(str.substr(prev, pos - prev));
                prev = pos + 1;
            }
            // To get the last substring (or only, if delimiter is not found)
            temp.push_back(str.substr(prev));
            if (buffer2[19999] != '\n') {
                //we did not finish reading that query
                flag = 1;
                temp_str = temp.back();
                temp.pop_back();
            }
            else{
                flag = 0;
            }
        }
    }
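The problem is that this does not read the data correctly: it drops almost half of the text.
I don't know what I am missing here. My idea is to read the data chunk by chunk and then split each chunk line by line, which is what the while loop does. I use the flag to handle the overflow case, where a line is cut off at the end of a chunk and continues in the next one.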
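For comparison, here is a minimal sketch of the same idea (read a chunk, split on '\n', carry an unfinished line over to the next chunk). It is only one possible way to do it, not a drop-in fix for the code above. The function name `read_lines_chunked`, the `carry` string and the default chunk size are assumptions made for illustration. It differs from the posted loop in three places that look like plausible causes of lost text: `fread` is called with size 1 and count `chunk_size`, so it returns the number of bytes read and a short final chunk is not discarded; the buffer is never treated as a null-terminated string; and there is no branch that skips a whole chunk.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): reads fp in fixed-size
// chunks and returns all lines without their trailing '\n'.
std::vector<std::string> read_lines_chunked(std::FILE* fp,
                                            std::size_t chunk_size = 20000) {
    std::vector<std::string> lines;
    std::vector<char> buffer(chunk_size);
    std::string carry;  // unfinished line left over from earlier chunks
    std::size_t n = 0;

    // size = 1, count = chunk_size: the return value is the byte count,
    // so the last (possibly shorter) chunk is still processed.
    while ((n = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
        const char* cur = buffer.data();
        const char* end = cur + n;  // only the first n bytes are valid
        while (cur < end) {
            const char* nl = static_cast<const char*>(
                std::memchr(cur, '\n', static_cast<std::size_t>(end - cur)));
            if (nl == nullptr) {
                carry.append(cur, end);  // no newline left: keep for next chunk
                break;
            }
            carry.append(cur, nl);       // complete the current line
            lines.push_back(std::move(carry));
            carry.clear();               // reset after the move
            cur = nl + 1;                // continue after the '\n'
        }
    }
    // A final line that does not end in '\n' is still kept.
    if (!carry.empty()) {
        lines.push_back(std::move(carry));
    }
    return lines;
}
```

With a `FILE*` opened the same way as `fp2` above, `read_lines_chunked(fp2)` would return the lines as a `std::vector<std::string>`. Because leftover bytes are simply appended to `carry`, no flag is needed and a line may span any number of chunks.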
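Just use [`while (std::getline(myFileStream, lineStr)) { ... }`](http://en.cppreference.com/w/cpp/string/basic_string/getline), and trust your `std::ifstream` implementation to do sensible buffering. – BoBTFish
I did that, but the performance was terrible. I am trying to read in chunks to improve performance, and when I tested it there was a significant difference, but splitting the strings is a bit difficult. – user7631183
I agree with BoBTFish, but maybe you could try `std::regex` or `std::stringstream`. –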
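For reference, the `std::getline` approach suggested in the comments might look roughly like the sketch below. The helper name `read_lines` is made up for illustration. It takes any `std::istream`, so the same loop works on a `std::ifstream` or on a `std::istringstream` wrapped around text that is already in memory. Whether it is fast enough depends on the standard library and the input, as the asker's measurements suggest.

```cpp
#include <istream>
#include <sstream>  // only needed for the std::istringstream usage shown below
#include <string>
#include <vector>

// Hypothetical helper: collects every line of `in`, without the trailing '\n'.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    // std::getline returns the stream, which converts to false at end of input.
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

For a file, pass a `std::ifstream` opened on it. For a string, `std::istringstream in(text); auto lines = read_lines(in);` splits `text` on newlines.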