Reading from a file chunk by chunk, then splitting the text line by line

I'm reading from a file into a buffer and then splitting the text I read into strings, where each piece of text ending in a newline becomes a new string.

Here is my code:

int ysize = 20000; 
char buffer2[ysize]; 
int flag = 0; 
string temp_str; 
vector<string> temp; 
while(fread(buffer2, ysize, 1, fp2)>0){ 
    //printf("%s", buffer2); 
    std::string str(buffer2); 
    //push the data into the vect 
    std::string::size_type pos = 0; 
    std::string::size_type prev = 0; 
    /*means the last read did not read a full sentence*/ 
    if (flag == 1) { 
     if (buffer[0] == '\n') { 
      //this means we have read the last senstense correctly, directly go to the next 
     } 
     else{ 
      if((pos = str.find("\n", prev)) != std::string::npos){ 
       temp_str+=str.substr(prev, pos - prev); 
       temp.push_back(temp_str); 
       prev = pos + 1; 
      } 
      while ((pos = str.find("\n", prev)) != std::string::npos) 
      { 
       temp.push_back(str.substr(prev, pos - prev)); 
       prev = pos + 1; 
      } 

      // To get the last substring (or only, if delimiter is not found) 
      temp.push_back(str.substr(prev)); 

      if (buffer2[19999] != '\n') { 
       //we did not finish readind that query 
       flag = 1; 
       temp_str = temp.back(); 
       temp.pop_back(); 
      } 
      else{ 
       flag = 0; 
      } 


     } 
    } 
    else{ 

     while ((pos = str.find("\n", prev)) != std::string::npos) 
     { 
      temp.push_back(str.substr(prev, pos - prev)); 
      prev = pos + 1; 
     } 

     // To get the last substring (or only, if delimiter is not found) 
     temp.push_back(str.substr(prev)); 

     if (buffer2[19999] != '\n') { 
      //we did not finish readind that query 
      flag = 1; 
      temp_str = temp.back(); 
      temp.pop_back(); 
     } 
     else{ 
      flag = 0; 
     }} 
}

The problem is that this does not read the data correctly: it drops almost half of the text.

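I don't know what I'm missing here. My idea is to read the data chunk by chunk and then split each chunk line by line, which is what the body of the while loop does. I use the flag to handle the case where a line overflows from one chunk into the next.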


2017-03-20 user7631183

Use ['while (std::getline(myFileStream, lineStr)) {...}'](http://en.cppreference.com/w/cpp/string/basic_string/getline) and trust your 'std::ifstream' implementation to do reasonable buffering. – BoBTFish

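I did that, but the performance was terrible. I'm trying to read in chunks to improve performance; when I tested it there was a significant difference, but splitting the strings is a bit tricky. – user7631183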

I agree with BoBTFish, but maybe you could try 'std::regex' or 'std::stringstream'. –

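First of all, fread does not magically create a null-terminated string, which means that std::string str(buffer2) results in undefined behavior. So you should do something like this: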

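size_t nread = 0; 
// size = 1 and count = ysize-1, so fread returns the number of bytes read 
while((nread = fread(buffer2, 1, ysize-1, fp2)) > 0){ 
    buffer2[nread] = 0; // terminate right after the last byte read 
    std::string str(buffer2); 
    ...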

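To avoid implementing the buffering yourself, you could use fgets to read line by line; then you only have to worry about concatenating the pieces of lines that are longer than the read buffer.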
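A minimal sketch of that fgets approach, assuming the file contains plain text without embedded null bytes. The helper name `read_lines_fgets` and the 256-byte buffer size are arbitrary choices for illustration, not something from the question.

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of fp with fgets, joining the pieces
// of lines that are longer than the fixed-size buffer.
std::vector<std::string> read_lines_fgets(std::FILE* fp) {
    std::vector<std::string> lines;
    std::string current;   // pieces of the line collected so far
    bool pending = false;  // true while current holds an unfinished line
    char buf[256];         // deliberately small; long lines span several calls

    while (std::fgets(buf, sizeof buf, fp) != nullptr) {
        std::size_t len = std::strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            current.append(buf, len - 1);  // complete line: drop the '\n'
            lines.push_back(current);
            current.clear();
            pending = false;
        } else {
            current.append(buf, len);      // partial line: keep accumulating
            pending = true;
        }
    }
    if (pending) {
        lines.push_back(current);          // last line had no trailing '\n'
    }
    return lines;
}
```

Calling `read_lines_fgets(fp2)` on the open file would return one string per line, with the newline characters removed.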

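Besides that, I have spotted another problem: if the first character in the buffer is a newline and flag == 1, you skip the entire current buffer and go on to read the next one, if there is still data available. (I assume that by buffer[0] you actually mean buffer2[0].)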
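Putting these points together, a chunked reader could look roughly like the sketch below. It is not the asker's code: the flag is replaced by a `carry` string that holds the unfinished tail of the previous chunk, the string is built from a pointer and a length so no null terminator is needed, and the name `read_lines_chunked` and its `chunk_size` parameter are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: reads fp in fixed-size chunks and splits the data into
// lines, carrying an unfinished line over into the next chunk.
std::vector<std::string> read_lines_chunked(std::FILE* fp,
                                            std::size_t chunk_size = 20000) {
    std::vector<std::string> lines;
    std::vector<char> buffer(chunk_size);
    std::string carry;  // tail of earlier chunks that has not seen a '\n' yet
    std::size_t nread = 0;

    // size = 1 and count = chunk_size, so fread returns the number of bytes
    // read and a short final chunk is not thrown away.
    while ((nread = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
        // Build the string from (pointer, length): no null terminator needed.
        std::string chunk(buffer.data(), nread);
        std::string::size_type prev = 0;
        std::string::size_type pos = 0;
        while ((pos = chunk.find('\n', prev)) != std::string::npos) {
            carry.append(chunk, prev, pos - prev);  // finish the current line
            lines.push_back(carry);
            carry.clear();
            prev = pos + 1;
        }
        carry.append(chunk, prev, std::string::npos);  // unfinished remainder
    }
    if (!carry.empty()) {
        lines.push_back(carry);  // final line without a trailing '\n'
    }
    return lines;
}
```

Because every complete line is appended to `carry` before being stored, a chunk that starts with a newline simply terminates the carried-over line instead of causing the whole chunk to be skipped.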


2017-03-20 14:33:37

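Thanks! About 'buffer2[nread] = 0;': won't that always delete the last character I read and replace it with 0? Also, fgets won't solve my problem; I'm trying to read multiple lines at once. – user7631183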

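No, because in C/C++ indexing is 0-based, so when *nread* characters are read into the buffer they occupy *buffer[0] ... buffer[nread-1]*, and *buffer[nread] = 0* ensures null termination. As for *fgets*: yes, I know you want to read more than one line at a time, but *fgets* may save you the trouble of splitting the buffer later. *fgets* also does some buffering, so you most likely won't lose performance by using it. –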
