Fast, simple CSV parsing in C++

I am trying to parse a simple CSV file, with data in a format such as:

20.5,20.5,20.5,0.794145,4.05286,0.792519,1 
20.5,30.5,20.5,0.753669,3.91888,0.749897,1 
20.5,40.5,20.5,0.701055,3.80348,0.695326,1

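So, a very simple, fixed-format file. I am storing each column of this data in an STL vector. I have therefore tried to stay with the standard library and do it the C++ way, and my implementation inside a loop looks like this: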

string field; 
getline(file,line); 
stringstream ssline(line); 

getline(ssline, field, ','); 
stringstream fs1(field); 
fs1 >> cent_x.at(n); 

getline(ssline, field, ','); 
stringstream fs2(field); 
fs2 >> cent_y.at(n); 

getline(ssline, field, ','); 
stringstream fs3(field); 
fs3 >> cent_z.at(n); 

getline(ssline, field, ','); 
stringstream fs4(field); 
fs4 >> u.at(n); 

getline(ssline, field, ','); 
stringstream fs5(field); 
fs5 >> v.at(n); 

getline(ssline, field, ','); 
stringstream fs6(field); 
fs6 >> w.at(n);

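The problem is that this is extremely slow (each data file has over 10,000 lines), and it also strikes me as a bit inelegant. Is there a faster approach using the standard library, or should I just use the stdio functions? It seems to me that this entire block of code would reduce to a single fscanf call.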
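For concreteness, the single-call version might look roughly like the sketch below. It uses sscanf on a line that has already been read, rather than fscanf on the file itself, and the names Columns and parse_line_sscanf are hypothetical, chosen only for this illustration.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical container for the six columns of interest.
struct Columns {
    std::vector<double> cent_x, cent_y, cent_z, u, v, w;
};

// Parses one line such as "20.5,20.5,20.5,0.794145,4.05286,0.792519,1"
// with a single sscanf call. The seventh field is ignored.
// Returns false (and stores nothing) if fewer than six fields convert.
bool parse_line_sscanf(const std::string& line, Columns& out)
{
    double x, y, z, u, v, w;
    int matched = std::sscanf(line.c_str(), "%lf,%lf,%lf,%lf,%lf,%lf",
                              &x, &y, &z, &u, &v, &w);
    if (matched != 6) { return false; }

    out.cent_x.push_back(x);
    out.cent_y.push_back(y);
    out.cent_z.push_back(z);
    out.u.push_back(u);
    out.v.push_back(v);
    out.w.push_back(w);
    return true;
}
```

Checking the return value of sscanf against the number of expected fields is what catches short or malformed lines.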

Thanks in advance!

Source

2012-05-30 Kyle Lynch

Possible duplicate of this question: http://stackoverflow.com/questions/1120140/csv-parser-in-c

C CSV parser: http://sourceforge.net/projects/cccsvparser C CSV writer: http://sourceforge.net/projects/cccsvwriter – SomethingSomething

Using 7 stringstreams when you can do it with just one certainly doesn't help performance. Try this instead:

string line; 
getline(file, line); 

istringstream ss(line); // note we use istringstream, we don't need the o part of stringstream 

char c1, c2, c3, c4, c5; // to eat the commas 

ss >> cent_x.at(n) >> c1 >> 
     cent_y.at(n) >> c2 >> 
     cent_z.at(n) >> c3 >> 
     u.at(n) >> c4 >> 
     v.at(n) >> c5 >> 
     w.at(n);

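If you know the number of lines in the file, you can resize the vectors before reading and then use operator[] instead of at(). That way you avoid the bounds checking and gain a little performance.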
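A minimal sketch of that suggestion follows, assuming the line count is known up front. The Columns struct and the read_columns function are hypothetical names for this illustration; the loop condition keeps n below the pre-sized length, which is what makes the unchecked operator[] safe.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the six columns of interest.
struct Columns {
    std::vector<double> cent_x, cent_y, cent_z, u, v, w;
};

// Reads at most n_lines rows from in. Returns the number of rows parsed.
std::size_t read_columns(std::istream& in, std::size_t n_lines, Columns& c)
{
    // Size every vector once, before the loop.
    c.cent_x.resize(n_lines); c.cent_y.resize(n_lines); c.cent_z.resize(n_lines);
    c.u.resize(n_lines);      c.v.resize(n_lines);      c.w.resize(n_lines);

    std::string line;
    std::size_t n = 0;
    while (n < n_lines && std::getline(in, line)) {
        std::istringstream ss(line);
        char comma; // reused to eat each comma
        if (!(ss >> c.cent_x[n] >> comma >> c.cent_y[n] >> comma
                 >> c.cent_z[n] >> comma >> c.u[n] >> comma
                 >> c.v[n] >> comma >> c.w[n])) {
            break; // stop at the first line that does not parse
        }
        ++n;
    }
    return n;
}
```

If fewer rows are parsed than expected, the caller can shrink the vectors to the returned count.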

Source

2012-05-30 10:30:46 jrok

Perfect! It works well and is much faster. Thanks for the tip about the chars that eat the commas!

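@KyleLynch: I would seriously advise you to check that each 'char' was actually set to a comma. You should also check that the stream is still valid, or set the exception flags so that you are alerted when a read fails.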

A minor point: a single char is enough to eat the commas – IceFire
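The two comments above could be combined along these lines. This is only a sketch: parse_checked is a hypothetical helper, and it reads exactly six values, reusing one char for every separator and verifying both the stream state and the separator itself.

```cpp
#include <sstream>
#include <string>

// Reads six comma-separated doubles from line into vals.
// Returns false if a value fails to parse or a separator is not ','.
bool parse_checked(const std::string& line, double (&vals)[6])
{
    std::istringstream ss(line);
    for (int i = 0; i < 6; ++i) {
        if (i > 0) {
            char sep = '\0'; // one char is enough for all the commas
            if (!(ss >> sep) || sep != ',') { return false; }
        }
        if (!(ss >> vals[i])) { return false; }
    }
    return true;
}
```

An alternative is to call ss.exceptions(std::ios::failbit | std::ios::badbit) so that a failed extraction throws std::ios_base::failure instead of being checked by hand.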

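I believe the main bottleneck (setting aside the unbuffered I/O based on getline()) is the string parsing. Since you use the ',' character as the delimiter, you can do a linear scan over the string and replace every ',' with '\0' (the end-of-string marker, i.e. the null terminator).

Something like this: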

// tmp array for the values parsed from one line
double parts[MAX_PARTS];

while(getline(file, line))
{
    if(line.empty()) { continue; }

    size_t len = line.length();
    const char* last_start = &line[0];
    int num_parts = 0;

    // j == len handles the last field, which has no trailing comma
    for(size_t j = 0; j <= len && num_parts < MAX_PARTS; j++)
    {
        if(j == len || line[j] == ',')
        {
            // terminate the current field in place
            if(j < len) { line[j] = '\0'; }

            parts[num_parts] = atof(last_start);
            num_parts++;

            // the next field starts right after the delimiter
            last_start = &line[0] + j + 1;
        }
    }

    /// do whatever you need with the parts[] array
}
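As a variation on the same linear scan, std::strtod reports where each number ends, so the line does not have to be modified at all. The sketch below assumes that approach; split_doubles is a hypothetical name and is not part of the answer above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Parses up to max_parts comma-separated doubles from line into parts.
// Returns the number of values stored.
std::size_t split_doubles(const std::string& line, double* parts,
                          std::size_t max_parts)
{
    const char* p = line.c_str();
    std::size_t n = 0;
    while (n < max_parts && *p != '\0') {
        char* end = nullptr;
        double value = std::strtod(p, &end);
        if (end == p) { break; }  // nothing numeric here: stop scanning
        parts[n++] = value;
        p = end;                  // strtod stopped at the delimiter
        if (*p == ',') { ++p; }   // step over it
    }
    return n;
}
```

Because strtod stops at the first character that cannot belong to the number, the comma acts as a natural terminator for each field.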

Source

2012-05-30 10:33:18

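I don't know whether this will be faster than the accepted answer, but I may as well post it in case you want to try it. If you work out the size of the file with some fseek magic, you can load its entire contents with a single read call. That will be much faster than making many read calls.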
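One possible shape for that fseek approach is sketched here. The function name read_whole_file is hypothetical, and the file is opened in binary mode so that the size reported by ftell matches the number of bytes fread returns.

```cpp
#include <cstdio>
#include <string>

// Loads the whole file at path into out with a single fread call.
// Returns false if the file cannot be opened or fully read.
bool read_whole_file(const char* path, std::string& out)
{
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) { return false; }

    bool ok = false;
    if (std::fseek(f, 0, SEEK_END) == 0) {        // jump to the end
        long size = std::ftell(f);                // offset there == file size
        if (size >= 0 && std::fseek(f, 0, SEEK_SET) == 0) {
            out.resize(static_cast<std::size_t>(size));
            std::size_t got = 0;
            if (size > 0) {
                got = std::fread(&out[0], 1, out.size(), f);
            }
            ok = (got == out.size());
        }
    }
    std::fclose(f);
    return ok;
}
```

The resulting string can then be handed to a splitting routine as the raw file data.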

You can then do something like this to parse the string:

//Delimited string to vector
vector<string> dstov(const string& str, const string& delimiter)
{
    //Vector to populate
    vector<string> ret;
    //Current position in str
    size_t pos = 0;
    //Position of the next delimiter at or after pos
    size_t found;
    //While the string from pos onwards contains the delimiter
    //(searching from pos avoids copying the rest of str on every pass)
    while((found = str.find(delimiter, pos)) != string::npos)
    {
        //Insert the substring from pos up to the found delimiter into the vector
        ret.push_back(str.substr(pos, found - pos));
        //Move pos past this section and the found delimiter so the search can continue
        pos = found + delimiter.size();
    }
    //Push back the final element in str when str contains no more delimiters
    ret.push_back(str.substr(pos));
    return ret;
}

//Raw contents of the file, loaded beforehand
string rawfiledata;

//This call will parse the raw data into a vector containing lines of
//20.5,30.5,20.5,0.753669,3.91888,0.749897,1 by treating the newline
//as the delimiter
vector<string> lines = dstov(rawfiledata, "\n");

//You can then iterate over the lines and parse them into variables and do whatever you need with them.
for(size_t itr = 0; itr < lines.size(); ++itr)
{
    vector<string> line_variables = dstov(lines[itr], ",");
}
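The split fields are still strings at this point. One way to turn them into numbers, sketched here with a hypothetical helper named to_doubles, is std::stod from C++11:

```cpp
#include <string>
#include <vector>

// Converts each field of one split line to a double.
// std::stod throws std::invalid_argument for an empty or non-numeric field.
std::vector<double> to_doubles(const std::vector<std::string>& fields)
{
    std::vector<double> out;
    out.reserve(fields.size());
    for (const std::string& field : fields) {
        out.push_back(std::stod(field));
    }
    return out;
}
```

If the file ends with a newline, splitting on "\n" yields a final empty line, so empty lines should be skipped before calling a helper like this.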

Source

2012-05-30 11:58:20 TVOHM
