2014-02-24 28 views
0

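C++ std::bad_alloc when loading a 1.9-million-line file of float values - source code provided

I have a task where I read in 1.9 million lines of float values (360 floats per line) and need to operate on that data. I originally had this working, and I don't know why I am getting a bad_alloc today. Here is my code. I would like to know whether there is anything I can do to optimize it. I am using vectors, and I am hoping I don't have to switch to an array of structs. I could simply create a struct and make an array of them, with each struct holding x, y, and an array of float values. Would that make any difference?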

I would really appreciate any criticism of my implementation and code. Thank you!

Implementation:

#include "file_parser.hpp" 

Parser::Parser(char* fname){ 
    fileName = fname; 
} 

/* 
* The function parses a file that was set in the constructor 
* And returns a map of the file 
*/ 
VectorsMap Parser::parseFile(){ 
    //open file 
    fPtr = fopen(fileName, "r"); 
    total_rows = 0; 
    line = (char*)malloc(sizeof(char)*LINE_MAX); 

    //parse the file line by line 
    while(fgets(line, LINE_MAX, fPtr)){ 
     //make sure that we do not read an empty line 
     if(line[0] != '\0') { 
      //send the line to be further parsed 
      parseLine(line); 
      //increment total rows count 
      total_rows++; 
     } 
    } 
    return vector_points; 
} 

void Parser::doCleanUp(){ 
    fclose(fPtr); 
    free(line); 
    vector_points.clear(); 
} 

/** 
* Parse a line and tokenize it 
* while extracting X and Y points 
* and vectors and put them in a VectorsMap (defined in file_parser.hpp) 
*/ 
void Parser::parseLine(char* line){ 
    //collection of vectors. 
    std::vector<float> vectors; 
    char* point; 

    //grab the x and y tokens 
    char* tk1 = strtok(line, ","); 
    char* tk2 = strtok(NULL, ","); 

    //value for indexing 
    int i=0; 
    char* tmp; 

    //make sure we have two correct x and y points 
    if(tk1 == NULL || tk2 == NULL){ return; } 

    //convert the tokens to floats 
    float x = strtof(tk1, NULL); 
    float y = strtof(tk2, NULL); 

    //create the x and y pair used to insert vectors into the map 
    XYPair pair = XYPair(x, y); 

    //tokenize until end of line 
    while(point=strtok(NULL, ",")){ 
     //convert the token to float 
     float f_point = strtof(point, NULL); 
     //push the float to the vector 
     vectors.push_back(f_point); 
     i++; 
    } 
    //insert in the vectormap. 
    vector_points.insert(VectorsPair(pair, vectors)); 
} 

int Parser::getTotalRows(){ 
    return total_rows; 
} 

Header file:

//create specific types to make my life easier later on 
typedef std::pair<float, float> XYPair; 

typedef std::pair<XYPair, std::vector<float> > VectorsPair; 
typedef std::map<XYPair, std::vector<float> > VectorsMap; 

class Parser{  
    public: 
     //constructor 
     Parser(char* fname); 
     VectorsMap parseFile(); 
     int getTotalRows(); 
     int row_values; 
     int total_rows; 
     void doCleanUp(); 
    private: 
     //collection of all x y points and their vectors 
     VectorsMap vector_points; 
     FILE* fPtr; //file pointer to file to be parsed 
     char *line; //line to parse file line by line 
     char* fileName; //path/name of file to be parsed 

     void parseLine(char* line); 
}; 
+1

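"I originally had this working, and I don't know why I am getting a bad_alloc today" - so what changed? Your code, your structures, the amount of input data? Do you really need to read it all into memory at once? – Rup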

+0

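Nothing really changed, that's the problem... I wonder whether it depends on the traffic on the server? I do need to load the entire file at once. The system should be able to handle this file being loaded into memory. –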

+0

How many people saw "1.9 million" in the title of this question and thought, "I *have* to click that"? =P – WhozCraig

Answers

0

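Calling malloc and fopen inside parseFile but releasing those resources in a separate function is very error-prone. If you call parseFile twice without calling doCleanUp in between, you leak both the memory and the file handle.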

I would stop using malloc and strtok. For example, parseFile can use std::ifstream and std::string instead:

// Needs <fstream> and <string>; parseLine must now take a const std::string&.
VectorsMap Parser::parseFile(){
    //open file; the ifstream closes itself when it goes out of scope
    std::ifstream f(fileName);
    total_rows = 0;
    std::string line;

    //parse the file line by line
    while(std::getline(f, line)){
        //make sure that we do not read an empty line
        if(!line.empty()) {
            //send the line to be further parsed
            parseLine(line);
            //increment total rows count
            total_rows++;
        }
    }
    return vector_points;
}

Then rewrite parseLine so that it does not use the horrible strtok function, for example by using Boost.Tokenizer, or std::istringstream together with std::getline.
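One way the std::istringstream and std::getline variant could look is sketched below. This is only an illustration, not the original poster's code: parseLineInto is a hypothetical free function (the real parseLine is a private member of Parser), the reserve(360) call assumes the 360 floats per row mentioned in the question, and C++11 is assumed for std::strtof and nullptr. Unlike strtok, std::getline does not skip empty fields, so the loop skips them explicitly; an empty x or y field would still be parsed as 0.

```cpp
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<float, float> XYPair;
typedef std::map<XYPair, std::vector<float> > VectorsMap;

// Hypothetical replacement for Parser::parseLine, written as a free function.
// Returns false when the line does not contain both an x and a y field.
bool parseLineInto(const std::string& line, VectorsMap& out) {
    std::istringstream fields(line);
    std::string xField, yField;

    // The first two comma-separated fields are the x and y coordinates.
    if (!std::getline(fields, xField, ',') ||
        !std::getline(fields, yField, ',')) {
        return false;
    }
    XYPair key(std::strtof(xField.c_str(), nullptr),
               std::strtof(yField.c_str(), nullptr));

    std::vector<float> values;
    values.reserve(360);  // assumption: 360 floats per row, as in the question

    // Every remaining field is one float value of the row.
    std::string field;
    while (std::getline(fields, field, ',')) {
        if (field.empty()) continue;  // strtok skipped empty fields as well
        values.push_back(std::strtof(field.c_str(), nullptr));
    }

    // map::insert keeps the existing entry when the key is already present,
    // which is the same behaviour as the insert call in the question.
    out.insert(std::make_pair(key, values));
    return true;
}
```

Called as parseLineInto(line, vector_points) from the reading loop, this keeps the same map layout as the question while avoiding strtok's hidden global state.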

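Also note that you read the data into vector_points and then return a copy of it, which means you need twice the memory that the data set itself uses. You can keep just one copy of the data by doing this: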

return std::move(vector_points); 

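That way the data is moved into the return value instead of being copied. This needs C++11 or later, and afterwards the vector_points member is left in a valid but unspecified state, so do not rely on its contents.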
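To put the copy in perspective: assuming a 4-byte float, 1.9 million rows of 360 values is roughly 1,900,000 × 360 × 4 bytes ≈ 2.7 GB of raw payload, before any per-node or per-vector overhead, so a second copy is expensive. The sketch below contrasts the two ways of returning a member map. RowStore, copyOut and takeOut are hypothetical names used only for illustration; they stand in for Parser and are not part of the original code.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::pair<float, float> XYPair;
typedef std::map<XYPair, std::vector<float> > VectorsMap;

// Hypothetical stand-in for Parser, reduced to the member that holds the data.
class RowStore {
public:
    void add(float x, float y, const std::vector<float>& values) {
        rows_[XYPair(x, y)] = values;
    }

    // Copies the map: the member and the returned value each own a full
    // data set, so peak memory is about twice the data size.
    VectorsMap copyOut() const { return rows_; }

    // Moves the map out: ownership of the stored rows is transferred to the
    // caller instead of being duplicated.
    VectorsMap takeOut() {
        VectorsMap result = std::move(rows_);
        rows_.clear();  // put the moved-from member into a known empty state
        return result;
    }

    std::size_t size() const { return rows_.size(); }

private:
    VectorsMap rows_;
};
```

After takeOut the store is empty and can be refilled, whereas copyOut leaves the store untouched at the cost of duplicating every row.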

+0

These are all suggestions that I think will work very well. Thank you –