Parse/split a const char* in C++

I tried to find a solution, but I didn't find anything that solves my problem.

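I have a C++ program that receives a const char* variable (filedata) and its size (filesize). The content of this variable is in CSV format, with each field separated by ';'. The content is also dynamic and can be longer or shorter, because this variable represents a set of logs. There is also a \n delimiter that marks the end of each line.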

Example 1 of filedata:

const char* filedata = 
    "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n" 
    "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68";

Example 2:

const char* filedata = 
    "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n" 
    "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n" 
    "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

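As you can see, Example 1 has only 2 lines and Example 2 has 3 lines. I never know how many lines I will have: it could be 2, 3, 200, 1000, etc., and the filedata variable holds all of the content.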

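So my goal is to receive this filedata variable (I also have access to filesize) and, for each line, parse fields 1 and 2 (the timestamp and the date in normal format).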

Expected output (for Example 2):

1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00

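In Example 2 I have 3 lines, so I need to iterate over all lines and, for each line, parse the specific fields, producing output very similar to the above. After that I take each parsed field and save it into a list of objects (this part is already implemented; I'm only having trouble parsing filedata).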


2017-06-08 rrpik93

Possible duplicate of Split a string in C++? (https://stackoverflow.com/questions/236129/split-a-string-in-c) –

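Thanks for the reply. Yes, it partly helps me split on ';'. But I can't adapt the code to cut out only the "columns" I want from each line. In this example I split all the columns on ';', but I don't want all of them. – rrpik93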

Just split and ignore the columns you don't need. –


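Quick solution: you can use an approach similar to PHP's explode() function; here is an answer on how to write an explode function in C++: enter link description here. You may have to modify the answer's code to take a standard C string as input.

Then, once you have your own version of explode(), you can do std::vector<std::string> lines = explode(filedata, '\n').

The next step is to do std::vector<std::string> line_elements = explode(lines[i], ';') for each line. Then you have every individual field and can print/parse whatever you want.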


2017-06-08 10:08:33

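Please reread your question and fix it. Please write in English; _sth_ is not an English word. And remove the longer solution: using C string functions in a C++ program is a bad idea. –

You can use this regular expression: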

const char *regex_str = "\\d{10};[\\d,-]{10} [\\d,:]{8}"; //verified in http://regexr.com/

and then find all matches of the regex in the input const char* (with help from finding all regex), on Windows.

On macOS, std::regex may not work out of the box; you need to add -stdlib=libc++ on the command line.


2017-06-08 10:58:33

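Here is working code for your desired output. I used this SO answer from the SO question I referenced in my duplicate flag, and modified it so that the newline character \n also acts as a delimiter. That is why the code has two while loops.

You must pass the number of wanted columns (cols) to the split() function. Optionally, you can also pass the columns that should be excluded (filtCol). The example below the code uses cols = 5 and filtCol = (1 << 1) | (1 << 3), which means all five columns are parsed except the 2nd and 4th, so only columns 1, 3 and 5 end up in the result vector. I used a bit pattern because it is faster to evaluate than a list/array of column indices.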

#include <string> 
#include <sstream> 
#include <vector> 
#include <iterator> 
#include <iostream> 

template<typename Out>
void split(const std::string& s, char delim, size_t cols, size_t filtCol, Out result)
{
    std::stringstream ss;
    ss.str(s);
    std::string item;

    /* Two while loops: the outer one splits on new lines first */
    while (std::getline(ss, item))
    {
        std::stringstream ssLine;
        ssLine.str(item);
        std::string itemLine;

        /* Parse the line and split it on delim */
        size_t curCol = 0;
        while (std::getline(ssLine, itemLine, delim))
        {
            /* Only add the column if it is in range and is not */
            /* excluded by the bit pattern!                      */
            if (curCol < cols && (~filtCol & (size_t(1) << curCol)))
            {
                *(result++) = itemLine;
            }

            ++curCol;
        }
    }
}

std::vector<std::string> split(const std::string& s, char delim, size_t cols, size_t filtCol = 0) 
{ 
    std::vector<std::string> elems; 
    split(s, delim, cols, filtCol, std::back_inserter(elems)); 
    return elems; 
} 

/* Example usage */
int main()
{
    const char* filedataI =
        "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
        "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n"
        "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

    size_t colsRange = 5;                       /* Parse columns 1 to 5 (all five) */
    size_t colsFiltered = (1 << 1) | (1 << 3);  /* Exclude columns 2 and 4 */
    size_t colsPerLine = 3;                     /* 5 - 2 = 3 */

    std::vector<std::string> strVecI = split(filedataI, ';', colsRange, colsFiltered);
    for (size_t idx = 0; idx < strVecI.size(); ++idx)
    {
        if (idx > 0 && 0 == idx % colsPerLine)
        {
            std::cout << std::endl;
        }
        std::cout << "\"" << strVecI[idx] << "\" " << " ";
    }
}

Output with the 3 wanted columns (5 with 2 excluded: cols = 5 and filtCol = (1 << 1) | (1 << 3)); I also print extra " characters and spaces between the fields:

"1496843100" "000002D8" "0x23000CCD.VARIABLE67" 
"1496843100" "000002D9" "0x23000CCD.VARIABLE68" 
"1496843100" "000002DA" "0x23000CCD.VARIABLE69"


2017-06-08 11:28:47

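Thanks for the reply and the example code. It works, but in my case it is not quite what I want. In your code, if I put 4 in 'colsWanted' we split out the first 4 columns, and if I put 2 we split out the first 2 columns. But what do I need to do if I only want columns 1 and 3? If I put 3 in 'colsWanted', we split out columns 1, 2 and 3. Thanks again – rrpik93

Then just don't access the unwanted columns. Or change the program so that it can skip columns. –

In my question I showed an example of getting the first and second columns, and for that your code works well. But if I want the first, second and fourth columns, it doesn't work – rrpik93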

Use the <regex> library, with regex_token_iterator as the splitter.

First split on \n, then on ;.

The code:

#include <cstring>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    const char* filedata =
        "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
        "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n"
        "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

    const char* begin_f = filedata;
    const char* end___f = filedata + std::strlen(filedata);

    std::vector<std::string> vec_str;
    std::regex regex1("\n");
    std::regex regex2(";");

    /* first of all split by newline (-1 selects the text between matches) */
    std::regex_token_iterator<const char*> first(begin_f, end___f, regex1, -1), last;
    vec_str.assign(first, last);

    for (const std::string& str1 : vec_str) {
        /* then split by semicolon ; and print only the first two fields */
        std::regex_token_iterator<std::string::const_iterator> first(str1.begin(), str1.end(), regex2, -1), last;
        int counter = 2;
        while (first != last && counter--) {
            std::cout << *first++ << " ";
        }
        std::cout << '\n';
    }
}

Output:

1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00


2017-06-08 11:29:11

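Below is a solution using std::find() that should be quite fast and efficient. The idea is that an outer loop finds the terminating '\n' of each successive line, and an inner loop finds (within that line's range) the terminating ';' of each successive field.

At the heart of the two loops, you can do whatever you like with the columns: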

#include <algorithm>
#include <cstring>
#include <iostream>
#include <string>

int main()
{
    char const* filedata =
        "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
        "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n"
        "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

    auto filesize = std::strlen(filedata);

    auto line_beg = filedata;
    auto line_end = filedata + filesize;

    // line_pos is a non-null pointer, so the condition is always true;
    // the loop ends through the break at the bottom.
    for(; auto line_pos = std::find(line_beg, line_end, '\n'); line_beg = line_pos + 1)
    {
        auto field_beg = line_beg;
        auto field_end = line_pos;

        auto field_number = 0U;
        for(; auto field_pos = std::find(field_beg, field_end, ';'); field_beg = field_pos + 1)
        {
            ++field_number;

            // select the field number you want here
            if(field_number == 1 || field_number == 2)
            {
                // do something with the field that starts at field_beg
                // and ends at field_pos
                std::cout << ' ' << std::string(field_beg, field_pos);
            }

            if(field_pos == field_end)
                break;
        }

        std::cout << '\n';

        if(line_pos == line_end)
            break;
    }
}

Output:

1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00


2017-06-08 21:56:16 Galik
