Regular expressions in C++?

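I'm looking to build a simple flex-like lexer in C++/STL. To do this, I want to scan a string from left to right and, at each step, extract the longest possible match from a set of regular expressions.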

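I'm not quite sure how to go about this. The problem isn't really compiling or using regular expressions; rather, I'm not sure what the "high-level" loop that extracts the longest match should look like.

Any hints would be great. It doesn't have to be explicit code, just some pointers and ideas.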

Edit: thanks for the pointers to the Boost regex library. I wasn't aware of it.

Here is sample code that extracts email addresses:

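    // Assumes Boost.Regex (or <regex>) is included and its namespace is in scope.
    std::string html = …;
    regex mailto("<a href=\"mailto:(.*?)\">", regex_constants::icase);
    sregex_iterator begin(html.begin(), html.end(), mailto), end;

    // Visit every non-overlapping match of mailto in html.
    for (; begin != end; ++begin)
    {
        smatch const & what = *begin;
        // what[1] is the first capture group: the address after "mailto:".
        std::cout << "Email address to spam: " << what[1] << "\n";
    }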

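What I want is slightly different.

For example, I'd like an additional regular expression that finds http:// addresses, and another that finds all-uppercase strings.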

    std::string html = …;
    regex mailto("<a href=\"mailto:(.*?)\">", regex_constants::icase);
    regex http(....);
    regex all_caps("...", regex_constants::icase);
    // the actual definitions of the regular expressions do not matter, I can find how to do that later.

    // Here, I would like to iterate, and find concurrently the matching patterns from all three regular expressions above
    sregex_iterator begin(html.begin(), html.end(), mailto), end;

    for (; begin != end; ++begin)
    {
        smatch const & what = *begin;
        // here I should be able to identify which among the above three was found
        std::cout << "Email address to spam: " << what[1] << "\n";
    }

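In the end, I should always be able to match at least one of the regular expressions, and keep going until I reach the end of the string.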
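One way the "high-level" loop described above might look is sketched below. It is only an illustration, not a definitive lexer: it uses the standard `<regex>` header rather than Boost, and the `Rule`, `Token`, and `tokenize` names, the rule table, and the "unknown" fallback for unmatched characters are all assumptions made for the example. At each position every rule is tried with `match_continuous`, which anchors the match at the current position, and the longest result wins.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule table entry: a label plus the compiled pattern.
struct Rule {
    std::string name;
    std::regex pattern;
};

// Hypothetical token: which rule matched and the text it consumed.
struct Token {
    std::string rule;
    std::string text;
};

// At each position, try every rule anchored at that position and keep the
// longest match. Ties go to the rule listed first.
std::vector<Token> tokenize(const std::string& input,
                            const std::vector<Rule>& rules) {
    std::vector<Token> tokens;
    auto pos = input.cbegin();
    while (pos != input.cend()) {
        std::ptrdiff_t best_len = 0;
        const Rule* best_rule = nullptr;
        for (const Rule& rule : rules) {
            std::smatch m;
            // match_continuous forces the match to begin exactly at pos.
            if (std::regex_search(pos, input.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous)) {
                if (m.length(0) > best_len) {
                    best_len = m.length(0);
                    best_rule = &rule;
                }
            }
        }
        if (best_rule == nullptr) {
            // Assumption: no rule matched (or only matched the empty string),
            // so emit one character as "unknown" to guarantee progress.
            tokens.push_back({"unknown", std::string(1, *pos)});
            ++pos;
        } else {
            tokens.push_back({best_rule->name, std::string(pos, pos + best_len)});
            pos += best_len;
        }
    }
    return tokens;
}
```

Note that with the default ECMAScript grammar, alternation inside a single pattern prefers the first alternative that succeeds rather than the longest one, so "longest" here means the longest of the matches each rule reports, not a true leftmost-longest match within a rule.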

Source

2014-03-06 kloop

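Too vague. How about some code, or pseudocode, or sample input and output? – aschepler

Why not use an existing tool such as Boost.Spirit.Lex or Boost.Xpressive? –

I need some examples; your explanation is unclear. I'm not familiar enough with regular expressions to be 100% confident, but I think the only thing that can end a regular expression is an invalid character. For example, this whole comment is a valid regular expression: given the preceding characters, no character is illegal. But you don't need lookahead. – MSalters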

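You have a set of regular expressions named A..Z, and a stream of input characters. You need to compile each regular expression (A..Z) into its own state machine (a..z), and then combine those separate state machines into a single, final state machine. Each state in the final machine corresponds to a set of one or more states from the a..z machines.

A state machine is a collection of nodes (states) and edges (input characters). For example, the state machine for the expression "ab" has 3 nodes (empty string, "a" entered, "ab" entered) and 2 edges, "a" and "b". The machine for the expression "cd" is similar.

When you combine these two state machines you get the states (empty string, "a" entered, "ab" entered, "c" entered, "cd" entered).

Sound feasible?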
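The combination step can be illustrated with a deliberately simplified sketch. It assumes the patterns are plain literal strings such as "ab" and "cd", so the combined machine is just a trie sharing one start state; real regular expressions would need an NFA per pattern and a subset construction instead. The `State`, `Match`, `build_machine`, and `longest_match` names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One node of the combined machine. accept holds the index of the pattern
// that ends in this state, or -1 if the state is not accepting.
struct State {
    std::map<char, int> next;
    int accept = -1;
};

// Result of a scan: pattern == -1 means nothing matched.
struct Match {
    int pattern;
    std::size_t length;
};

// Merge literal patterns into one machine whose state 0 is the shared
// "empty string" state.
std::vector<State> build_machine(const std::vector<std::string>& patterns) {
    std::vector<State> states(1);
    for (std::size_t p = 0; p < patterns.size(); ++p) {
        int cur = 0;
        for (char c : patterns[p]) {
            auto it = states[cur].next.find(c);
            if (it != states[cur].next.end()) {
                cur = it->second;  // edge already exists, reuse it
            } else {
                states.push_back(State{});
                int fresh = static_cast<int>(states.size()) - 1;
                states[cur].next[c] = fresh;
                cur = fresh;
            }
        }
        // The earliest listed pattern keeps the accepting state on duplicates.
        if (states[cur].accept == -1) states[cur].accept = static_cast<int>(p);
    }
    return states;
}

// Walk the machine from input[start], remembering the last accepting state
// seen, which gives the longest match.
Match longest_match(const std::vector<State>& states,
                    const std::string& input, std::size_t start) {
    Match best{-1, 0};
    int cur = 0;
    for (std::size_t i = start; i < input.size(); ++i) {
        auto it = states[cur].next.find(input[i]);
        if (it == states[cur].next.end()) break;  // no edge: stop scanning
        cur = it->second;
        if (states[cur].accept != -1) best = Match{states[cur].accept, i - start + 1};
    }
    return best;
}
```

Building the machine for "ab" and "cd" yields the five states listed above. Because the scan remembers the last accepting state rather than stopping at the first one, patterns such as "ab" and "abc" resolve to the longer match when both apply.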

Source

2014-03-06 22:18:56
