Regex skip in C++

This is my string:

/* 
    Block1 { 

    anythinghere 
    } 
*/ 

// Block2 { } 
# Block3 { } 

Block4 { 

    anything here 
}

I use this regex to get each block's name and the content inside it.

regex e(R"~((\w+)\s+\{([^}]+)\})~", std::regex::optimize);

But this regex also picks up everything inside the descriptions (comments). PHP has a "skip" option that you can use to skip all descriptions.

What_I_want_to_avoid(*SKIP)(*FAIL)|What_I_want_to_match

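But this is C++, and I can't use this skip method. What should I do to skip all descriptions and get only Block4 with a C++ regex?

This regex detects Block1, Block2, Block3 and Block4, but I want to skip Block1, Block2 and Block3 and get just Block4 (skipping the descriptions). How do I have to edit my regex so that it returns only Block4 (everything outside the descriptions)?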

Source

2016-02-24 BasicYard

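It looks like you are trying to use a regex to accomplish something that should be done by a parser. That said, it isn't entirely clear from your question what you actually want to match.

What do you mean by "skip all descriptions"? Do you mean you don't want to match comments?

Yes, I'm trying not to match the comments. – BasicYard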

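Since you asked for this long regex, here it is.

This will not handle nested blocks like block{ block{ } }
it will only match block{ block{ } and leave the final } behind.

Since you specified that you are using C++11 as the engine, I didn't use
recursion. If you were to use PCRE or Perl, or even Boost.Regex, this could
easily be changed to use recursion. Let me know if you want to see that.

As is, it has flaws, but it works for your sample.
Another thing it won't do is parse preprocessor directives '#...', because
I've forgotten the rules for those (I thought I did it recently, but can't find a record of it).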

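To use it, sit in a while (regex_search()) loop and look for a match in
capture group 1, if (m[1].matched) etc. That will be your block.
The rest of the matches are comments, quotes, or non-comment text unrelated
to blocks. These have to be matched in order to advance the match position.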
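
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the long regex from this answer: the pattern is a deliberately simplified stand-in, the helper name find_blocks is hypothetical, and treating '#' lines as comments is an assumption taken from the question's sample. The idea is the usual substitute for (*SKIP)(*FAIL): let the uninteresting alternatives match without capturing, and keep only the matches where group 1 participated.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every "name { ... }" block found outside comments.
// The pattern is a simplified stand-in for the long regex in this answer.
std::vector<std::string> find_blocks(const std::string& text) {
    // Alternatives 1-3 consume comments without capturing anything;
    // only the last alternative captures a block into group 1.
    // Assumption: lines starting with '#' count as comments, as in the question.
    static const std::regex re(
        R"~(/\*[\s\S]*?\*/|//[^\n]*|#[^\n]*|(\w+\s*\{[^}]*\}))~");

    std::vector<std::string> blocks;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {            // group 1 took part: this match is a block
            blocks.push_back(m[1].str());
        }
        // Otherwise the match was a comment; it only advances the position.
    }
    return blocks;
}
```

On the sample from the question this should return a single entry that starts with Block4. Because [^}]* stops at the first closing brace, this simplified pattern does not cope with a } inside a string or comment within a block, which is what the extra alternatives in the long regex are for.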

The regex is long and redundant because there are no function calls (recursion) in C++11 ECMAScript. Like I said, use boost::regex or something else.

Benchmark

Sample:

/* 
    Block1 { 

    anythinghere 
    } 
*/ 

// Block2 { } 

Block4 { 

    // CommentedBlock{ asdfasdf } 
    anyth"}"ing here 
} 

Block5 { 

    /* CommentedBlock{ asdfasdf } 
    anyth}"ing here 
    */ 
}

Results:

Regex1: (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})|[\S\s](?:(?!\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})[^/"'\\])*) 
Options: <none> 
Completed iterations: 50/50  (x 1000) 
Matches found per iteration: 8 
Elapsed Time: 1.95 s, 1947.26 ms, 1947261 µs

Regex explained:

# Raw:  (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})|[\S\s](?:(?!\w+\s*\{(?:(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\})[\S\s][^}/"'\\]*))*\})[^/"'\\])*) 
    # Stringed: "(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(\\w+\\s*\\{(?:(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(?!\\})[\\S\\s][^}/\"'\\\\]*))*\\})|[\\S\\s](?:(?!\\w+\\s*\\{(?:(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\\\n?)*?\\n)|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(?!\\})[\\S\\s][^}/\"'\\\\]*))*\\})[^/\"'\\\\])*)"  


    (?:        # Comments 
     /\*        # Start /* .. */ comment 
     [^*]* \*+ 
     (?: [^/*] [^*]* \*+)* 
     /        # End /* .. */ comment 
     | 
     //        # Start // comment 
     (?: [^\\] | \\ \n?)*?   # Possible line-continuation 
     \n        # End // comment 
    ) 
|         # OR, 

    (?:        # Non - comments 
     " 
     [^"\\]*       # Double quoted text 
     (?: \\ [\S\s] [^"\\]*)* 
     " 
     | ' 
     [^'\\]*       # Single quoted text 
     (?: \\ [\S\s] [^'\\]*)* 
     ' 
     | 
     (        # (1 start), BLOCK 
       \w+ \s* \{    
       #################### 
       (?:        # ------------------------ 
        (?:        # Comments inside a block 
         /\*        
         [^*]* \*+ 
         (?: [^/*] [^*]* \*+)* 
         /        
        | 
         //        
         (?: [^\\] | \\ \n?)*? 
         \n        
        ) 
       | 
        (?:        # Non - comments inside a block 
         " 
         [^"\\]*       
         (?: \\ [\S\s] [^"\\]*)* 
         " 
        | ' 
         [^'\\]*       
         (?: \\ [\S\s] [^'\\]*)* 
         ' 
        | 
         (?! \}) 
         [\S\s]       
         [^}/"'\\]*      
        ) 
      )*        # ------------------------ 
       #####################   
       \}        
     )        # (1 end), BLOCK 

     |         # OR, 

     [\S\s]       # Any other char 
     (?:        # ------------------------- 
       (?!        # ASSERT: Here, cannot be a BLOCK{ } 
        \w+ \s* \{      
        (?:        # ============================== 
         (?:        # Comments inside a block 
          /\*        
          [^*]* \*+ 
          (?: [^/*] [^*]* \*+)* 
          /        
          | 
          //        
          (?: [^\\] | \\ \n?)*? 
          \n        
         ) 
        | 
         (?:        # Non - comments inside a block 
          " 
          [^"\\]*       
          (?: \\ [\S\s] [^"\\]*)* 
          " 
          | 
          ' 
          [^'\\]*       
          (?: \\ [\S\s] [^'\\]*)* 
          ' 
          | 
          (?! \}) 
          [\S\s]       
          [^}/"'\\]*      
         ) 
        )*        # ============================== 
        \}        
      )        # ASSERT End 

       [^/"'\\]       # Char which doesn't start a comment, string, escape, 
               # or line continuation (escape + newline) 
     )*        # ------------------------- 
    )        # Done Non - comments

Source

2016-02-25 21:06:31 sln

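TL;DR: Regular expressions cannot be used to parse full blown computer languages. What you want to do cannot be done with a regular expression. You need to develop a mini C++ parser to filter out the comments. The answer to this related question might point you in the right direction.

Regular expressions can be used to process regular languages, but computer languages (such as C++, PHP, Java, C#, HTML, etc.) have a more complex grammar that includes a property called "middle recursion". Middle recursion covers complications such as an arbitrary number of matching parentheses, begin/end quotes, and comments that can contain symbols.

If you want to understand this in more detail, read the answers to this question about the difference between regular expressions and context free grammars. If you are really curious, enroll in a Formal Language Theory course.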
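
As a rough illustration of the "mini parser" idea, here is a small hand-written scanner. It is only a sketch under stated assumptions: the names Block and scan_blocks are hypothetical, lines starting with '#' are treated as comments (as in the question's sample), and line continuations inside // comments are ignored. It walks the text once, skips comments and quoted literals, and counts braces so that nested blocks end at the right place.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Block {
    std::string name;  // identifier preceding the outermost '{'
    std::string body;  // text between the outermost braces
};

// Hypothetical helper: single-pass scan returning the top-level blocks.
std::vector<Block> scan_blocks(const std::string& src) {
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };
    std::vector<Block> blocks;
    std::string name;            // last identifier seen outside any block
    std::size_t body_start = 0;  // index just past the outermost '{'
    int depth = 0;               // current brace nesting level
    const std::size_t n = src.size();

    for (std::size_t i = 0; i < n; ++i) {
        const char c = src[i];
        const char next = (i + 1 < n) ? src[i + 1] : '\0';

        if (c == '/' && next == '*') {                 // block comment
            const std::size_t close = src.find("*/", i + 2);
            if (close == std::string::npos) break;     // unterminated comment
            i = close + 1;                             // land on the closing '/'
        } else if ((c == '/' && next == '/') || c == '#') {
            // Line comment. Assumption: '#' lines are comments too.
            const std::size_t eol = src.find('\n', i);
            if (eol == std::string::npos) break;
            i = eol;
        } else if (c == '"' || c == '\'') {            // quoted literal
            for (++i; i < n && src[i] != c; ++i) {
                if (src[i] == '\\') ++i;               // skip the escaped char
            }
        } else if (c == '{') {
            if (depth++ == 0) body_start = i + 1;
        } else if (c == '}') {
            if (depth > 0 && --depth == 0) {           // outermost block closed
                blocks.push_back(Block{name, src.substr(body_start, i - body_start)});
                name.clear();
            }
        } else if (depth == 0 && is_word(c)) {         // candidate block name
            std::size_t j = i;
            while (j < n && is_word(src[j])) ++j;
            name = src.substr(i, j - i);
            i = j - 1;
        }
    }
    return blocks;
}
```

Counting braces is the part a C++11 std::regex cannot express, which is why outer { inner { } } comes back here as one block named outer. The sketch is still far from a real C++ parser: for example, an apostrophe in plain text is taken as the start of a character literal.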

Source

2016-02-24 18:59:31
