How to match only expressions that contain unequal groups?

I want to use a regular expression to capture only those expressions whose values differ in a given group. For example, I need to capture these (in bold):

;TEXT;2;34;1;0;;;;;;3200;
PRINT_Polohopis.dgn;Different TEXT;2;64;1;0;;;;;;3200;

But not these (where the values are the same):

;TEXT;2;34;1;0;;;;;;3200;
PRINT_Polohopis.dgn;TEXT;2;64;1;0;;;;;;3200;

So far I have managed to create this regular expression:

^;([\w\s]*;).*\n(?:[\w\s_\.]*);(?:(?!(\1))(\K[\w\s]*;))

It only works if the semicolon is included in the capture group. Is there a better way to capture these groups?

2016-06-10 Pavel Vicha

Something like this might work for you:

/^;([^;]+);.*?\n[^;]+;(?!\1;)([^;]+)/

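The trick here is that a negative lookahead is used to make sure that \1 (the backreference) is not at the position where the second value is expected: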

/^;            # Start of string and literal ;
  ([^;]+);     # Capture all but ; followed by literal ;
  .*?\n        # Match rest of line
  [^;]+;       # Match all but ; followed by literal ;
  (?!\1;)      # Negative lookahead to make sure captured
               # group is not at this position, followed
               # by literal ;
  ([^;]+)      # Capture all but ;
/x
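
As a quick check, the pattern can be run against both samples from the question. This is only a minimal sketch in Python: the choice of Python's re module, the re.MULTILINE flag, and the helper name differing_groups are assumptions for illustration and are not part of the original answer.

```python
import re

# The answer's pattern, written without the /.../ delimiters.
# re.MULTILINE is an assumption: it lets ^ match at the start of every line.
PATTERN = re.compile(r"^;([^;]+);.*?\n[^;]+;(?!\1;)([^;]+)", re.MULTILINE)

# Sample pair from the question where the second field differs.
different = (
    ";TEXT;2;34;1;0;;;;;;3200;\n"
    "PRINT_Polohopis.dgn;Different TEXT;2;64;1;0;;;;;;3200;\n"
)

# Sample pair from the question where the second field is the same.
same = (
    ";TEXT;2;34;1;0;;;;;;3200;\n"
    "PRINT_Polohopis.dgn;TEXT;2;64;1;0;;;;;;3200;\n"
)


def differing_groups(text):
    """Return (first_line_value, second_line_value) for each pair that differs."""
    return [m.groups() for m in PATTERN.finditer(text)]


print(differing_groups(different))  # [('TEXT', 'Different TEXT')]
print(differing_groups(same))       # []
```

Note that Python's re module does not support \K, so the pattern from the question would need the third-party regex module or a PCRE-based tool; the pattern above avoids \K entirely.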

2016-06-10 12:11:39 andlrc

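There is still a bug: it fails if the difference is at the end of the second capture group (e.g. TEXT and TEXT1).

@PavelVicha That's true, and it is easy to fix: just use '(?!\1;)' instead of '(?!\1)' – andlrc

OK, thanks!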
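
The edge case from the comments can be reproduced with a short Python sketch. The variable names and the TEXT1 sample line are made up for illustration, based on the example values mentioned in the comment; they are not taken from the original data.

```python
import re

# Lookahead without the trailing semicolon: rejects any value that merely
# starts with the first line's value.
without_semicolon = re.compile(r"^;([^;]+);.*?\n[^;]+;(?!\1)([^;]+)")

# Lookahead with the trailing semicolon: rejects only an exact field match.
with_semicolon = re.compile(r"^;([^;]+);.*?\n[^;]+;(?!\1;)([^;]+)")

# Hypothetical sample: the second value differs only by a trailing character.
sample = (
    ";TEXT;2;34;1;0;;;;;;3200;\n"
    "PRINT_Polohopis.dgn;TEXT1;2;64;1;0;;;;;;3200;\n"
)

loose = without_semicolon.search(sample)
strict = with_semicolon.search(sample)

print(loose)            # None: the differing pair is missed
print(strict.groups())  # ('TEXT', 'TEXT1')
```

Including the semicolon in the lookahead anchors the comparison to the end of the field, so TEXT1 is no longer mistaken for TEXT.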
