
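I have two regular expressions, one that matches [value] and another that matches HTML attributes, but I need to combine them into a single regular expression. I want PHP's preg_replace to find [value] matches in HTML, but not when the match is inside an HTML attribute.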

Here is my working regex that finds [value]:

$tagregexp = '[a-zA-Z_\-][0-9a-zA-Z_\-\+]{2,}';

$pattern =
      '\\['                            // Opening bracket
    . '(\\[?)'                         // 1: Optional second opening bracket for escaping shortcodes: [[tag]]
    . "($tagregexp)"                   // 2: Shortcode name
    . '(?![\\w-])'                     // Not followed by word character or hyphen
    . '('                              // 3: Unroll the loop: Inside the opening shortcode tag
    .     '[^\\]\\/]*'                 // Not a closing bracket or forward slash
    .     '(?:'
    .         '\\/(?!\\])'             // A forward slash not followed by a closing bracket
    .         '[^\\]\\/]*'             // Not a closing bracket or forward slash
    .     ')*?'
    . ')'
    . '(?:'
    .     '(\\/)'                      // 4: Self closing tag ...
    .     '\\]'                        // ... and closing bracket
    . '|'
    .     '\\]'                        // Closing bracket
    .     '(?:'
    .         '('                      // 5: Unroll the loop: Optionally, anything between the opening and closing shortcode tags
    .             '[^\\[]*+'           // Not an opening bracket
    .             '(?:'
    .                 '\\[(?!\\/\\2\\])' // An opening bracket not followed by the closing shortcode tag
    .                 '[^\\[]*+'       // Not an opening bracket
    .             ')*+'
    .         ')'
    .         '\\[\\/\\2\\]'           // Closing shortcode tag
    .     ')?'
    . ')'
    . '(\\]?)';                        // 6: Optional second closing bracket for escaping shortcodes: [[tag]]


This regex matches attributes and their values:

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

I want the regex to match in the following examples:

<div [value] ></div>
<div>[value]</div>

but not find a match for [value] in this example:

<input attr="attribute[value]"/>

I just need to make it into a single regular expression to use in my preg_replace_callback:

preg_replace_callback("/$pattern/", 'replace_matches', $html);
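If a single preg_replace_callback() pattern is all that is needed, one option (not the approach taken in the answer, but PCRE's backtracking control verbs) is to match quoted attribute values first and throw them away with (*SKIP)(*FAIL), so only [value] occurrences outside quoted attribute values reach the callback. This is a minimal sketch assuming PHP 7.4 or later and that attribute values are always quoted; the function name replace_outside_quoted_attributes is hypothetical. An unquoted value such as attr=a[value] would still be replaced, and an = followed by quoted text in ordinary text content would be skipped.

```php
<?php
// Hypothetical helper: replace [value] everywhere except inside quoted attribute values.
// Branch 1 matches ="..." or ='...' and discards it: (*SKIP) moves the next match attempt
// past the quoted value and (*FAIL) makes this branch fail, so nothing inside is reported.
// Branch 2 is the wanted match: a bracketed value, with the text inside the brackets in group 1.
function replace_outside_quoted_attributes(string $html, callable $callback): string
{
    $pattern = <<<'REGEX'
~
  = \s* (?: "[^"]*" | '[^']*' ) (*SKIP)(*FAIL)   # quoted attribute value: skip it
|
  \[ ([^\]]*) \]                                 # bracketed value anywhere else
~x
REGEX;

    return preg_replace_callback($pattern, $callback, $html);
}

$html = '<div [value] ></div><div>[value]</div><input attr="attribute[value]"/>';
// Upper-casing stands in for whatever the real callback would produce.
echo replace_outside_quoted_attributes($html, fn(array $m) => strtoupper($m[1])), "\n";
```

With the x flag, the spaces and # comments in the nowdoc are ignored, which keeps the two branches readable.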


2016-05-17 TarranJones

Have you considered using a parser instead? – chris85

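It's a PHP string, not a Java string; you don't need to escape everything. Use the x modifier (with a nowdoc string if you can) instead of concatenation. If you want to process HTML (or XML), forget regex and use DOMDocument (and eventually DOMXPath). –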
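A minimal sketch of the DOMDocument/DOMXPath route from the comment above, assuming PHP 7.4 or later, a fragment with a single root element, and ASCII or ISO-8859-1 input (loadHTML() assumes ISO-8859-1 unless the markup declares a charset). Only text nodes are searched, so a [value] inside an attribute value is never touched. The function name replace_in_text_nodes is hypothetical. A parser will most likely not report a [value] written inside the start tag itself, as in <div [value] ></div>, as text, so this sketch does not cover that case.

```php
<?php
// Hypothetical helper: replace [value] only in text nodes of an HTML fragment.
function replace_in_text_nodes(string $html, callable $callback): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about non-standard markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Don't wrap the fragment in implied <html>/<body> elements or add a doctype.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Attribute values are attribute nodes, not text nodes, so //text() skips them.
    foreach ($xpath->query('//text()') as $textNode) {
        $textNode->nodeValue = preg_replace_callback('~\[([^\]]*)\]~', $callback, $textNode->nodeValue);
    }
    // saveHTML() re-serializes the tree, so formatting may differ slightly from the input.
    return $doc->saveHTML();
}

$html = '<div><span>[value]</span><input attr="attribute[value]"></div>';
echo replace_in_text_nodes($html, fn(array $m) => strtoupper($m[1]));
```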

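Another thing: a closing square bracket is not a special character, so you don't need to escape it. An opening square bracket has nothing special inside a character class either, so you can write '[^[]' instead of '[^\\[]'. *(You can even write '[^]]' and '[]]', because in the first position a closing square bracket is treated as a literal character.)* –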
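A sketch of the question's shortcode pattern rewritten along the lines of the two comments above: a nowdoc string (no PHP-level escaping), the x flag instead of concatenation, and no backslash before ] or / where PCRE does not need one (~ is used as the delimiter so / is not special). This is an illustration under those assumptions, not a tested drop-in replacement; demo_replace_matches is a hypothetical stand-in for the question's replace_matches(), and its upper-cased output is placeholder only.

```php
<?php
// The question's pattern, one token group per line; whitespace and # comments are ignored (x).
$shortcodePattern = <<<'REGEX'
~
\[                                     # opening bracket
(\[?)                                  # 1: optional second [ for escaping: [[tag]]
([a-zA-Z_\-][0-9a-zA-Z_\-\+]{2,})      # 2: shortcode name
(?![\w-])                              # not followed by a word character or hyphen
(                                      # 3: inside the opening tag (unrolled loop)
    [^]/]*                             #    not a ] or /
    (?: /(?!]) [^]/]* )*?              #    a / not followed by ], then more of the same
)
(?:
    (/) ]                              # 4: self-closing tag, then closing bracket
  |
    ]                                  # closing bracket
    (?:
        (                              # 5: optional content between opening and closing tags
            [^[]*+                     #    not an opening bracket
            (?: \[(?!/\2]) [^[]*+ )*+  #    a [ that does not start the closing tag
        )
        \[/\2]                         # closing shortcode tag
    )?
)
(]?)                                   # 6: optional second ] for escaping: [[tag]]
~x
REGEX;

// Hypothetical callback: an escaped [[tag]] loses one bracket pair,
// anything else is replaced by its upper-cased tag name as placeholder output.
function demo_replace_matches(array $m): string
{
    if ($m[1] === '[' && $m[6] === ']') {
        return substr($m[0], 1, -1);
    }
    return strtoupper($m[2]);
}

echo preg_replace_callback($shortcodePattern, 'demo_replace_matches', 'a [[gallery]] b [caption]x[/caption]'), "\n";
```

Note that in x mode PCRE still treats whitespace inside a character class literally, which is why none of the classes above contain spaces.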

Foreword

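It looks like you are trying to parse HTML with regular expressions. I feel obliged to point out that parsing HTML with regular expressions is not advisable because of all the obscure edge cases that can come up, but since you seem to have some control over the HTML, you should be able to avoid many of the problems the regex police cry about.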

Description

<\w+\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\[(?<DesiredValue>[^\]]*)\]) 
| 
<\w+\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*> 
(?:(?!<\/div>)(?!\[).)*\[(?<DesiredValue>[^\]]*)\]


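This regular expression will do the following:

- find the square-bracketed [some value]
- where the [value] is inside a tag
  - capture the substring that is in the tag
- where the [value] is not inside a tag
- provided the substring is not nested inside another attribute's value, as in <input attrib=" [value] ">
- the captured substring will not include the wrapping square brackets
- allows any tag name, or only the desired tag name if you replace \w with it
- allows the value to be any string
- avoids difficult edge cases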

Note: this expression is best used with the following flags:

全球
點匹配新行
忽略表達空白
允許重複的命名捕獲組
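A sketch of using the answer's pattern from PHP, assuming PHP 7.2 or later. The flags above map to the PHP modifiers x (ignore whitespace), s (dot matches newline) and J (allow duplicate group names); "global" has no modifier because preg_match_all() already finds every match. Because the first branch captures the value inside a lookahead, its full match is only the "<div " prefix, so this sketch splices replacements in at the captured offsets rather than replacing the whole match. The helper replace_desired_values and the constant name are hypothetical; the (?!<\/div>) part is specific to div elements, as in the answer.

```php
<?php
// The answer's pattern as a nowdoc; x lets the branches sit on separate lines.
const DESIRED_VALUE_PATTERN = <<<'REGEX'
~
<\w+\s(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\[(?<DesiredValue>[^\]]*)\])
|
<\w+\s?(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
(?:(?!<\/div>)(?!\[).)*\[(?<DesiredValue>[^\]]*)\]
~xsJ
REGEX;

// Hypothetical helper: replace every captured [value] (brackets included) via $callback.
function replace_desired_values(string $html, callable $callback): string
{
    preg_match_all(DESIRED_VALUE_PATTERN, $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    // Work from the last match backwards so earlier byte offsets stay valid.
    foreach (array_reverse($matches) as $m) {
        // Group 1 is set by the in-tag branch, group 2 by the after-tag branch.
        // An unmatched group has offset -1, or is absent if it is the last group.
        [$value, $offset] = (isset($m[2]) && $m[2][1] !== -1) ? $m[2] : $m[1];
        // The value sits directly between '[' and ']', so widen the splice by one on each side.
        $html = substr_replace($html, $callback($value), $offset - 1, strlen($value) + 2);
    }
    return $html;
}

$html = '<div [value 1] ></div><div>[value 2]</div><input attr="attribute[value 3]"/>';
echo replace_desired_values($html, 'strtoupper'), "\n";
```

Working with numbered groups and offsets avoids depending on which of the two same-named DesiredValue groups PHP exposes under the name key.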


Example

Live Demo

https://regex101.com/r/tT0bN5/1

Sample Text

<div [value 1] ></div> 
<div>[value 2]</div> 
but not find a match in this example 

<div attr="attribute[value 3]"/> 
<img [value 4]> 
<a href="http://[value 5]">[value 6]</a>

Sample Matches

MATCH 1 
DesiredValue [6-13] `value 1` 
MATCH 2 
DesiredValue [29-36] `value 2` 
MATCH 3 
DesiredValue [121-128] `value 4` 
MATCH 4 
DesiredValue [159-166] `value 6`

Explanation

NODE      EXPLANATION 
---------------------------------------------------------------------- 
    <div      '<div' 
---------------------------------------------------------------------- 
    \s      whitespace (\n, \r, \t, \f, and " ") 
---------------------------------------------------------------------- 
    (?=      look ahead to see if there is: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more 
          times (matching the least amount 
          possible)): 
---------------------------------------------------------------------- 
     [^>=]     any character except: '>', '=' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     ='      '=\'' 
---------------------------------------------------------------------- 
     [^']*     any character except: ''' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
     '      '\'' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     ="      '="' 
---------------------------------------------------------------------- 
     [^"]*     any character except: '"' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
     "      '"' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     =      '=' 
---------------------------------------------------------------------- 
     [^'"]     any character except: ''', '"' 
---------------------------------------------------------------------- 
     [^\s>]*     any character except: whitespace (\n, 
           \r, \t, \f, and " "), '>' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
    )*?      end of grouping 
---------------------------------------------------------------------- 
    \[      '[' 
---------------------------------------------------------------------- 
    (      group and capture to \1: 
---------------------------------------------------------------------- 
     [^\]]*     any character except: '\]' (0 or more 
           times (matching the most amount 
           possible)) 
---------------------------------------------------------------------- 
    )      end of \1 
---------------------------------------------------------------------- 
    \]      ']' 
---------------------------------------------------------------------- 
)      end of look-ahead 
---------------------------------------------------------------------- 
|      OR 
---------------------------------------------------------------------- 
    <div      '<div' 
---------------------------------------------------------------------- 
    \s?      whitespace (\n, \r, \t, \f, and " ") 
          (optional (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more times 
          (matching the most amount possible)): 
---------------------------------------------------------------------- 
    [^>=]     any character except: '>', '=' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
    ='      '=\'' 
---------------------------------------------------------------------- 
    [^']*     any character except: ''' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    '      '\'' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
    ="      '="' 
---------------------------------------------------------------------- 
    [^"]*     any character except: '"' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
    =      '=' 
---------------------------------------------------------------------- 
    [^'"]     any character except: ''', '"' 
---------------------------------------------------------------------- 
    [^\s>]*     any character except: whitespace (\n, 
          \r, \t, \f, and " "), '>' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)*      end of grouping 
---------------------------------------------------------------------- 
    >      '>' 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (0 or more times 
          (matching the most amount possible)): 
---------------------------------------------------------------------- 
    (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
     <      '<' 
---------------------------------------------------------------------- 
     \/      '/' 
---------------------------------------------------------------------- 
     div>      'div>' 
---------------------------------------------------------------------- 
    )      end of look-ahead 
---------------------------------------------------------------------- 
    (?!      look ahead to see if there is not: 
---------------------------------------------------------------------- 
     \[      '[' 
---------------------------------------------------------------------- 
    )      end of look-ahead 
---------------------------------------------------------------------- 
    .      any character 
---------------------------------------------------------------------- 
)*      end of grouping 
---------------------------------------------------------------------- 
    \[      '[' 
---------------------------------------------------------------------- 
    (      group and capture to \2: 
---------------------------------------------------------------------- 
    [^\]]*     any character except: '\]' (0 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \2 
---------------------------------------------------------------------- 
    \]      ']'


2016-05-18 02:26:43

Incredible answer, I appreciate the time and effort that went into it. I still haven't completely solved my problem, but this should help a lot. – TarranJones

Let me know if this answer is missing anything, or if I can help. –
