2016-05-13 65 views
1

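RegExp to replace matching parentheses in nested structures

How can I replace a matching pair of opening/closing parentheses when the first opening parenthesis follows the keyword array? Can regular expressions help with this kind of problem?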

To be more specific, I would like to solve this using JavaScript or PHP.

// input 
$data = array(
    'id' => nextId(), 
    'profile' => array(
     'name' => 'Hugo Hurley', 
     'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108 
    ) 
); 

// desired output 
$data = [ 
    'id' => nextId(), 
    'profile' => [ 
     'name' => 'Hugo Hurley', 
     'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108 
    ] 
]; 

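This is not language-agnostic as far as regex is concerned, and that matters a great deal. Not many regex flavors support the recursion you need. So, which regex engine are you using? –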


Unfortunately, regex syntax and features are not language-agnostic. –


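Balanced text via function calls can be done in PCRE/Perl. Balancing groups (counting) can be done in Dot-Net. Those are your choices. With PCRE (and, I think, Dot-Net) you only get the outermost nesting of the whole text. To break it down further, you have to parse its contents recursively. – sln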

Answers

3

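Tim Pietzcker gives the Dot-Net counting version.
It has the same elements as the PCRE (PHP) version below.

All the caveats are the same. In particular, non-array parentheses must
be balanced, because they use the same closing parenthesis as the delimiter.

All of the text must (or at least should) be parsed.
The outer groups 1, 2, 3, 4 let you get the parts:
CONTENT
CORE-1 array()
CORE-2 any ()
EXCEPTIONS

Each match gets you exactly one of these outer parts; they are mutually exclusive.

The trick is to define a PHP function parse(core) that parses the CORE.
The body of that function is a while (regex.search(core)) { .. } loop.

Each time either the CORE-1 or the CORE-2 group matches, call parse(core) again,
passing it the contents of that core's group.

Inside the loop, just take off the CONTENT and assign it to the hash.

Obviously, the group 1 construct that calls (?&content) should be replaced with constructs that extract the hash-like variable data.

At a detailed level this can get very tedious.
Usually you have to account for every single character to parse the whole thing properly.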
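
JavaScript's built-in RegExp has no recursion or subroutine calls, so the PCRE pattern below cannot be used there directly. As a rough sketch of the same parse(core) idea applied to the question's input, the following swaps the recursive regex for a small tokenizing regex plus an ordinary recursive function. The names convertArrays and parseCore are made up for illustration, and the sketch emits converted text instead of building a hash. Like the regex, it assumes no parentheses hide inside strings or comments.

```javascript
// Sketch only: convertArrays / parseCore are hypothetical names.
function convertArrays(src) {
  // Delimiters: 'array(' (CORE-1 opener) or any single parenthesis.
  const token = /\barray\s*\(|[()]/g;
  let pos = 0;

  // Parse one CORE: everything up to the ')' that closes the current level,
  // or up to the end of input when depth is 0.
  function parseCore(depth) {
    let out = '';
    for (;;) {
      token.lastIndex = pos;
      const m = token.exec(src);
      if (m === null) {
        if (depth > 0) throw new Error('Unbalanced opening parenthesis');
        out += src.slice(pos);
        pos = src.length;
        return out;
      }
      out += src.slice(pos, m.index);      // CONTENT before the delimiter
      pos = token.lastIndex;
      if (m[0] === ')') {
        // EXCEPTION at top level, otherwise the end of this core.
        if (depth === 0) throw new Error(`Unbalanced ')' at position ${m.index}`);
        return out;
      }
      const isArray = m[0] !== '(';        // 'array(' versus plain '('
      const inner = parseCore(depth + 1);  // recurse into the nested core
      out += (isArray ? '[' : '(') + inner + (isArray ? ']' : ')');
    }
  }

  return parseCore(0);
}

console.log(convertArrays("$data = array('id' => nextId(), 'n' => (4 + 8)/108);"));
// prints: $data = ['id' => nextId(), 'n' => (4 + 8)/108];
```

Each recursive call remembers which kind of opener it belongs to, which is why a plain ')' can be told apart from the ')' that closes an array(.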

(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|\(((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?!\barray\s*\(|[()]).)+)) 

Expanded

# 1: CONTENT 
# 2: CORE-1 
# 3: CORE-2 
# 4: EXCEPTIONS 

(?is) 

(?: 
     (         # (1), Take off CONTENT 
      (?&content) 
    ) 
    |         # OR ----------------------------- 
     (?>        # Start 'array(' 
      \b array \s* \(
    ) 
     (         # (2), Take off 'array(CORE-1)' 
      (?= .) 
      (?&core) 
     | 
    ) 
     \)         # End ')' 
    |         # OR ----------------------------- 
     \(        # Start '(' 
     (         # (3), Take off '(any CORE-2)' 
      (?= .) 
      (?&core) 
     | 
    ) 
     \)         # End ')' 
    |         # OR ----------------------------- 
     (         # (4), Take off Unbalanced or Exceptions 
      \b array \s* \(
     | [()] 
    ) 
) 

# Subroutines 
# --------------- 

(?(DEFINE) 

     # core 
     (?<core> 
      (?> 
       (?&content) 
      | 
       (?> \b array \s* \() 
       # recurse core of array() 
       (?: 
        (?= .) 
        (?&core) 
        | 
       ) 
       \) 
      | 
       \(
       # recurse core of any () 
       (?: 
        (?= .) 
        (?&core) 
        | 
       ) 
       \) 
      )+ 
    ) 

     # content 
     (?<content> 
      (?> 
       (?! 
        \b array \s* \(
        | [()] 
       ) 
       . 
      )+ 
    ) 
) 

Output

** Grp 0   - (pos 0 , len 11) 
some_var = 
** Grp 1   - (pos 0 , len 11) 
some_var = 
** Grp 2   - NULL 
** Grp 3   - NULL 
** Grp 4 [core] - NULL 
** Grp 5 [content] - NULL 

----------------------- 

** Grp 0   - (pos 11 , len 153) 
array(
    'id' => nextId(), 
    'profile' => array(
     'name' => 'Hugo Hurley', 
     'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108 
    ) 
) 
** Grp 1   - NULL 
** Grp 2   - (pos 17 , len 146) 

    'id' => nextId(), 
    'profile' => array(
     'name' => 'Hugo Hurley', 
     'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108 
    ) 

** Grp 3   - NULL 
** Grp 4 [core] - NULL 
** Grp 5 [content] - NULL 

------------------------------------- 

** Grp 0   - (pos 164 , len 3) 
; 

** Grp 1   - (pos 164 , len 3) 
; 

** Grp 2   - NULL 
** Grp 3   - NULL 
** Grp 4 [core] - NULL 
** Grp 5 [content] - NULL 

Here is something else from an earlier project, to give an idea of the usage:

# Perl code: 
# 
#  use strict; 
#  use warnings; 
#  
#  use Data::Dumper; 
#  
#  $/ = undef; 
#  my $content = <DATA>; 
#  
#  # Set the error mode on/off here .. 
#  my $BailOnError = 1; 
#  my $IsError = 0; 
#  
#  my $href = {}; 
#  
#  ParseCore($href, $content); 
#  
#  #print Dumper($href); 
#  
#  print "\n\n"; 
#  print "\nBase======================\n"; 
#  print $href->{content}; 
#  print "\nFirst======================\n"; 
#  print $href->{first}->{content}; 
#  print "\nSecond======================\n"; 
#  print $href->{first}->{second}->{content}; 
#  print "\nThird======================\n"; 
#  print $href->{first}->{second}->{third}->{content}; 
#  print "\nFourth======================\n"; 
#  print $href->{first}->{second}->{third}->{fourth}->{content}; 
#  print "\nFifth======================\n"; 
#  print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content}; 
#  print "\nSix======================\n"; 
#  print $href->{six}->{content}; 
#  print "\nSeven======================\n"; 
#  print $href->{six}->{seven}->{content}; 
#  print "\nEight======================\n"; 
#  print $href->{six}->{seven}->{eight}->{content}; 
#  
#  exit; 
#  
#  
#  sub ParseCore 
#  { 
#   my ($aref, $core) = @_; 
#   my ($k, $v); 
#   while ($core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g) 
#   { 
#   if (defined $1) 
#   { 
#    # CONTENT 
#    $aref->{content} .= $1; 
#   } 
#   elsif (defined $2) 
#   { 
#    # CORE 
#    $k = $2; $v = $3; 
#    $aref->{$k} = {}; 
#  #   $aref->{$k}->{content} = $v; 
#  #   $aref->{$k}->{match} = $&; 
#     
#    my $curraref = $aref->{$k}; 
#    my $ret = ParseCore($aref->{$k}, $v); 
#    if ($BailOnError && $IsError) { 
#     last; 
#    } 
#    if (defined $ret) { 
#     $curraref->{'#next'} = $ret; 
#    } 
#   } 
#   else 
#   { 
#    # ERRORS 
#    print "Unbalanced '$4' at position = ", $-[0]; 
#    $IsError = 1; 
#  
#    # Decide to continue here .. 
#    # If BailOnError is set, just unwind recursion. 
#    # ------------------------------------------------- 
#    if ($BailOnError) { 
#     last; 
#    } 
#   } 
#   } 
#   return $k; 
#  } 
#  
#  #================================================ 
#  __DATA__ 
#  some html content here top base 
#  <!--block:first--> 
#   <table border="1" style="color:red;"> 
#   <tr class="lines"> 
#    <td align="left" valign="<--valign-->"> 
#   <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a> 
#   <!--hello--> <--again--><!--world--> 
#   some html content here 1 top 
#   <!--block:second--> 
#    some html content here 2 top 
#    <!--block:third--> 
#     some html content here 3 top 
#     <!--block:fourth--> 
#      some html content here 4 top 
#      <!--block:fifth--> 
#       some html content here 5a 
#       some html content here 5b 
#      <!--endblock--> 
#     <!--endblock--> 
#     some html content here 3a 
#     some html content here 3b 
#    <!--endblock--> 
#    some html content here 2 bottom 
#   <!--endblock--> 
#   some html content here 1 bottom 
#  <!--endblock--> 
#  some html content here1-5 bottom base 
#  
#  some html content here 6-8 top base 
#  <!--block:six--> 
#   some html content here 6 top 
#   <!--block:seven--> 
#    some html content here 7 top 
#    <!--block:eight--> 
#     some html content here 8a 
#     some html content here 8b 
#    <!--endblock--> 
#    some html content here 7 bottom 
#   <!--endblock--> 
#   some html content here 6 bottom 
#  <!--endblock--> 
#  some html content here 6-8 bottom base 
# 
# Output >> 
# 
#  Base====================== 
#  some html content here top base 
#  
#  some html content here1-5 bottom base 
#  
#  some html content here 6-8 top base 
#  
#  some html content here 6-8 bottom base 
#  
#  First====================== 
#  
#   <table border="1" style="color:red;"> 
#   <tr class="lines"> 
#    <td align="left" valign="<--valign-->"> 
#   <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a> 
#   <!--hello--> <--again--><!--world--> 
#   some html content here 1 top 
#   
#   some html content here 1 bottom 
#  
#  Second====================== 
#  
#    some html content here 2 top 
#    
#    some html content here 2 bottom 
#   
#  Third====================== 
#  
#     some html content here 3 top 
#     
#     some html content here 3a 
#     some html content here 3b 
#    
#  Fourth====================== 
#  
#      some html content here 4 top 
#      
#     
#  Fifth====================== 
#  
#       some html content here 5a 
#       some html content here 5b 
#      
#  Six====================== 
#  
#   some html content here 6 top 
#   
#   some html content here 6 bottom 
#  
#  Seven====================== 
#  
#    some html content here 7 top 
#    
#    some html content here 7 bottom 
#   
#  Eight====================== 
#  
#     some html content here 8a 
#     some html content here 8b 
#   

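There is a lot more to it than this, though; it is contrived for the sake of the example. Turning a regex solution into a proper parser is a lot of work and not for the faint of heart. – sln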

2

How about the following (using the .NET regex engine):

resultString = Regex.Replace(subjectString, 
    @"\barray\(   # Match 'array(' 
    (      # Capture in group 1: 
    (?>     # Start an atomic group:
     (?:     # Either match 
     (?!\barray\(|[()]) # only if we're not before another array or parens 
     .     # any character 
    )+     # once or more 
    |      # or 
     \((?<Depth>)  # match '(' (and increase the nesting counter) 
    |      # or 
     \) (?<-Depth>)  # match ')' (and decrease the nesting counter). 
    )*     # Repeat as needed. 
    (?(Depth)(?!))  # Assert that the nesting counter is at zero. 
    )      # End of capturing group. 
    \)      # Then match ')'.", 
    "[$1]", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline); 

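This regex matches array(...) where ... may contain anything except another array(...) (so it only matches the most deeply nested occurrences). It does allow other nested (and correctly balanced) parentheses within ..., but it performs no check on whether those are semantic parentheses or whether they sit inside strings or comments.

In other words, something like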

array(
    'name' => 'Hugo (((Hurley', 
    'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108 
) 

would fail to match (correctly).
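
One way around this limitation, outside of a pure regex, is to let a tokenizer consume quoted strings as whole tokens so that parentheses inside them are never counted. A minimal JavaScript sketch under that assumption follows. The function name is hypothetical, it handles only single- and double-quoted literals with backslash escapes, and comments, heredocs and other PHP syntax are ignored.

```javascript
// Sketch only: convertArraysSkippingStrings is a hypothetical name.
function convertArraysSkippingStrings(src) {
  // Quoted strings are listed first, so a '(' inside a literal is swallowed
  // with the literal and never reaches the parenthesis alternatives.
  const token = /'(?:\\[\s\S]|[^'\\])*'|"(?:\\[\s\S]|[^"\\])*"|\barray\s*\(|[()]/g;
  const closers = [];  // closing delimiter expected for each open level

  return src.replace(token, (tok) => {
    if (tok[0] === "'" || tok[0] === '"') return tok;   // string literal: copy verbatim
    if (tok === '(') { closers.push(')'); return '('; } // plain parenthesis
    if (tok === ')') return closers.length ? closers.pop() : ')';
    closers.push(']');                                  // tok is 'array('
    return '[';
  });
}

const sample = [
  'array(',
  "    'name' => 'Hugo (((Hurley',",
  "    'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108",
  ')',
].join('\n');
console.log(convertArraysSkippingStrings(sample));
```

A stray ')' with nothing open is copied through unchanged here; a stricter version would report it as an error.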

You need to apply the .NET regex iteratively until it no longer modifies its input; in your example, two iterations are enough.
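
JavaScript has no balancing groups, so the .NET pattern does not carry over as written. To illustrate only the "repeat until nothing changes" loop, this sketch substitutes a simpler pattern that tolerates at most one level of plain parentheses inside an innermost array(...). That restriction, and the names innermostArray and replaceUntilStable, are assumptions of the example and not part of the answer.

```javascript
// Assumption: inside an innermost array(...), plain parentheses nest at most
// one level deep, e.g. nextId() or (4 + 8)/108. Deeper nesting is not matched.
const innermostArray =
  /\barray\(((?:(?!\barray\()[^()]|\((?:(?!\barray\()[^()])*\))*)\)/g;

// Re-apply the replacement until a pass leaves the text unchanged.
function replaceUntilStable(src) {
  let prev;
  let passes = 0;   // number of passes that actually changed something
  do {
    prev = src;
    src = src.replace(innermostArray, '[$1]');
    if (src !== prev) passes++;
  } while (src !== prev);
  return { result: src, passes };
}

const input = [
  '$data = array(',
  "    'id' => nextId(),",
  "    'profile' => array(",
  "        'name' => 'Hugo Hurley',",
  "        'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108",
  '    )',
  ');',
].join('\n');

const { result, passes } = replaceUntilStable(input);
console.log(result);
console.log('passes that changed the text:', passes);
```

The first pass can only rewrite the inner array(...), because the outer one still contains the text array(; the second pass then rewrites the outer one.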
