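Tim Pietzcker gave the Dot-Net counting version.
It has the same elements as the PCRE (php) version below.
All the caveats are the same. In particular, non-array parentheses must
be balanced, because they use the same closing parenthesis as the delimiter.
All text must (or should) be parsed.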
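Outer groups 1, 2, 3, 4 give you the parts:
1: CONTENT
2: CORE-1 array()
3: CORE-2 any ()
4: EXCEPTIONS
Each match gives you exactly one of these outer things; they are mutually exclusive.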
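The trick is to define a php function parse(core) that parses a CORE.
Inside that function is a while (regex.search(core)) { .. } loop.
Each time either the CORE-1 or the CORE-2 group matches, call parse(core),
passing that group's core contents to it.
Inside the loop, just take off the content and assign it to a hash.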
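Python's standard `re` module has no recursion or `(?&name)` subroutine calls, so the sketch below is not the PCRE regex itself. It swaps in a depth counter (closer in spirit to the Dot-Net counting approach) to find the same four mutually exclusive pieces, then drives a recursive `parse(core)` loop over them. The names `TOKEN`, `find_close`, `split_top` and the dict layout (`content`, `children`, `errors`, `kind`) are illustrative assumptions, not part of the original answer.

```python
import re

# Mirrors the regex alternation  \b array \s* \(  |  [()]  (case-insensitive, like (?i)).
# Everything between these tokens is plain CONTENT.
TOKEN = re.compile(r"\barray\s*\(|[()]", re.IGNORECASE)


def find_close(text, start):
    """Return the index of the ')' balancing an opener that ended at `start`, or None."""
    depth = 1
    for m in TOKEN.finditer(text, start):
        if m.group() == ")":
            depth -= 1
            if depth == 0:
                return m.start()
        else:  # 'array(' and '(' both open a level and both close with ')'
            depth += 1
    return None


def split_top(text):
    """Yield (kind, payload) pairs for the top level of `text`.

    kind mirrors the regex's outer groups:
      'content' (group 1), 'core1' = array(...) (group 2),
      'core2' = (...) (group 3), 'exception' = unbalanced token (group 4).
    """
    pos = 0
    while pos < len(text):
        m = TOKEN.search(text, pos)
        if m is None:  # only plain text is left
            yield ("content", text[pos:])
            return
        if m.start() > pos:  # plain text before the next token
            yield ("content", text[pos:m.start()])
        tok = m.group()
        end = None if tok == ")" else find_close(text, m.end())
        if end is None:  # stray ')' or an opener with no partner
            yield ("exception", tok)
            pos = m.end()
        else:
            yield ("core2" if tok == "(" else "core1", text[m.end():end])
            pos = end + 1


def parse(core):
    """Loop over the top-level pieces of `core` and recurse into every CORE."""
    node = {"content": "", "children": [], "errors": []}
    for kind, payload in split_top(core):
        if kind == "content":
            node["content"] += payload
        elif kind == "exception":
            node["errors"].append(payload)
        else:  # core1 or core2: parse its inside the same way
            child = parse(payload)
            child["kind"] = kind
            node["children"].append(child)
    return node
```

On a one-line version of the sample shown in the Output listing further down, `split_top` yields three pieces: the content `some_var = `, one `core1` holding everything inside the outer `array( ... )`, and the content `;`. It also reproduces the balancing caveat: in `array('a)' => 1)` the `)` inside the quoted key closes the array early, because quotes are not tracked.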
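Obviously, the group 1 construct that calls (?&content) should be replaced
with constructs that extract the variable-style data into a hash.
At a detailed level, this can be very tedious.
Usually, you have to account for every character to parse the whole thing correctly.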
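As a rough illustration of why replacing group 1 gets tedious, here is a minimal Python sketch that pulls `'key' => 'value'` pairs out of a single CONTENT chunk. The `PAIR` pattern and the `pairs` helper are hypothetical; they only understand single-quoted keys and single-quoted string values, so a value such as `nextId()` or `array(...)` (which lives in a neighbouring CORE) comes back as `None`.

```python
import re

# Hypothetical pattern: a single-quoted key, '=>', then an optional single-quoted value.
PAIR = re.compile(r"'([^']*)'\s*=>\s*(?:'([^']*)')?")


def pairs(content):
    """Return (key, value) tuples from one CONTENT chunk; value is None unless it is a quoted string."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(content)]
```

For example, `pairs("'name' => 'Hugo Hurley', 'numbers' => ")` returns `[('name', 'Hugo Hurley'), ('numbers', None)]`. Escaped quotes, double-quoted strings and bare numbers are not handled; each needs another rule.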
(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|\(((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?!\barray\s*\(|[()]).)+))
Expanded (compile it with the x / extended flag so the whitespace and # comments are ignored):
# 1: CONTENT
# 2: CORE-1
# 3: CORE-2
# 4: EXCEPTIONS
(?is)
(?:
     (                             # (1), Take off CONTENT
          (?&content)
     )
  |                              # OR -----------------------------
     (?>                           # Start 'array('
          \b array \s* \(
     )
     (                             # (2), Take off 'array(CORE-1)'
          (?= .)
          (?&core)
       |
     )
     \)                            # End ')'
  |                              # OR -----------------------------
     \(                            # Start '('
     (                             # (3), Take off '(any CORE-2)'
          (?= .)
          (?&core)
       |
     )
     \)                            # End ')'
  |                              # OR -----------------------------
     (                             # (4), Take off Unbalanced or Exceptions
          \b array \s* \(
       |  [()]
     )
)
# Subroutines
# ---------------
(?(DEFINE)
     # core
     (?<core>
          (?>
               (?&content)
            |
               (?> \b array \s* \()
               # recurse core of array()
               (?:
                    (?= .)
                    (?&core)
                 |
               )
               \)
            |
               \(
               # recurse core of any ()
               (?:
                    (?= .)
                    (?&core)
                 |
               )
               \)
          )+
     )
     # content
     (?<content>
          (?>
               (?!
                    \b array \s* \(
                 |  [()]
               )
               .
          )+
     )
)
Output
** Grp 0 - (pos 0 , len 11)
some_var =
** Grp 1 - (pos 0 , len 11)
some_var =
** Grp 2 - NULL
** Grp 3 - NULL
** Grp 4 [core] - NULL
** Grp 5 [content] - NULL
-----------------------
** Grp 0 - (pos 11 , len 153)
array(
'id' => nextId(),
'profile' => array(
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108
)
)
** Grp 1 - NULL
** Grp 2 - (pos 17 , len 146)
'id' => nextId(),
'profile' => array(
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42)/108
)
** Grp 3 - NULL
** Grp 4 [core] - NULL
** Grp 5 [content] - NULL
-------------------------------------
** Grp 0 - (pos 164 , len 3)
;
** Grp 1 - (pos 164 , len 3)
;
** Grp 2 - NULL
** Grp 3 - NULL
** Grp 4 [core] - NULL
** Grp 5 [content] - NULL
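Something else from a past life that uses the same idea, here applied in Perl to nested <!--block:name--> ... <!--endblock--> sections: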
# Perl code:
#
# use strict;
# use warnings;
#
# use Data::Dumper;
#
# $/ = undef;
# my $content = <DATA>;
#
# # Set the error mode on/off here ..
# my $BailOnError = 1;
# my $IsError = 0;
#
# my $href = {};
#
# ParseCore($href, $content);
#
# #print Dumper($href);
#
# print "\n\n";
# print "\nBase======================\n";
# print $href->{content};
# print "\nFirst======================\n";
# print $href->{first}->{content};
# print "\nSecond======================\n";
# print $href->{first}->{second}->{content};
# print "\nThird======================\n";
# print $href->{first}->{second}->{third}->{content};
# print "\nFourth======================\n";
# print $href->{first}->{second}->{third}->{fourth}->{content};
# print "\nFifth======================\n";
# print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
# print "\nSix======================\n";
# print $href->{six}->{content};
# print "\nSeven======================\n";
# print $href->{six}->{seven}->{content};
# print "\nEight======================\n";
# print $href->{six}->{seven}->{eight}->{content};
#
# exit;
#
#
# sub ParseCore
# {
# my ($aref, $core) = @_;
# my ($k, $v);
# while ($core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g)
# {
# if (defined $1)
# {
# # CONTENT
# $aref->{content} .= $1;
# }
# elsif (defined $2)
# {
# # CORE
# $k = $2; $v = $3;
# $aref->{$k} = {};
# # $aref->{$k}->{content} = $v;
# # $aref->{$k}->{match} = $&;
#
# my $curraref = $aref->{$k};
# my $ret = ParseCore($aref->{$k}, $v);
# if ($BailOnError && $IsError) {
# last;
# }
# if (defined $ret) {
# $curraref->{'#next'} = $ret;
# }
# }
# else
# {
# # ERRORS
# print "Unbalanced '$4' at position = ", $-[0], "\n";
# $IsError = 1;
#
# # Decide to continue here ..
# # If BailOnError is set, just unwind recursion.
# # -------------------------------------------------
# if ($BailOnError) {
# last;
# }
# }
# }
# return $k;
# }
#
# #================================================
# __DATA__
# some html content here top base
# <!--block:first-->
# <table border="1" style="color:red;">
# <tr class="lines">
# <td align="left" valign="<--valign-->">
# <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
# <!--hello--> <--again--><!--world-->
# some html content here 1 top
# <!--block:second-->
# some html content here 2 top
# <!--block:third-->
# some html content here 3 top
# <!--block:fourth-->
# some html content here 4 top
# <!--block:fifth-->
# some html content here 5a
# some html content here 5b
# <!--endblock-->
# <!--endblock-->
# some html content here 3a
# some html content here 3b
# <!--endblock-->
# some html content here 2 bottom
# <!--endblock-->
# some html content here 1 bottom
# <!--endblock-->
# some html content here1-5 bottom base
#
# some html content here 6-8 top base
# <!--block:six-->
# some html content here 6 top
# <!--block:seven-->
# some html content here 7 top
# <!--block:eight-->
# some html content here 8a
# some html content here 8b
# <!--endblock-->
# some html content here 7 bottom
# <!--endblock-->
# some html content here 6 bottom
# <!--endblock-->
# some html content here 6-8 bottom base
#
# Output >>
#
# Base======================
# some html content here top base
#
# some html content here1-5 bottom base
#
# some html content here 6-8 top base
#
# some html content here 6-8 bottom base
#
# First======================
#
# <table border="1" style="color:red;">
# <tr class="lines">
# <td align="left" valign="<--valign-->">
# <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
# <!--hello--> <--again--><!--world-->
# some html content here 1 top
#
# some html content here 1 bottom
#
# Second======================
#
# some html content here 2 top
#
# some html content here 2 bottom
#
# Third======================
#
# some html content here 3 top
#
# some html content here 3a
# some html content here 3b
#
# Fourth======================
#
# some html content here 4 top
#
#
# Fifth======================
#
# some html content here 5a
# some html content here 5b
#
# Six======================
#
# some html content here 6 top
#
# some html content here 6 bottom
#
# Seven======================
#
# some html content here 7 top
#
# some html content here 7 bottom
#
# Eight======================
#
# some html content here 8a
# some html content here 8b
#
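This isn't language agnostic as far as regex is concerned, and that is crucial. Not many regex engines support the recursion you need. So, which regex engine are you using? –
Unfortunately, regex syntax and features are not language agnostic. –
Balanced text via function calls can be done in PCRE/Perl. Balancing groups (counting) can be done in Dot-Net. Those are your choices. For PCRE (and, I think, Dot-Net), you only get the nesting of the whole text. To break it down further, you have to parse its contents recursively. – sln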