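Foreword
I recommend using an HTML parsing module, because HTML can produce some crazy edge cases that will seriously skew your data. But if you control the source text and still need or want to use a regular expression, I offer this possible solution.
Description
Given the following text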
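For comparison, here is a minimal sketch of the parser-based route. It assumes Python and its standard-library `html.parser` module, neither of which is named in the question, and the names `ItemParser`, `extract_items` and `SAMPLE_HTML` are made up for illustration. It only tracks `<p>`, `<b>` and `<strong>`, so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser

# Hypothetical sample, copied from the item shown in this answer.
SAMPLE_HTML = """Example of an item in item list:
<p>
<b>1628/ SomeBoldedTitle
</b>
Some Description.
Some price 20,00kuna.
<strong>Contact somenumber
098/1234-567 some mail
</strong>
</p>
"""


class ItemParser(HTMLParser):
    """Collects the text of each <p> item, bucketed by enclosing tag."""

    TRACKED = ("b", "strong")

    def __init__(self):
        super().__init__()
        self.items = []        # one dict per finished <p> element
        self._current = None   # dict being filled, or None outside <p>
        self._stack = []       # currently open <b>/<strong> tags

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = {"b": "", "strong": "", "text": ""}
            self._stack = []
        elif self._current is not None and tag in self.TRACKED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            # Collapse runs of whitespace, including newlines, to one space.
            cleaned = {k: " ".join(v.split()) for k, v in self._current.items()}
            self.items.append(cleaned)
            self._current = None
        elif self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._current is None:
            return  # text outside any <p> is ignored
        key = self._stack[-1] if self._stack else "text"
        self._current[key] += data


def extract_items(html):
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Calling `extract_items(SAMPLE_HTML)` yields one dictionary per `<p>` block; the code and title still share the `"b"` entry and would need a final split on `/`.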
Example of an item in item list:
<p>
<b>1628/ SomeBoldedTitle
</b>
Some Description.
Some price 20,00kuna.
<strong>Contact somenumber
098/1234-567 some mail
</strong>
</p>
This expression
<p>(?:(?!<p>).)*<b>([0-9]+)/\s*((?:(?!</b>).)*?)\s*</b>\s*((?:(?!<strong>|<b>).)*?)\s*<(?:strong|b)>\s*((?:(?!</).)*?)\s*</
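will parse your text into the following capture groups:
- Group 0 is the entire match, from `<p>` up to the `</` that follows the contact block
- Group 1 is the multi-digit code
- Group 2 is the title
- Group 3 is the description
- Group 4 is the contact text, including the phone number
Capture groups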
[0][0] = <p>
<b>1628/ SomeBoldedTitle
</b>
Some Description.
Some price 20,00kuna.
<strong>Contact somenumber
098/1234-567 some mail
</
[0][1] = 1628
[0][2] = SomeBoldedTitle
[0][3] = Some Description.
Some price 20,00kuna.
[0][4] = Contact somenumber
098/1234-567 some mail
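A minimal sketch of applying the expression, assuming Python's `re` module (the answer does not name a language). The names `ITEM_RE`, `parse_items` and `SAMPLE` are made up for illustration. The `re.DOTALL` flag is needed here because the captures above span several lines, so `.` has to match newlines.

```python
import re

# The expression from this answer, split across two string literals only
# to keep the line short. re.DOTALL lets "." match newline characters.
ITEM_RE = re.compile(
    r"<p>(?:(?!<p>).)*<b>([0-9]+)/\s*((?:(?!</b>).)*?)\s*</b>\s*"
    r"((?:(?!<strong>|<b>).)*?)\s*<(?:strong|b)>\s*((?:(?!</).)*?)\s*</",
    re.DOTALL,
)

# Hypothetical sample, copied from the item shown in this answer.
SAMPLE = """<p>
<b>1628/ SomeBoldedTitle
</b>
Some Description.
Some price 20,00kuna.
<strong>Contact somenumber
098/1234-567 some mail
</strong>
</p>
"""


def parse_items(html):
    """Return one (code, title, description, contact) tuple per item."""
    return [match.groups() for match in ITEM_RE.finditer(html)]


for code, title, description, contact in parse_items(SAMPLE):
    print(code, title, description, contact, sep=" | ")
```

Because `finditer` is used, a document with several `<p>` items produces one tuple per item.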
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
<p> '<p>'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
<p> '<p>'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
<b> '<b>'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
</b> '</b>'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
</b> '</b>'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
<strong> '<strong>'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
<b> '<b>'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
< '<'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
strong 'strong'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
b 'b'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
> '>'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
</ '</'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
</ '</'
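The construct that recurs throughout this breakdown is `(?:(?!X).)*`: it consumes one character at a time, but only while the text at that position does not start with `X`. The following small sketch, assuming Python's `re` module and a made-up input string, contrasts it with a plain lazy `.*?`, which can still run past `X` when the rest of the pattern needs it to.

```python
import re

# Hypothetical input: two bold runs, only the second is followed by "!".
text = "<b>one</b> filler <b>two</b>!"

# Lazy dot: starts at the first <b> and keeps growing until "</b>!" fits,
# so the capture runs straight across the first closing tag.
lazy = re.search(r"<b>(.*?)</b>!", text).group(1)

# Guarded dot: the capture can never contain "</b>", so the attempt at the
# first <b> fails and the match is found at the second <b> instead.
guarded = re.search(r"<b>((?:(?!</b>).)*)</b>!", text).group(1)

print(repr(lazy))     # 'one</b> filler <b>two'
print(repr(guarded))  # 'two'
```

This is why the expression above uses the guarded form for every capture: each group is kept inside its own tag even when a later part of the pattern forces backtracking.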
"I have been trying to put something together..." What have you been trying? – ClasG
I edited the question with the code I am trying –