I have a large chunk of HTML, and my regular expression matches more than I expect.
With this pattern:
~<div>(?:.*?)<a[\s]+[^>]*?href[\s]?=[\s"\']+(#_ftnref([0-9]+))["\']+.*?>(?:[^<]+|.*?)?</a>(.*?)</div>~si
I capture this:
<div> </div><hr align="left" size="1" width="33%" /><div><p><a title="" href="#_ftnref1">[1]</a> This is not to suggest that there are only two possible arguments to be made in support of blah blah <em>blah</em>.</p></div>
But I want this:
<div><p><a title="" href="#_ftnref1">[1]</a> This is not to suggest that there are only two possible arguments to be made in support of blah blah <em>blah</em>.</p></div>
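Can you help?
PS: (?:), as opposed to (), is used to avoid capturing text. I did this on purpose, because I want the returned $matches array to be consistent across several different regexes not mentioned in this post.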
Would you mind using DOM for this? – Passerby 2013-02-21 03:41:51
http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-xml-with-php – nhahtdh 2013-02-21 03:42:50
Yes, I would mind. DOM isn't suitable because the markup is sometimes garbage. – 2013-02-21 03:42:51
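
With a DOM parser ruled out, one regex-only option is to keep the lazy .*? between <div> and <a from running across a later <div>. The engine starts at the first <div> it finds, and .*? then expands as far as it has to (past </div> and the <hr />) until the rest of the pattern can match. A tempered dot, (?:(?!<div\b).)*?, refuses to step over another opening <div, so the match can only begin at the <div> that directly encloses the link. The sketch below is a Python port for illustration only; the question appears to target PHP's preg functions, where the same lookahead syntax should apply. find_footnote and the sample string are hypothetical names. The capture groups keep their original numbering because the replaced group was already non-capturing.

```python
import re

# Same pattern as in the question, with one change: the leading (?:.*?)
# becomes a tempered dot that cannot consume the start of another <div.
FOOTNOTE_RE = re.compile(
    r'<div>(?:(?!<div\b).)*?'  # was: <div>(?:.*?)
    r'<a[\s]+[^>]*?href[\s]?=[\s"\']+(#_ftnref([0-9]+))["\']+.*?>'
    r'(?:[^<]+|.*?)?</a>(.*?)</div>',
    re.S | re.I,  # equivalent of the ~si modifiers
)


def find_footnote(html):
    """Return the first footnote match in html, or None (hypothetical helper)."""
    return FOOTNOTE_RE.search(html)


# Sample input taken from the question.
sample = (
    '<div> </div><hr align="left" size="1" width="33%" />'
    '<div><p><a title="" href="#_ftnref1">[1]</a> This is not to suggest '
    'that there are only two possible arguments to be made in support of '
    'blah blah <em>blah</em>.</p></div>'
)

match = find_footnote(sample)
if match:
    print(match.group(0))  # whole match, starting at the enclosing <div>
    print(match.group(1))  # the href target
    print(match.group(2))  # the footnote number
```

This still assumes the footnote body contains no nested <div>, because (.*?)</div> stops at the first closing tag it meets.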