Regex to remove p tags inside li and td tags
I have HTML content with p tags like this:
<p>This is a paragraph:</p>
<ul>
<li>
<p>point 1</p>
</li>
<li>
<p>point 2</p>
<ul>
<li>
<p>point 3</p>
</li>
<li>
<p>point 4</p>
</li>
</ul>
</li>
<li>
<p>point 5</p>
</li>
</ul>
<ul>
<li>
<p><strong>sub-head : </strong>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</p>
</li>
<li>
<p><strong>sub-head 2: </strong></p>
<p>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</p>
</li>
</ul>
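I want to remove all <p> and </p> tags between <li> and </li>, no matter where they sit inside the <li> ... </li>. Similarly, I need to remove the p tags between td tags inside tables.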
Here is the code in my controller so far:
# Literal string replacements applied after sanitizing
nogo = {
  "<li>\n<p>" => '<li>', "</p>\n</li>" => '</li>', "<td>\n<p>" => '<td>', "</p>\n</td>" => '</td>',
  '<p> </p>' => '', '<ul>' => "\n<ul>", '</ul>' => "</ul>\n", '</ol>' => "</ol>\n",
  '<table>' => "\n<table width='100%' border='0' cellspacing='0' cellpadding='0' class='table table-curved'>",
  '&lt;' => '<', '&gt;' => '>', '<br>' => '', '<p></p>' => '', ' rel="nofollow"' => ''
}
c = params[:content]
# Allow the BASIC elements plus table and heading tags; keep only href on links
bundle_out = Sanitize.fragment(c, Sanitize::Config.merge(Sanitize::Config::BASIC,
  :elements => Sanitize::Config::BASIC[:elements] + ['table', 'tbody', 'tr', 'td', 'h1', 'h2', 'h3'],
  :attributes => { 'a' => ['href'] })) #.split(" ").join(" ")
# One alternation regex matching any key; gsub looks each match up in nogo
re = Regexp.new(nogo.keys.map { |x| Regexp.escape(x) }.join('|'))
@bundle_out = bundle_out.gsub(re, nogo)
I'm passing the above HTML content to this code through params[:content], which I've assigned to a variable c.
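Below is the output, which is not what I expected. Some closing and opening p tags are still left between the li tags, next to the closing li tags: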
<p>This is a paragraph:</p>
<ul>
<li>point 1</li>
<li>point 2</p>
<ul>
<li>point 3</li>
<li>point 4</li>
</ul>
</li>
<li>point 5</li>
</ul>
<ul>
<li><strong>sub-head : </strong>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</li>
<li><strong>sub-head 2: </strong></p>
<p>This is a para followed by heading, This is a para followed by heading, This is a para followed by heading, This is a para followed by heading</li>
</ul>
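A short sketch of why this happens, reusing two of the keys from the nogo hash above (the trimmed hash and the shortened sample strings are for illustration only). Each key is a fixed string, so "</p>\n</li>" only matches when a closing p is immediately followed by a newline and </li>. In the "point 2" item the </p> is followed by a nested <ul>, and in the "sub-head 2" item it is followed by a second <p>, so that exact sequence never occurs and the tags are left behind. Regexp.union is used here because it escapes each string the same way as the Regexp.escape/join line in the controller.

```ruby
# Two of the fixed-string keys from the controller's nogo hash
nogo = { "<li>\n<p>" => "<li>", "</p>\n</li>" => "</li>" }
# Regexp.union escapes each string, giving the same alternation as the controller code
re = Regexp.union(nogo.keys)

# "point 2": the </p> is followed by a nested <ul>, not by </li>
point2_out = "<li>\n<p>point 2</p>\n<ul>".gsub(re, nogo)
p point2_out # prints "<li>point 2</p>\n<ul>"

# "sub-head 2": the first </p> is followed by another <p>, not by </li>
subhead_out = "<li>\n<p><strong>sub-head 2: </strong></p>\n<p>more text</p>\n</li>".gsub(re, nogo)
p subhead_out # prints "<li><strong>sub-head 2: </strong></p>\n<p>more text</li>"
```

Adding more fixed-string keys only covers more adjacent-tag combinations; it cannot express "any p anywhere inside an li".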
My goal is simple: I just want to remove all p tags inside li and td tags, which I'm not able to do properly. Any help is appreciated.
I want to do this with a regex. I know regex is not the right way to parse HTML content.
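If it has to be regex-based, one minimal sketch is to use the regex only as a tag finder and keep a counter of how many li/td elements are currently open, dropping p and /p tags only while that counter is positive. This handles the nested list and the two-paragraph "sub-head 2" item, and leaves the top-level "This is a paragraph:" p alone. strip_p_inside_li_td is a hypothetical helper name. It assumes every li and td is explicitly closed and that no tag-like text appears inside comments, scripts or attribute values; the newlines around the removed tags are left in place.

```ruby
# Hypothetical helper: remove <p>/</p> tags that sit inside an <li> or <td>.
# The regex only locates li, td and p tags; the depth counter does the nesting.
def strip_p_inside_li_td(html)
  depth = 0 # number of currently open <li>/<td> elements
  html.gsub(%r{<(/?)(li|td|p)\b[^>]*>}i) do |tag|
    closing = !Regexp.last_match(1).empty?
    name = Regexp.last_match(2).downcase
    if name == "p"
      # drop the p tag only while inside an li or td, keep it elsewhere
      depth.positive? ? "" : tag
    else
      depth += closing ? -1 : 1
      depth = 0 if depth.negative? # tolerate a stray closing tag
      tag
    end
  end
end

sample = "<p>Intro</p>\n<ul>\n<li>\n<p>point 1</p>\n</li>\n</ul>"
p strip_p_inside_li_td(sample) # prints "<p>Intro</p>\n<ul>\n<li>\npoint 1\n</li>\n</ul>"
```

The \b after the tag name keeps tags such as <pre>, <link> or <tr> from being matched.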
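Use a parser for HTML, not regex. – smathy
I suggest you use the Nokogiri gem. – Ilya
If you know it's not the right way, why do it that way? I don't mean that as an offense, I'm asking for clarification: unless you're sure a parser isn't the right solution, that's probably the only answer you'll get. – alexanderbird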