PHP: Cleaning HTML by merging line breaks and removing whitespace correctly

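I am using a WYSIWYG editor and have a bunch of regular expressions that deal with dirty HTML. The reason: my users often hit the Enter key too many times, producing lots of redundant new lines, such as: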

  ...
 
 
 
   
   
and more variations involving p, &nbsp; and br

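This is my current attempt to handle such input. It tries to merge runs of consecutive line breaks into one, using several different regular expressions: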

// merge empty p tags into one 
// http://stackoverflow.com/q/16809336/1066234 
$content = preg_replace('/((<p\s*\/?>\s*)&nbsp;(<\/p\s*\/?>\s*))+/im', "<p>&nbsp;</p>\n", $content); 

// remove sceditor's: <p>\n<br>\n</p> from end of string 
// http://stackoverflow.com/questions/25269584/how-to-replace-pbr-p-from-end-of-string-that-contain-whitespaces-linebrea 
// \s* matches any number of whitespace characters (" ", \t, \n, etc) 
// (?:...)+ matches one or more (without capturing the group) 
// $ forces match to only be made at the end of the string 
$content = preg_replace("/(?:<p>\s*(<br>\s*)+\s*<\/p>\s*)+$/", "", $content); 

// remove sceditor's double: http://http:// 
$content = str_replace('http://http://', 'http://', $content); 

// remove spaces from end of string (&nbsp;) 
$content = preg_replace('/(&nbsp;)+$/', '', $content); 

// remove also <p><br></p> from end of string 
$content = preg_replace('/(<p><br><\/p>)+$/', '', $content); 

// remove line breaks from end of string - $ is end of line, +$ is end of line including \n 
// html with <p>&nbsp;</p> 
$content = preg_replace('/(<p>&nbsp;<\/p>)+$/', '', $content); 
$content = preg_replace('/(<br>)+$/', '', $content); 

// remove line breaks from beginning of string 
$content = preg_replace('/^(<p>&nbsp;<\/p>)+/', '', $content);

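I am looking for a new solution. Is there an HTML parser that I can tell to merge line breaks and whitespace? Or maybe someone has another approach to this problem.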

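The regex solution above does not seem to be good enough, because new combinations of line breaks that my users "try out" keep slipping through.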

Source

2016-01-17 Kai Noack

I would try to solve this at the WYSIWYG level. Regex 1 does not need the 'm' modifier; you might want to use the 's' modifier there. – chris85

Do I understand you correctly? You want to remove every line break that produces an empty line? – AMartinNo1

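@AMartinNo1 Yes, wherever the user has placed multiple line breaks, I want to merge them into a single one. The problem is that the 'structure' of the line breaks is rather unpredictable, see the examples above. –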

-1

You can use nl2br(strip_tags($content)) instead of the long code above.

Source

2016-01-17 17:47:31

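The problem with 'strip_tags' is that it removes every 'br' tag, but he does not want to remove every 'br' tag. Besides, he would have to add almost every HTML tag to the allowed list to keep tags from being removed unintentionally. – AMartinNo1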

No, I have a bunch of other HTML that has to survive. Using strip_tags removes all tags. That solution is not acceptable. –

strip_tags does not have to remove all HTML tags; it accepts a second parameter that lets you exclude certain HTML tags from removal. string strip_tags(string $str [, string $allowable_tags]) http://php.net/manual/en/function.strip-tags.php –
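To make the disagreement in these comments concrete, here is a minimal sketch of both calls. The sample string and the variable names are made up for illustration; only strip_tags() and nl2br() come from the thread.

```php
<?php
// Hypothetical sample input, for illustration only.
$content = "<p>Hello<br>\nworld <b>again</b></p>";

// The answer's suggestion: strip every tag, then turn newlines into <br />.
$all = nl2br(strip_tags($content));

// The second parameter lists tags to keep; here only <p> and <br> survive.
// Any other tag is removed, but its text content stays.
$kept = strip_tags($content, '<p><br>');

echo $all, "\n";   // Hello<br />\nworld again  (with a real newline)
echo $kept, "\n";  // <p>Hello<br>\nworld again</p>  (with a real newline)
```

Note that neither call merges repeated line breaks; the allow-list only decides which tags survive.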

I have developed the following snippet, which removes repeated br tags.

<?php 
$content = "<h1>Hello World</h1><p>Test\r\n<br>\r\n<br >\r\n<br >\r\n<br/>Test\r\n<br />\r\n<br /></p>"; 

echo "<code>{$content}</code><hr>\r\n\r\n\r\n\r\n"; 

$contentStripped = preg_replace('/(<br {0,}\/{0,1}>(\\r|\\n){0,}){2,}/', '<br class="reduced" />', $content); 
echo "<code>{$contentStripped}</code>\r\n\r\n\r\n\r\n";

You may need to add more test cases.
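As one possible direction for such test cases, here is a rough sketch that extends the same idea to the empty paragraphs from the question. The function name collapseBreaks, the three rules, and the sample input are assumptions for illustration, not part of the snippet above. It only recognises bare <p> tags without attributes and it replaces runs with a plain <br> rather than <br class="reduced" />.

```php
<?php
// Hypothetical helper; name and rules are assumptions for illustration.
function collapseBreaks(string $html): string
{
    // 1. Collapse runs of two or more <br>, <br/> or <br /> tags
    //    (optionally separated by whitespace) into a single <br>.
    $html = preg_replace('/(?:<br\s*\/?>\s*){2,}/i', "<br>\n", $html);

    // 2. Collapse runs of "empty" paragraphs (containing only whitespace,
    //    &nbsp; or <br>) into a single <p>&nbsp;</p>.
    $empty = '<p>(?:\s|&nbsp;|<br\s*\/?>)*<\/p>\s*';
    $html = preg_replace('/(?:' . $empty . ')+/i', "<p>&nbsp;</p>\n", $html);

    // 3. Drop the (now single) empty paragraph at the start or end.
    return preg_replace('/^\s*<p>&nbsp;<\/p>\s*|\s*<p>&nbsp;<\/p>\s*$/', '', $html);
}

// Made-up sample input mixing several of the patterns from the question.
$input = "<p>Hello</p>\n<p>&nbsp;</p>\n<p><br></p>\n<p> </p>\n"
       . "<p>World<br><br />\n<br/>again</p>\n<p>&nbsp;</p>\n<p><br>\n</p>";

$output = collapseBreaks($input);
echo $output, "\n";
```

Regular expressions like these will still miss patterns nobody thought of, which is the weakness the question describes, so it helps to keep a list of real user inputs as test cases.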

Source

2016-01-17 18:05:23 AMartinNo1
