preg_match_all: get text inside quotes, except within HTML tags

I recently used a pattern to replace straight double quotes with pairs of opening/closing quote marks.

$string = preg_replace('/(\")([^\"]+)(\")/','「$2」',$string);

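It works fine when $string is a sentence, or even a paragraph.

But...

My function can also be called on a chunk of HTML code, and there it no longer works as expected: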

$string = preg_replace('/(\")([^\"]+)(\")/','「$2」','<a href="page.html">Something "with" quotes</a>');

returns

<a href=「page.html」>Something 「with」 quotes</a>

And that's a problem...

So I thought I could do it in two passes: extract the text within the tags, then replace the quotes.

I tried this

$pattern='/<[^>]+>(.*)<\/[^>]+>/';

And it works if, for example, the string is

$string='<a href="page.html">Something "with" quotes</a>';

But it doesn't work with strings like:

$string='Something "with" quotes <a href="page.html">Something "with" quotes</a>';

Any ideas?

Bertrand

Source

2013-09-25 Bertrand Fourrier

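[The pony HE COMES](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) –

@Kolink I knew this would come up. That's why I would suggest using simplexml and applying the replacement only to the text, not to the attributes. – Christoph

The strings I have to "clean" are, in 90% of cases, the values of text fields, and in some cases they can contain "code". That's why parsing isn't appropriate. –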

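The usual best answer, I guess... As has already been pointed out, you should not parse HTML with regular expressions. You could take a look at PHP Simple DOM Parser to extract the text and then apply the regex you already have, which seems to work well.

This tutorial should point you in the right direction.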
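
For what it's worth, here is a minimal sketch of the parser route. It uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than the Simple DOM library mentioned above, so treat it as one possible way to do it. The function name replace_quotes_in_text_nodes and the quote-wrapper id are made up for illustration, and the input is assumed to be a UTF-8 HTML fragment.

```php
<?php
// Hypothetical helper: replace straight double quotes in text nodes only,
// so attribute values such as href="..." are never touched.
function replace_quotes_in_text_nodes(string $html): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect markup while loading.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common way to hint UTF-8 to loadHTML();
    // the wrapper div (assumed id) lets us load a fragment and find it again.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="quote-wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Visit every text node below the wrapper and rewrite its quotes.
    foreach ($xpath->query('//div[@id="quote-wrapper"]//text()') as $textNode) {
        $textNode->nodeValue = preg_replace('/"([^"]+)"/', '「$1」', $textNode->nodeValue);
    }

    // Serialize the wrapper's children, leaving the wrapper itself out.
    $wrapper = $xpath->query('//div[@id="quote-wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo replace_quotes_in_text_nodes('Something "with" quotes <a href="page.html">Something "with" quotes</a>') . "\n";
```

Two caveats with this sketch: a quoted phrase that spans several nodes (for example one that wraps a whole tag) is not matched, and the parser may normalise the markup or emit non-ASCII characters as entities when it serializes.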

Source

2013-09-25 14:27:17 npinti

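Thanks, but I use a parser when I need to parse code. In this case, parsing the code won't help me replace some characters with others. –

I'm sure this will end in a flame war, but this works: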

echo do_replace('<a href="page.html">Something "with" quotes</a>')."\n"; 
echo do_replace('Something "with" quotes <a href="page.html">Something "with" quotes</a>')."\n"; 

function do_replace($string){ 
    preg_match_all('/<([^"]*?|"[^"]*")*>/', $string, $matches); 
    $matches = array_flip($matches[0]); 

    $uuid = md5(mt_rand()); 
    while(strpos($string, $uuid) !== false) $uuid = md5(mt_rand()); 
    // if you want better (time) guarantees, you could build a prefix tree and search it for a string that is not in it (that would be O(n))

    foreach($matches as $key => $value) 
     // the trailing ';' keeps placeholder 1 from being a prefix of placeholder 10
     $matches[$key] = $uuid.$value.';'; 

    $string = str_replace(array_keys($matches), $matches, $string); 
    $string = preg_replace('/\"([^\"<]+)\"/','&ldquo;$1&rdquo;', $string); 
    return str_replace($matches, array_keys($matches), $string); 
}

Output (I replaced &ldquo; and &rdquo; with 「 and 」):

<a href="page.html">Something 「with」 quotes</a> 
Something 「with」 quotes <a href="page.html">Something 「with」 quotes</a>

With a custom state machine you could even do it without the first replacement and without replacing back. In any case, I would recommend using a parser.
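
To make the state machine idea concrete, here is a rough sketch. It is not code from this answer: the function name replace_quotes_state_machine and its parameters are hypothetical, and it assumes that every '<' in the input opens a tag and that attribute values use double quotes. It walks the string once, tracks whether it is inside a tag, and alternates opening and closing marks for quotes found in text.

```php
<?php
// Hypothetical helper: single pass, no placeholders and no regex.
function replace_quotes_state_machine(string $string, string $open = '「', string $close = '」'): string
{
    $out = '';
    $inTag = false;      // true between '<' and the matching '>'
    $inAttr = false;     // true inside a double-quoted attribute value
    $quoteOpen = false;  // true after an opening text quote was emitted
    $length = strlen($string);

    // Byte-wise iteration is safe for UTF-8 here: only ASCII bytes are
    // compared, and every other byte is copied through unchanged.
    for ($i = 0; $i < $length; $i++) {
        $char = $string[$i];

        if ($inTag) {
            if ($char === '"') {
                $inAttr = !$inAttr;      // enter or leave an attribute value
            } elseif ($char === '>' && !$inAttr) {
                $inTag = false;          // a tag ends only outside attribute values
            }
            $out .= $char;
        } elseif ($char === '<') {
            $inTag = true;
            $out .= $char;
        } elseif ($char === '"') {
            // Text quote: emit opening and closing marks alternately.
            $out .= $quoteOpen ? $close : $open;
            $quoteOpen = !$quoteOpen;
        } else {
            $out .= $char;
        }
    }
    return $out;
}

echo replace_quotes_state_machine('Something "with" quotes <a href="page.html">Something "with" quotes</a>') . "\n";
// Something 「with」 quotes <a href="page.html">Something 「with」 quotes</a>
```

One difference from the regex version: an unpaired quote in the text is still replaced (with an opening mark), whereas the regex leaves it alone.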

Source

2013-09-25 15:27:29 Christoph

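I tried it and it works. Thanks. The thing is, 90% of the time what I get is just a plain string (the value of a text input), and using a parser to handle a string or a handful of tags would actually be more work. This regex isn't meant to be used on full HTML pages. –

Feel free to upvote and/or accept if it is correct. – Christoph

I finally found a way:

Extract the text, which can be inside or outside (before, after) any tags, if there are any.
Use a callback to go through the quotes found and replace them.

Code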

$string = preg_replace_callback('/[^<>]*(?!([^<]+)?>)/sim', function ($matches) {
    // Called only for runs of text that are not inside a tag: replace the straight quotes there.
    return preg_replace('/(\")([^\"]+)(\")/', '「$2」', $matches[0]);
}, $string);

Source

2013-09-26 09:35:51

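Bertrand, resurrecting this question because it has a simple solution that lets you do the replacement in one go, without a callback. (I found your question while doing some research for a general question about how to exclude patterns in regex.)

Here's our simple regex: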

<[^>]*>(*SKIP)(*F)|"([^"]*)"

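The left side of the alternation matches complete <tags>, then deliberately fails. The right side matches double-quoted strings, and we know they are the right strings because they were not matched by the expression on the left.

This code shows how to use the regex (see the results at the bottom of the online demo):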

<?php 
$regex = '~<[^>]*>(*SKIP)(*F)|"([^"]*)"~'; 
$subject = 'Something "with" quotes <a href="page.html">Something "with" quotes</a>'; 
$replaced = preg_replace($regex,"「$1」",$subject); 
echo $replaced."<br />\n"; 
?>
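
Since the title asks about preg_match_all, the same pattern should also work for extracting the quoted text instead of replacing it. A small sketch along those lines, where the function name quoted_texts_outside_tags and the sample string are made up for illustration:

```php
<?php
// Hypothetical helper: collect the contents of double-quoted strings
// that appear outside of tags, using the (*SKIP)(*F) pattern.
function quoted_texts_outside_tags(string $subject): array
{
    // Left branch: match a whole tag, then skip past it and fail.
    // Right branch: capture the text between two double quotes.
    $regex = '~<[^>]*>(*SKIP)(*F)|"([^"]*)"~';
    preg_match_all($regex, $subject, $matches);
    return $matches[1]; // group 1 holds the text without the quotes
}

$found = quoted_texts_outside_tags('Say "hello" <a href="page.html" title="x">to "everyone"</a>');
echo implode(', ', $found) . "\n"; // hello, everyone
```

The attribute values page.html and x are not returned, because the tags that contain them are consumed by the left branch before the right branch gets a chance to see their quotes.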

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...

Source

2014-05-21 06:32:22 zx81
