Regex using preg_match

Dealing with an apostrophe in a web page's meta description. I have data like this:

<meta name="description" content="Access Kenya is Kenya's leading corporate Internet service provider and is a technology solutions provider in Kenya with IT and network solutions for your business.Welcome to the Yellow Network.Kenya's leading Corporate and Residential ISP" />;

I am using this regex:

<meta +name *=[\"']?description[\"']? *content=[\"']?([^<>'\"]+)[\"']?

It extracts the web page description, and everything works fine, but it all breaks down wherever there is an apostrophe.

How do I escape it?
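A minimal PHP reproduction of the problem, assuming a shortened copy of the data above and the question's pattern rewritten as a single-quoted PHP string (the asker's actual PHP code is not shown). The character class [^<>'"]+ refuses both quote characters, so the capture stops at the apostrophe in "Kenya's".

```php
<?php
// Shortened copy of the question's data (assumption: the full string behaves the same way).
$html = '<meta name="description" content="Access Kenya is Kenya\'s leading corporate Internet service provider" />';

// The question's pattern. [^<>\'"]+ cannot consume a quote of either kind,
// so the captured group ends at the first apostrophe inside the value.
$pattern = '/<meta +name *=["\']?description["\']? *content=["\']?([^<>\'"]+)["\']?/';

preg_match($pattern, $html, $m);
echo $m[1], PHP_EOL;
```

Running it prints Access Kenya is Kenya, i.e. the description is cut off at the apostrophe.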

2016-04-09 philip wanekeya

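...and if the attribute value is wrapped in single quotes, you will have the same problem matching double quotes, right? Take a look at [this answer](http://stackoverflow.com/a/1732454/3294262) – fusion3k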

Also, you are considering the (improbable) option of a meta without quotes. [See what happens in that case](https://regex101.com/r/hQ1gB0/1). – fusion3k

@fusion3k I have a backup plan, thank you anyway –

Your regex considers these three options for the <meta> node:

<meta name="description" content="Some Content" /> 
<meta name='description' content='Some Content' /> 
<meta name=description content=Some Content />

The third option is invalid HTML, but anything can happen, so... you're right.

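The simplest way is to modify your original regex so that it includes the closing tag characters, and to use ? to make the quantifier non-greedy (lazy):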

<meta +name *=[\"']?description[\"']? *content=[\"']?(.*?)[\"']? */?>
                                                      └─┘       └───┘
                match as few characters as possible, up to the  closing tag characters

regex101 demo
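A sketch of the same lazy pattern used from PHP with preg_match, assuming it is written as a single-quoted PHP string with / as the delimiter (so the / in the closing-tag part is escaped as \/). The two sample tags are illustrative: a double-quoted value containing an apostrophe, and a single-quoted value.

```php
<?php
// The answer's lazy pattern as a single-quoted PHP string.
// (.*?) grows one character at a time until an optional quote, optional spaces,
// an optional "/" and ">" can all match, so an apostrophe inside the value
// no longer ends the capture.
$pattern = '/<meta +name *=["\']?description["\']? *content=["\']?(.*?)["\']? *\/?>/';

// Illustrative inputs (assumptions, not taken from the question).
$samples = [
    '<meta name="description" content="Kenya\'s leading ISP" />',
    "<meta name='description' content='Some Content' />",
];

$found = [];
foreach ($samples as $html) {
    if (preg_match($pattern, $html, $m)) {
        $found[] = $m[1];
        echo $m[1], PHP_EOL;
    }
}
```

Both values come back whole: Kenya's leading ISP and Some Content.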

But what happens, in this case, if you have this meta?

<meta content="Some Content" name="description" />

Your regex will fail.
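A minimal check of that failure from PHP, assuming the lazy pattern from this answer written as a single-quoted PHP string: the pattern requires name= to come right after <meta, so a tag with content= first does not match at all.

```php
<?php
// The answer's lazy pattern; it expects name="description" before content=.
$pattern = '/<meta +name *=["\']?description["\']? *content=["\']?(.*?)["\']? *\/?>/';

// Same meta as above, with the attributes in the other order.
$html = '<meta content="Some Content" name="description" />';

// preg_match() returns 1 on a match, 0 on no match (false on error).
$result = preg_match($pattern, $html, $m);
var_dump($result); // int(0)
```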

To really match an HTML node, you have to use a parser:

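$dom = new DOMDocument();
libxml_use_internal_errors(true);   // don't print warnings about malformed HTML
$dom->loadHTML($yourHtmlString);
$xpath = new DOMXPath($dom);

// content attribute of every <meta name="description">, whatever the attribute order or quoting
$description = $xpath->query('//meta[@name="description"]/@content');
echo $description->item(0)->nodeValue;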

This will output:

Some Content

Yes, it's 5 lines versus 1, but with this approach you will match any <meta name="description"> (even one that uses the third, invalid, attribute syntax).
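A sketch of the parser approach wrapped in a reusable function. getMetaDescription is a hypothetical name, not from the answer; the sketch assumes PHP 7.1+ with the DOM extension, returns null when the page has no description meta, and restores the previous libxml error setting afterwards. The samples are illustrative and cover the apostrophe, the reversed attribute order, and single quotes.

```php
<?php
// Hypothetical helper (assumption, not from the answer): returns the content of
// the first <meta name="description">, or null if the page has none.
function getMetaDescription(string $html): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//meta[@name="description"]/@content');

    // Guard against "no match" instead of reading nodeValue on null.
    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;
}

$samples = [
    '<meta name="description" content="Kenya\'s leading ISP" />',
    '<meta content="Some Content" name="description" />',
    "<meta name='description' content='Some Content' />",
    '<meta name="keywords" content="isp, kenya" />',
];

foreach ($samples as $html) {
    var_dump(getMetaDescription($html));
}
```

Returning null for a missing description keeps callers from having to check the DOMNodeList themselves.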

Read more about DOMDocument
Read more about DOMXPath
Read why you can't parse [X]HTML with regular expressions

2016-04-09 17:05:17 fusion3k
