How do I find all <meta> tags in a file that are not inside HTML comments (<!-- -->)?

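I am trying to find all the <meta> tags in a file that are not inside HTML comments (<!-- -->) and extract their content with the PHP function get_meta_tags. However, I run into two problems when using this function: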

Even when <meta> tags are inside a comment, such as:

<!-- 
<meta name="title" content="Title name"> 
<meta name="keywords" content="keyword 1, keyword 2, keyword 3"> 
<meta name="description" content="Hello world!"> 
<meta name="author" content="Author name"> 
<meta name="copyright" CONTENT="All rights reserved."> 
<meta property="og:title" content="Title name" /> 
<meta property="og:image" content="http://www.example.com/img/logo.gif" /> 
<meta property="og:description" content="Hello world!" /> 
-->

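the get_meta_tags function still extracts every <meta> tag into the array, whether or not it is inside a comment. What I need is to extract only the <meta> tags outside HTML comments. In other words, I only want the <meta> tags that are actually in effect on the page.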

If a <meta> tag has no name attribute, for example a tag that only has property or http-equiv, such as property="og:title" or http-equiv="refresh", the get_meta_tags function cannot extract it into the array.

What should I do to solve these two problems? Thanks.

Source

2016-06-11 Banana Code

Check this out:

function get_meta_tags2($url)
{
    // file_get_contents() returns false on failure
    $contents = file_get_contents($url);
    if (!is_string($contents)) {
        return false;
    }

    // Remove whole HTML comments from the page contents (not from the URL),
    // so any <meta> tags inside them are dropped as well
    $stripped = preg_replace('/<!--.*?-->/s', '', $contents);
    if (is_string($stripped)) {
        $contents = $stripped;
    }

    // Page title, if present
    $title = null;
    if (preg_match('/<title[^>]*>(.*?)<\/title>/si', $contents, $match)) {
        $title = trim(strip_tags($match[1]));
    }

    // Match <meta name="..." content="..."> (name first, then content)
    $pattern = '/<\s*meta\s+name\s*=\s*"?([^>"]*)"?\s+'
             . 'content\s*=\s*"?([^>"]*)"?\s*\/?\s*>/si';

    $metaTags = array();
    if (preg_match_all($pattern, $contents, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $metaTags[$m[1]] = array(
                'html'  => htmlentities($m[0]),
                'value' => $m[2],
            );
        }
    }

    return array(
        'title'    => $title,
        'metaTags' => $metaTags,
    );
}

The output will be:

Array
(
    [title] => Teleit.pl - strony internetowe
    [metaTags] => Array
        (
            [description] => Array
                (
                    [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
                    [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
                )

            [DC.title] => Array
                (
                    [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
                    [value] => Mariano Iglesias - Weblog
                )

            [ICBM] => Array
                (
                    [html] => <meta name="ICBM" content="-34.6017, -58.3956" />
                    [value] => -34.6017, -58.3956
                )

            [geo.position] => Array
                (
                    [html] => <meta name="geo.position" content="-34.6017;-58.3956" />
                    [value] => -34.6017;-58.3956
                )

            [geo.region] => Array
                (
                    [html] => <meta name="geo.region" content="AR-BA">
                    [value] => AR-BA
                )

            [geo.placename] => Array
                (
                    [html] => <meta name="geo.placename" content="Buenos Aires">
                    [value] => Buenos Aires
                )

        )

)

Credit for the original version goes to Mariano at cricava dot com; I modified it for you.
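A regex only covers the attribute orders and quoting styles it was written for, so here is an alternative sketch that uses PHP's DOM extension instead. It assumes the page is already available as a string. The helper name extract_live_meta_tags and the sample markup are hypothetical and not part of the original answer. An HTML parser turns commented-out markup into a comment node, so those tags never show up as <meta> elements, and reading the name, property, or http-equiv attribute covers the second problem from the question.

```php
<?php
// Hypothetical helper (not from the original answer): collect <meta> tags
// that are not inside HTML comments, keyed by name, property, or http-equiv.
function extract_live_meta_tags(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    // Only real <meta> elements are returned; commented-out markup is a comment node.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        foreach (['name', 'property', 'http-equiv'] as $attr) {
            if ($meta->hasAttribute($attr)) {
                // Use the first key attribute found; later duplicates overwrite earlier ones.
                $tags[$meta->getAttribute($attr)] = $meta->getAttribute('content');
                break;
            }
        }
    }
    return $tags;
}

// Sample markup (an assumption for illustration): one commented-out tag, three live ones.
$html = <<<'HTML'
<html><head>
<title>Demo</title>
<!--
<meta name="author" content="Author name">
-->
<meta name="description" content="Hello world!">
<meta property="og:title" content="Title name" />
<meta http-equiv="refresh" content="30">
</head><body></body></html>
HTML;

print_r(extract_live_meta_tags($html));
```

On this sample the result should contain only the description, og:title, and refresh entries, and the commented-out author tag should be absent. To run it against a live page, the string could come from file_get_contents($url), after checking that the call did not return false.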

Source

2016-06-11 17:44:46 PawelN
