2016-06-11

I am trying to find all <meta> tags in a file that are not inside HTML comments (<!-- -->) and extract their content with the PHP function get_meta_tags. However, this function has two problems:

  1. Even when the <meta> tags are inside a comment, like this:

    <!-- 
    <meta name="title" content="Title name"> 
    <meta name="keywords" content="keyword 1, keyword 2, keyword 3"> 
    <meta name="description" content="Hello world!"> 
    <meta name="author" content="Author name"> 
    <meta name="copyright" CONTENT="All rights reserved."> 
    <meta property="og:title" content="Title name" /> 
    <meta property="og:image" content="http://www.example.com/img/logo.gif" /> 
    <meta property="og:description" content="Hello world!" /> 
    --> 
    

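    get_meta_tags still extracts all of them into the array, whether or not they are inside a comment. What I need is to extract only the <meta> tags outside HTML comments, i.e. only the <meta> tags that actually take effect on the page.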

  2. If a <meta> tag has no name attribute, get_meta_tags will not extract it into the array. For example, some <meta> tags only have a property or http-equiv attribute, such as property="og:title" or http-equiv="refresh".
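One way around both problems, instead of working around get_meta_tags(), is to parse the page with PHP's DOM extension. This is a different technique from the one in the question. Below is a minimal sketch that assumes the HTML is already in a string and that the dom extension is enabled. The name meta_tags_outside_comments is hypothetical. The parser turns <!-- --> blocks into comment nodes, so commented-out <meta> tags never appear in getElementsByTagName('meta'). Reading name, property or http-equiv as the key covers the second problem.

```php
<?php
// Hypothetical helper: collect <meta> tags outside HTML comments using
// DOMDocument instead of get_meta_tags(). Markup inside <!-- --> is parsed
// as a DOMComment node, so it never becomes a <meta> DOMElement.
function meta_tags_outside_comments(string $html): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings about imperfect real-world HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Use the first of name / property (Open Graph) / http-equiv as the key.
        foreach (['name', 'property', 'http-equiv'] as $attr) {
            if ($meta->hasAttribute($attr)) {
                $tags[$meta->getAttribute($attr)] = $meta->getAttribute('content');
                break;
            }
        }
    }
    return $tags;
}

$html = <<<HTML
<html><head>
<!--
<meta name="title" content="Commented out">
-->
<meta name="description" content="Hello world!">
<meta property="og:title" content="Title name" />
<meta http-equiv="refresh" content="30">
</head><body></body></html>
HTML;

print_r(meta_tags_outside_comments($html));
```

To run it on a page, pass file_get_contents($url) as $html. If several tags share the same key, the last one wins.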

How can I solve these two problems? Thanks.

Answer


Check this out:

    function get_meta_tags2($url)
    {
        $result = false;

        $contents = file_get_contents($url);

        if (isset($contents) && is_string($contents))
        {
            // Strip HTML comments first, so <meta> tags inside <!-- --> are ignored
            $contents = preg_replace('/<!--.*?-->/s', '', $contents);

            $title = null;
            $metaTags = null;

            preg_match('/<title>([^>]*)<\/title>/si', $contents, $match);

            if (isset($match) && is_array($match) && count($match) > 0)
            {
                $title = strip_tags($match[1]);
            }

            // Matches <meta name="..." content="..."> tags (name before content)
            preg_match_all('/<[\s]*meta[\s]*name="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $contents, $match);

            if (isset($match) && is_array($match) && count($match) == 3)
            {
                $originals = $match[0];
                $names = $match[1];
                $values = $match[2];

                if (count($originals) == count($names) && count($names) == count($values))
                {
                    $metaTags = array();

                    for ($i = 0, $limiti = count($names); $i < $limiti; $i++)
                    {
                        $metaTags[$names[$i]] = array(
                            'html' => htmlentities($originals[$i]),
                            'value' => $values[$i]
                        );
                    }
                }
            }

            $result = array(
                'title' => $title,
                'metaTags' => $metaTags
            );
        }

        return $result;
    }
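The pattern in get_meta_tags2 only matches name="...", so the property (Open Graph) and http-equiv tags from the second problem are still skipped. Below is a minimal sketch of a broader pattern. It assumes double-quoted attribute values and that the key attribute is written before content="...", as in the question's examples. extract_meta_keys is a hypothetical helper name.

```php
<?php
// Hypothetical helper: extract key => content pairs from <meta> tags whose
// key attribute is name, property or http-equiv. Assumes double-quoted
// values and that the key attribute appears before content="...".
function extract_meta_keys(string $html): array
{
    preg_match_all(
        '/<meta\s+(?:name|property|http-equiv)\s*=\s*"([^"]*)"\s+content\s*=\s*"([^"]*)"[^>]*>/i',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    $tags = [];
    foreach ($matches as $m) {
        $tags[$m[1]] = $m[2]; // $m[1] is the key, $m[2] the content value
    }
    return $tags;
}

$html = '<meta name="description" content="Hello world!">'
      . '<meta property="og:image" content="http://www.example.com/img/logo.gif" />'
      . '<meta http-equiv="refresh" content="30">'
      . '<meta name="copyright" CONTENT="All rights reserved.">';

print_r(extract_meta_keys($html));
```

The /i modifier also accepts upper-case attribute names such as CONTENT. Run the function on HTML that has already had its comments removed (for example with preg_replace('/<!--.*?-->/s', '', $html)) so that commented-out tags are still ignored.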

The output will look like this:

    Array
    (
        [title] => Teleit.pl - strony internetowe
        [metaTags] => Array
            (
                [description] => Array
                    (
                        [html] => <meta name="description" content="Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well." />
                        [value] => Java, PHP, and some other technological mumble jumble. Also, some real-life stuff as well.
                    )

                [DC.title] => Array
                    (
                        [html] => <meta name="DC.title" content="Mariano Iglesias - Weblog" />
                        [value] => Mariano Iglesias - Weblog
                    )

                [ICBM] => Array
                    (
                        [html] => <meta name="ICBM" content="-34.6017, -58.3956" />
                        [value] => -34.6017, -58.3956
                    )

                [geo.position] => Array
                    (
                        [html] => <meta name="geo.position" content="-34.6017;-58.3956" />
                        [value] => -34.6017;-58.3956
                    )

                [geo.region] => Array
                    (
                        [html] => <meta name="geo.region" content="AR-BA">
                        [value] => AR-BA
                    )

                [geo.placename] => Array
                    (
                        [html] => <meta name="geo.placename" content="Buenos Aires">
                        [value] => Buenos Aires
                    )

            )

    )

Credit for the original version: mariano at cricava dot com. I adapted it for you.
