2011-05-18

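Using PHP to retrieve the attributes and content of multiple script tags inside the head tag

I've found several questions related to my problem, but I can't manage to combine them into a single function.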

Here is my HTML:

<head> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<title>microscope</title> 
<script language="javascript">AC_FL_RunContent = 0;</script> 
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script> 
</head> 

Here is the code I have so far:

$filePath = "directory/file.html";
retrieveScriptContentandAttributes($filePath);

function retrieveScriptContentandAttributes($filePath) {
    $dom = new DOMDocument;
    @$dom->loadHTMLFile($filePath);
    //var_dump($dom->loadHTMLFile($filePath));
    $head = $dom->getElementsByTagName('head')->item(0);
    $xp = new DOMXpath($dom);
    $script = $xp->query("script", $head);

    for ($row = 0; $row < 5; $row++) {
        echo $script->item($row)->textContent;

        if ($script->item($row) instanceof DOMNode) {
            if ($script->item($row)->hasAttributes()) {
                foreach ($script->item($row)->attributes as $attr) {
                    $name = $attr->nodeName;
                    $value = $attr->nodeValue;
                    $scriptAttr[] = array('attr'=>$name, 'value'=>$value);
                }
                echo $scriptAttr;
            }
        }
    }
}

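The result I get is "ArrayAC_FL_RunContent = 0;Array Notice: Trying to get property of non-object" on the line "echo $script->item($row)->textContent;". The strange thing is that this line executes just fine. But I need a way to get $scriptAttr to print the array like this: language => javascript. And then again for the next script tag: src => Scripts/AC_RunActiveContent.js, language => javascript.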

I appreciate your help!


So you just want all the attributes of each script tag in an array? – Yoshi 2011-05-18 15:21:01


What happens if you var_dump $script->item($row)? – 2011-05-18 15:22:15


Yes, I want all the attributes and their contents. – EllaJo 2011-05-18 15:46:13
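Following up on the var_dump question above: in PHP's DOM extension, DOMNodeList::item() returns null when the index is past the end of the list. The loop in the question always runs five times, but the <head> holds only two <script> elements, so from $row = 2 onward $script->item($row) is null and reading ->textContent on it produces the "non-object" notice. Separately, echo on an array prints the literal word Array (newer PHP versions also raise an "Array to string conversion" notice or warning), and $scriptAttr is never reset, so it keeps growing across tags. Below is a minimal sketch of the question's approach with those three points addressed, assuming the markup is passed in as a string rather than read from directory/file.html; collectScripts() is a hypothetical helper name, not part of PHP.

```php
<?php
// Minimal sketch: collect the attributes and text of every <script> in <head>.
// collectScripts() is a hypothetical helper, not part of PHP's DOM API.
function collectScripts(string $html): array
{
    $dom = new DOMDocument();
    // Keep libxml parse warnings out of the output (the question uses @ for this).
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $nodes = $xp->query('//head/script');

    $result = [];
    // Bound the loop by the real node count instead of a fixed 5,
    // so item($row) never returns null.
    for ($row = 0; $row < $nodes->length; $row++) {
        $node = $nodes->item($row);
        $attrs = []; // reset per tag so attributes do not accumulate across tags
        foreach ($node->attributes as $attr) {
            $attrs[$attr->nodeName] = $attr->nodeValue;
        }
        $result[] = ['attributes' => $attrs, 'content' => $node->textContent];
    }
    return $result;
}

$html = '<html><head>
<title>microscope</title>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head><body></body></html>';

$scripts = collectScripts($html);
// print_r shows the nested arrays instead of the bare word "Array".
print_r($scripts);
```

Each entry pairs a tag's attributes, keyed by name (e.g. language => javascript), with its text content, which is an empty string for the external script.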

Answers


Try DOMXpath (see: http://php.net/manual/en/class.domxpath.php):

<?php 
$dom = new DOMDocument(); 
$dom->loadHtml('<head> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<title>microscope</title> 
<script language="javascript">AC_FL_RunContent = 0;</script> 
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script> 
</head> 
'); 

$xpath = new DOMXPath($dom); 

$scriptAttributes = array(); 

/* //head/script[@src] would only select nodes with an src attribute */ 
foreach ($xpath->query('//head/script') as $node) { 
    $attributes =& $scriptAttributes[]; 
    foreach ($node->attributes as $name => $attribute) { 
     $attributes[$name] = $attribute->nodeValue; 
    } 
} 

var_dump($scriptAttributes); 

Output:

array(2) { 
    [0]=> 
    array(1) { 
    ["language"]=> 
    string(10) "javascript" 
    } 
    [1]=> 
    array(2) { 
    ["src"]=> 
    string(30) "Scripts/AC_RunActiveContent.js" 
    ["language"]=> 
    string(10) "javascript" 
    } 
} 
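The line $attributes =& $scriptAttributes[]; is the least obvious part of this answer: assigning by reference to $array[] appends a new (null) element and makes $attributes an alias for it, so each write to $attributes[$name] lands directly in the outer array. Below is a minimal, DOM-free sketch of the same idiom using hard-coded data (the $tags array is illustrative only and stands in for the DOM nodes), next to an equivalent version that avoids references.

```php
<?php
// Hypothetical stand-in for the DOM nodes: one attribute list per <script> tag.
$tags = [
    ['language' => 'javascript'],
    ['src' => 'Scripts/AC_RunActiveContent.js', 'language' => 'javascript'],
];

// Reference idiom from the answer: $attributes aliases a freshly appended element.
$byReference = [];
foreach ($tags as $tag) {
    $attributes =& $byReference[];   // appends null and binds $attributes to it
    foreach ($tag as $name => $value) {
        $attributes[$name] = $value; // writes go straight into $byReference
    }
}
unset($attributes); // break the alias so later assignments cannot overwrite the last element

// Equivalent without references: build a local array, then append it.
$byValue = [];
foreach ($tags as $tag) {
    $attributes = [];
    foreach ($tag as $name => $value) {
        $attributes[$name] = $value;
    }
    $byValue[] = $attributes;
}

var_dump($byReference === $byValue); // bool(true)
```

One difference: a script tag with no attributes would leave null in the reference version but an empty array in the by-value version.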

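This is very close when I change the query to //head/script. How would I access/search an individual member of the array? For example, what if I only want the value of the src attribute of the second array? – EllaJo 2011-05-18 15:59:32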


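@EllaJo Just access it like any other array, e.g. echo $scriptAttributes[1]['src'] (remember that numeric array indexes start at 0). You can also change the XPath query to //head/script[@src] to select only the script nodes that actually have a src attribute. – Yoshi 2011-05-18 16:03:31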
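Both suggestions from this comment, as a minimal self-contained sketch: the markup is inlined instead of loaded from a file, and $srcValues is an illustrative name. Indexing works on the per-script array the answer builds, while the [@src] predicate (also mentioned in the answer's code comment) filters at the XPath level, and DOMElement::getAttribute() then reads the value directly.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><head>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head><body></body></html>');
$xpath = new DOMXPath($dom);

// Option 1: build one attribute array per <script>, then index into the result.
$scriptAttributes = [];
foreach ($xpath->query('//head/script') as $node) {
    $attributes = [];
    foreach ($node->attributes as $attr) {
        $attributes[$attr->nodeName] = $attr->nodeValue;
    }
    $scriptAttributes[] = $attributes;
}
echo $scriptAttributes[1]['src'], "\n"; // index 1 is the second <script>

// Option 2: let XPath return only <script> elements that have a src attribute.
$srcValues = [];
foreach ($xpath->query('//head/script[@src]') as $node) {
    $srcValues[] = $node->getAttribute('src');
}
print_r($srcValues);
```

Option 2 avoids hard-coding the position of the tag, which helps if the number or order of inline scripts changes.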


*facepalm* So obvious. Thanks, Yoshi. – EllaJo 2011-05-18 16:04:54


You can clean up the code a bit by eliminating the getElementsByTagName call:

$dom = new DOMDocument; 
@$dom->loadHTMLFile($filePath); 
$xp = new DOMXpath($dom); 

$scripts = $xp->query("//head/script"); // find only script tags in the head block, ignoring scripts elsewhere 

foreach($scripts as $script) { 
    // ... your stuff here ...
} 

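The DOMNodeList returned by the XPath query is iterable, so you can foreach over it directly instead of using counts and a for loop. And because you select the nodes directly with an XPath query, you don't have to check whether $script is a script node: that is the only node type the query can return.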
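A minimal sketch of what could go in place of "your stuff here", assuming the goal stated in the comments: print each script tag's attributes as name => value, one tag per line. The inline markup stands in for loadHTMLFile($filePath) so the example runs on its own, and $lines is an illustrative name.

```php
<?php
$dom = new DOMDocument();
// Inline copy of the question's <head>, standing in for loadHTMLFile($filePath).
$dom->loadHTML('<html><head>
<script language="javascript">AC_FL_RunContent = 0;</script>
<script src="Scripts/AC_RunActiveContent.js" language="javascript"></script>
</head><body></body></html>');
$xp = new DOMXPath($dom);

$lines = [];
// DOMNodeList is Traversable, so foreach works directly: no item()/length bookkeeping,
// and no instanceof check, because the query only returns <script> elements.
foreach ($xp->query('//head/script') as $script) {
    $pairs = [];
    foreach ($script->attributes as $attr) {
        $pairs[] = $attr->nodeName . ' => ' . $attr->nodeValue;
    }
    $lines[] = implode(', ', $pairs);
}
echo implode("\n", $lines), "\n";
```

With the question's markup this prints "language => javascript" on the first line and "src => Scripts/AC_RunActiveContent.js, language => javascript" on the second, which is the format asked for in the question.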


Output: ArrayAC_FL_RunContent = 0;Array. Closer, thanks! – EllaJo 2011-05-18 15:45:45