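Testing the contents of loadHTML($str)

Right now, the code below runs a test on an HTML document to see whether any h1 or h2 tag contains the string $title. The code works flawlessly.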

    $s1 = 'random text';
    $a1 = 'random anchor text';
    $href1 = 'http://www.someurl.com';
    $document = new DOMDocument();
    $libxml_previous_state = libxml_use_internal_errors(true);
    $document->loadHTML($str);
    libxml_use_internal_errors($libxml_previous_state);

    $tags = array('h1', 'h2');
    $texts = array();
    foreach ($tags as $tag)
    {
        $elementList = $document->getElementsByTagName($tag);
        foreach ($elementList as $element)
        {
            $texts[$element->tagName] = strtolower($element->textContent);
        }
    }

    if (in_array(strtolower($title), $texts)) {
        echo '<div class="success"><i class="fa fa-check-square-o" style="color:green"></i> This article used the correct title tag.</div>';
    } else {
        echo '<div class="error"><i class="fa fa-times-circle-o" style="color:red"></i> This article did not use the correct title tag.</div>';
    }

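I need to run three tests. First, I need to scan the document for the presence of $s1, but I don't know how to do that. The working code looks for an exact match inside an h1 or h2 tag. For $s1, however, I'm not looking for an exact match, just for the text to be present anywhere, whether or not it is surrounded by other text.

Then I need another exact-match test to look for $a1 in the text of an `a` tag, and I also need to test whether $href1 is present as an href.

I don't know how to do these tests. I believe I can treat the $a1 test as another exact match, but I don't know how to do the href test, nor how to scan for a string that may be surrounded by other text.

Hope this all makes sense.

Update

I need a solution that lets me echo a single "yes, the string exists" or "no, it doesn't", the same way the current test echoes only once rather than on every loop iteration. I need each test to report once.

Example of what the results would look like: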

yes $s1 is in the document 
no $s1 is not in the document 
yes $href1 is an href in the document 
no $href1 is not an href in the document 
yes $a1 is an anchor text in the document 
no $a1 is not an anchor text in the document

I also believe I should use substr(), but I don't know exactly how.
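As a side note on the function choice: substr() extracts part of a string by position and does not search, so for a "is this text present anywhere" check, stripos() (case-insensitive) or strpos() (case-sensitive) is probably a closer fit. A minimal sketch, where the helper name `contains_text` is made up for illustration and is not part of the original post:

```php
<?php
// Hypothetical helper (not from the original post): returns true if $needle
// occurs anywhere in $haystack, ignoring case.
function contains_text(string $haystack, string $needle): bool
{
    // stripos() returns the offset of the match (which may be 0) or false,
    // so the comparison has to be a strict !== false.
    return stripos($haystack, $needle) !== false;
}

var_dump(contains_text('some random text here', 'Random Text')); // bool(true)
var_dump(contains_text('some other words', 'random text'));      // bool(false)
```

The strict comparison matters because a match at the very start of the string yields offset 0, which a loose check would treat as "not found".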

Hoping for some working examples and a detailed explanation.

Source

2016-11-11 Bruce

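A few hints: 1- you can use `foreach($document->getElementsByTagName('*') as $element){` to select all elements and then check `$element->textContent` for your string. 2- take a look at [this link](http://www.the-art-of-web.com/php/html-xpath-query/) **step 2** to see how to look for your `<a>` tags... – EhsanT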
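The two hints in this comment could be sketched roughly as follows. The sample markup and the `$found` / `$hrefs` variable names are assumptions made for illustration; only `$s1` and the DOM calls come from the thread.

```php
<?php
// Sample markup, assumed for illustration only
$html = '<div>some random text here</div>'
      . '<a href="http://www.someurl.com">a link</a>';

$document = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_use_internal_errors($previous);

// Hint 1: walk every element and look for the string inside its text.
$s1 = 'random text';
$found = false;
foreach ($document->getElementsByTagName('*') as $element) {
    if (stripos($element->textContent, $s1) !== false) {
        $found = true;
        break; // one hit is enough, so the result is reported only once
    }
}
echo $found ? "yes $s1 is in the document\n" : "no $s1 is not in the document\n";

// Hint 2: use XPath to select only the anchors that carry an href attribute.
$xpath = new DOMXPath($document);
$hrefs = [];
foreach ($xpath->query('//a[@href]') as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}
```

Because `textContent` of an element includes the text of all its descendants, a partial check with stripos() finds the string even when it is surrounded by other text.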

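Won't the first hint only return a result if it is an exact match? I mean, what if an element's content is "random string and words, then $s1 and more characters and words"? – Bruce

So in that case you can use a regular expression to match your string – EhsanT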
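One way the regular-expression suggestion might look is sketched below. The helper name `matches_within` is hypothetical; it uses preg_quote() on the assumption that the search string should be treated as literal text and not as a pattern.

```php
<?php
// Hypothetical helper: returns true if $needle appears anywhere in $text,
// case-insensitively, treating $needle as literal text.
function matches_within(string $needle, string $text): bool
{
    // preg_quote() escapes regex metacharacters in $needle; passing '/'
    // also escapes the delimiter used in the pattern below.
    $pattern = '/' . preg_quote($needle, '/') . '/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(matches_within('random text', 'words, then Random Text and more words'));
```

For a plain substring test like this one, stripos() gives the same answer with less machinery; a regular expression becomes useful mainly when word boundaries or flexible whitespace are needed.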

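Below is code that extracts (1) anchor hrefs, (2) anchor texts, (3) h1 texts, (4) h2 texts, and (5) text fragments from all text nodes, and stores them in arrays. It then searches those arrays for exact or partial matches.

We use XPath because it seems easier for extracting text from the leaf nodes.

Code: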
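As an illustration of why XPath is convenient for leaf text, the sketch below selects text nodes directly with the `//text()` expression. This is an alternative way of writing the idea, not the answer's own code; the sample markup and the `$texts` variable are assumptions.

```php
<?php
// Sample markup, assumed for illustration only
$html = '<div>two</div>three<span>four</span>five';

$document = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($document);

// '//text()' selects every text node in the document, including text that
// sits between elements, such as "three" and "five" in the sample above.
$texts = [];
foreach ($xpath->query('//text()') as $node) {
    $value = trim($node->nodeValue);
    if ($value !== '') {
        $texts[] = $value; // skip whitespace-only nodes
    }
}

print_r($texts);
```

Collecting the fragments into an array first means each search can run once over the array and print a single yes/no line.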

<?php 
    /* returns true if an exact match for $str is found in items of $arr array */ 
    function find_exact($str, array $arr) { 
     foreach ($arr as $i) {if (!strcasecmp($i,$str)) {return(true);}} 
     return(false); 
    } 

    /* returns true if a partial/exact match for $str is found in items of $arr array */ 
    function find_within($str, array $arr) { 
     foreach ($arr as $i) {if (stripos($i,$str)!==false) {return(true);}} 
     return(false); 
    } 

    $s1='random text'; 
    $a1='random anchor text'; 
    $href1='http://www.someurl.com'; 
    $document = new DOMDocument(); 
    $libxml_previous_state = libxml_use_internal_errors(true); 

    /* Sample document. Just for testing */ 
    $str=<<<END_OF_DOC
<h1>abc h1title def</h1>
<h2>h2title</h2>
<div>some random text here</div>
<div>two</div>three
<a href='http://www.someurl.com'>some random anchor text here</a>
<span>four</span>five<span>six<b>boldscript</b></span>
END_OF_DOC;

    $document->loadHTML($str); 
    libxml_use_internal_errors($libxml_previous_state); 

    /* We extract the texts into these arrays, for matching later */ 
    $a_texts=array(); $a_hrefs=array(); $h1_texts=array(); $h2_texts=array(); $all_texts=array(); 

    /* We use XPath because it seems easier for extracting text nodes */ 
    $xp = new DOMXPath($document); $eList=$xp->query("//node()"); 
    foreach ($eList as $e) { 
     //print "Node {".$e->nodeName."} {".$e->nodeType."} {".$e->nodeValue."} {".$e->textContent."}<br/>"; 
     if (!strcasecmp($e->nodeName,"a")) { array_push($a_texts,$e->textContent);array_push($a_hrefs,$e->getAttribute("href")); } 
     if (!strcasecmp($e->nodeName,"h1")) {array_push($h1_texts,$e->textContent);} 
     if (!strcasecmp($e->nodeName,"h2")) {array_push($h2_texts,$e->textContent);} 
     if ($e->nodeType === XML_TEXT_NODE) {array_push($all_texts,$e->textContent);} 
    } 

    //var_dump($a_texts); print("<br/>"); var_dump($a_hrefs); print("<br/>"); var_dump($h1_texts); print("<br/>"); 
    //var_dump($h2_texts);print("<br/>");var_dump($all_texts);print("<br/>"); 

    if (find_within($s1,$all_texts)) { print "yes $s1 is in the document<br/>"; } 
    else { print "no $s1 is not in the document<br/>"; } 

    if (find_exact($href1,$a_hrefs)) { print "yes $href1 is an href in the document<br/>"; } 
    else { print "no $href1 is not an href in the document<br/>"; } 

    if (find_within($a1,$a_texts)) { print "yes $a1 is an anchor text in the document<br/>"; } 
    else { print "no $a1 is not an anchor text in the document<br/>"; } 
?>

Result:

yes random text is in the document 
yes http://www.someurl.com is an href in the document 
yes random anchor text is an anchor text in the document

Source

2016-11-14 01:14:55 blackpen

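The problem with this is that I have no way to echo "no match found". For example, if I try to echo "no match found" for the href... it echoes "no match found" for every href in the whole document, not just once. – Bruce

@Bruce, take a look at the modified code. I tried to integrate it with your existing code, extracting the texts into arrays so that we don't need to loop over the document repeatedly (which avoids echoing "not found" too many times). – blackpen

Thank you so much, I had been stuck on this for a week. I had to create the find_exact function myself since it was left out, but that was easy. :) Now it works perfectly :) – Bruce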
