2014-10-01

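Query: how do I get specific elements inside a div?

I'm trying to extract a complete div, which I have already managed to isolate from the rest of the source. From that div I want all of the HTML content, except for a few of its inner child divs. The HTML code: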

<div class="content"> 
    <div class="article-title"> 
     <h2>Title of the test</h2> 
     <a href="http://www.helloworld.com" title="post by world" rel="author" class="article-icon"><span class="text-icon">&#x1F464;</span>world</a> 
     <span class="article-icon"> 
      <span class="text-icon">&#x1F4C1;</span> 
       <a href="http://www.helloworld.com/world">world</a>, 
      </span> 
      <span class="article-icon"><span class="text-icon">&#x1F554;</span>20.August 2014 
     </span> 
    </div> 
    <p class="p1"> 
     <span class="s1"><b>a test</b></span> 
    </p> 
    <p class="p2"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello.jpg"> 
      <img class="alignright size-medium wp-image-19472" src="http://www.helloworld.com/hello.jpg" alt="hello" width="300" height="218"></a>Hello</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>text text text</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello2.jpg"> 
      <img class="alignleft size-medium wp-image-19474" src="http://www.helloworld.com/hello2.jpg" alt="hello2" width="300" height="200"></a>Hello2</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text1</span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>Final thoughts</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1">testing (<a href="http://www.helloworld.com/test"> 
      <span class="s2">test</span></a>, 
      <a href="http://www.helloworld.com/test2"> 
      <span class="s2">test2</span></a> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">***</span> 
    </p> 
    <p class="p5"><em> 
     <span class="s1">xyz <a href="http://www.helloworld.com/xyz"> 
      <span class="s2">123</span></a> (at <a href="http://www.helloworld.com"> 
      <span class="s2">http://www.helloworld.com</span></a>. &#xA0; 
     </span></em> 
    </p> 
    <div class="panel-breaking-line"></div> 
    <div class="article-tags"> <b>Tags added to this article</b> 
     <div class="tagcloud"> <a href="http://www.helloworld.com/world">world</a><a href="http://www.helloworld.com/xyz">zyx</a> </div> 
    </div> 
    <div class="panel-breaking-line"></div> 
    <div class="article-socials"> <b>Share this article with friends</b> 
     <div class="social-likes"> 
      <div class="soc-button soc-button-facebook"> <a href="http://www.facebook.com/sharer/sharer.php?u=http://www.helloworld.com/world" data-url="http://www.helloworld.com/world" class="soc-click ot-share"> 
       <span class="text-icon">&#xF30C;</span>FACEBOOK</a> 
       <span class="likes-count"> 
        <span class="count">0</span> 
        <span class="bullet">&#xA0;</span> 
       </span> 
       </div> 
       <div class="soc-button soc-button-twitter"> <a href="#" class="soc-click ot-tweet" data-hashtags="" data-url="http://www.helloworld.com/world" data-via="" data-text="World"> 
        <span class="text-icon">&#xF309;</span>TWITTER</a> 
        <span class="likes-count"> 
         <span class="count">0</span> 
         <span class="bullet">&#xA0;</span> 
        </span> 
       </div> 
       <div class="soc-button soc-button-pinterest"> <a href="http://pinterest.com/pin/create/button/?url=http://www.helloworld.com/world" data-url="http://www.helloworld.com/world" class="ot-pin soc-click"> 
       <span class="text-icon">&#xF312;</span>PINTEREST</a> 
       <span class="likes-count"> 
        <span class="count">0</span> 
        <span class="bullet">&#xA0;</span> 
       </span> 
      </div> 
      <div class="soc-button soc-button-google"> <a href="https://plus.google.com/share?url=http://www.helloworld.com/world" class="ot-pluss soc-click"> 
       <span class="text-icon">&#xF30F;</span>GOOGLE+</a> 
       <span class="likes-count"> 
        <span class="count">0</span> 
        <span class="bullet">&#xA0;</span> 
       </span> 
      </div> 
     </div> 
    </div> 
</div> 

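So basically, I want all the HTML inside the content class, but without the elements that have class="article-title", class="article-socials" and class="article-tags".

So it would get stripped down to: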

<div class="content"> 
    <p class="p1"> 
     <span class="s1"><b>a test</b></span> 
    </p> 
    <p class="p2"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello.jpg"> 
      <img class="alignright size-medium wp-image-19472" src="http://www.helloworld.com/hello.jpg" alt="hello" width="300" height="218"></a>Hello</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>text text text</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b><a href="http://www.helloworld.com/hello2.jpg"> 
      <img class="alignleft size-medium wp-image-19474" src="http://www.helloworld.com/hello2.jpg" alt="hello2" width="300" height="200"></a>Hello2</b> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text1</span> 
    </p> 
    <p class="p1"> 
     <span class="s1">text2</span> 
    </p> 
    <p class="p1"> 
     <span class="s1"><b>Final thoughts</b></span> 
    </p> 
    <p class="p1"> 
     <span class="s1">testing (<a href="http://www.helloworld.com/test"> 
      <span class="s2">test</span></a>, 
      <a href="http://www.helloworld.com/test2"> 
      <span class="s2">test2</span></a> 
     </span> 
    </p> 
    <p class="p1"> 
     <span class="s1">***</span> 
    </p> 
    <p class="p5"><em> 
     <span class="s1">xyz <a href="http://www.helloworld.com/xyz"> 
      <span class="s2">123</span></a> (at <a href="http://www.helloworld.com"> 
      <span class="s2">http://www.helloworld.com</span></a>. &#xA0; 
     </span></em> 
    </p> 
    <div class="panel-breaking-line"></div> 
    <div class="panel-breaking-line"></div> 
</div> 

With or without the enclosing content div definition...

I tried a lot of expressions, and this is where I ended up:

// This works, but it returns the entire content of the div

    $xpath = new DOMXPath($doc);
    $results = '';
    $elements = $xpath->query(".");
    foreach ($elements as $element) {
        $results .= $element->ownerDocument->saveHTML($element);
    }
Then I tried this expression instead of just the dot:

div[@class='content']/*[not(contains(concat(' ', @class, ' '), 'article-title')) and not(contains(concat(' ', @class, ' '), 'article-social')) and not(contains(concat(' ', @class, ' '), 'article-tags'))] 

It doesn't return anything. Any idea how I can get this to work?

You just need to add a leading //: //div[@class='content']/*[not(contains(concat(' ', @class, ' '), 'article-title')) and not(contains(concat(' ', @class, ' '), 'article-social')) and not(contains(concat(' ', @class, ' '), 'article-tags'))] – har07 2014-10-01 04:34:06
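To illustrate the comment above, here is a minimal sketch of why the leading // matters. The markup is a hypothetical, trimmed-down stand-in for the page in the question, and the filter is shortened to two classes. It assumes the fragment is loaded with loadHTML, which wraps it in html and body elements, so a relative path starting at div finds no match among the direct children of the default context node.

```php
<?php
// Hypothetical, shortened markup standing in for the page in the question.
$markup = '<div class="content">'
        . '<div class="article-title"><h2>Title</h2></div>'
        . '<p class="p1">a test</p>'
        . '<p class="p2">text2</p>'
        . '<div class="article-tags">tags</div>'
        . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup); // the parser wraps the fragment in <html><body>
$xpath = new DOMXPath($doc);

// Shortened filter: drop children whose class contains either name.
$filter = "[not(contains(@class, 'article-title'))"
        . " and not(contains(@class, 'article-tags'))]";

// Relative path: only direct children of the context node are considered,
// and the content div is not one of them.
$relative = $xpath->query("div[@class='content']/*" . $filter);

// Leading //: the content div is searched for anywhere in the document.
$anywhere = $xpath->query("//div[@class='content']/*" . $filter);

echo $relative->length, "\n"; // no nodes matched
echo $anywhere->length, "\n"; // the two <p> elements matched
```

The same expression therefore goes from matching nothing to matching the wanted children once it is anchored with //.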

Answer

You can just explicitly put them inside not(contains()):

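$dom = new DOMDocument();
$dom->formatOutput = true;
$dom->loadHTML($markup); // $markup holds the HTML shown in the question

$xpath = new DOMXPath($dom);

// Select every direct child of the content div except the three unwanted blocks
$elements = $xpath->query('
//div[@class="content"]/*[
    not(contains(@class, "article-title")) and
    not(contains(@class, "article-socials")) and
    not(contains(@class, "article-tags"))
]
');

// Serialize each remaining child back to markup
$html = '';
foreach ($elements as $child) {
    $html .= $dom->saveXML($child);
}

// Escape the markup so that a browser displays it as text
echo htmlentities($html);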
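One caveat with contains(@class, ...) is that it does substring matching, so a class such as article-tags-note would also be excluded. The following is a sketch of a stricter variant that pads both the attribute and the name with spaces so only whole class tokens match. The helper classTokenTest, the class article-tags-note and the sample markup are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical markup: one child has a class that merely starts with "article-tags".
$markup = '<div class="content">'
        . '<div class="article-title">title</div>'
        . '<p class="p1">kept</p>'
        . '<div class="article-tags-note">kept too</div>'
        . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// Hypothetical helper: builds an XPath test that is true only when $name
// appears as a complete, space-separated token in the class attribute.
function classTokenTest(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$query = "//div[@class='content']/*["
       . "not(" . classTokenTest('article-title') . ") and "
       . "not(" . classTokenTest('article-tags') . ")]";

// Serialize the children that survive the filter.
$html = '';
foreach ($xpath->query($query) as $child) {
    $html .= $dom->saveHTML($child);
}

echo $html, "\n";
```

With plain contains(@class, "article-tags") the last div would have been dropped as well; whether that matters depends on the class names the real page uses.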


It worked, except that for some reason I had to remove the htmlentities function... not sure why! – TheGreatOne 2014-10-04 04:34:28

@TheGreatOne I'm glad this helped – Ghost 2014-10-04 04:36:06
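On the htmlentities remark in the comments: a likely explanation is that htmlentities escapes the markup so a browser displays the tags as text, which suits a demo but not code that needs the HTML itself. A small sketch, using a hypothetical one-line fragment:

```php
<?php
// Hypothetical fragment standing in for the extracted HTML.
$html = '<p class="p1"><b>a test</b></p>';

// Escaped form: angle brackets and quotes become entities,
// so a browser prints the tags literally instead of rendering them.
$escaped = htmlentities($html);
echo $escaped, "\n"; // &lt;p class=&quot;p1&quot;&gt;&lt;b&gt;a test&lt;/b&gt;&lt;/p&gt;

// Raw form: this is what to echo or store when the markup itself is wanted.
echo $html, "\n";
```

So dropping htmlentities is the expected step when the extracted HTML is meant to be rendered or processed further rather than shown as source.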