2013-03-21 393 views
<div class="product_box"> 
    <div class="list_sale"> 
     <img src="link" class="listsale" alt=""> 
      <div class="product_box_title"> 
       <a href="link"><strong>title here</a></strong> 
      </div> 
      <div class="product_box_desc"> 
       some text here 
       <strike>some text</strike> 
       <br /> 
       <span class="list_price">THIS IS THE NEEDED TEXT</span> 
       <a href="link"><strong>some text</strong></a> 
      </div> 
      <div class="list_buynow"> 
      <form action="link" class="add_to_cart" method="post"> 
       <div class="add_cart"> 
        <input type="image" src="link" value="add_to_cart" class="add_button"> 
        <input id="fast_order_0_item_code" type="hidden" name="fast_order[0] [item_code]" value="value" class="item_code"/> 
        <input name="fast_order[0][add]" value="1" class="add_qty"> 
        <input type="hidden" name="redirect_uri" value="value"> 
       </div> 
      </form> 
     </div> 
     <div class="product_box_img"> 
      <a href="link"> 
       <a href="link"><img src="http://stacktoheap.com/images/stackoverflow.png" alt=""></a> 
      </a> 
     </div> 
    </div> 
</div> 

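This is my HTML file. From this div I need to extract "THIS IS THE NEEDED TEXT". I can already get the div with class "product_box_desc", and from it I can get the text "some text here" below it, but I can't get the span that contains the text. Here is the XPath query I'm using; please suggest what needs to change. In short: I can't get the span tag with XPath.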

$dom_xpath->query("//div[@class='product_box']/div/div[@class='product_box_desc']/span[@class='list_price']") 

Why aren't you using jQuery? Are you forced to use an XML parser? – JoDev 2013-03-21 10:31:38


I have to parse a whole site made up of similar blocks of HTML and pull the data out of them to feed into a CSV file. – vikramaditya234 2013-03-21 11:09:23

Answer


This query works fine for me:

//div[@class="product_box"]/div[@class="list_sale"]/div[@class="product_box_desc"]/span[@class="list_price"] 

But I changed the HTML to this:

<div class='product_box'> 
    <div class='list_sale'> 
     <img src='link' class='listsale' alt='' /> 
     <div class='product_box_title'> 
      <a href='link'><strong>title here</strong></a> 
     </div> 
     <div class='product_box_desc'> 
      some text here 
      <strike>some text</strike> 
      <br /> 
      <span class='list_price'>THIS IS THE NEEDED TEXT</span> 
      <a href='link'><strong>some text</strong></a> 
     </div> 
     <div class='list_buynow'> 
     <form action='link' class='add_to_cart' method='post'> 
      <div class='add_cart'> 
       <input type='image' src='link' value='add_to_cart' class='add_button'> 
       <input id='fast_order_0_item_code' type='hidden' name='fast_order[0] [item_code]' value='value' class='item_code'/> 
       <input name='fast_order[0][add]' value='1' class='add_qty'> 
       <input type='hidden' name='redirect_uri' value='value'> 
      </div> 
     </form> 
     </div> 
     <div class='product_box_img'> 
      <a href='link'> 
       <img src='http://stacktoheap.com/images/stackoverflow.png' alt=''> 
      </a> 
     </div> 
    </div> 
</div> 

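One error is here: you need
<a href='link'><strong>title here</strong></a> instead of
<a href='link'><strong>title here</a></strong>
because the inner <strong> has to be closed before the <a> that contains it.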

Here is what I did:

$nodes = ($xPath->query('//div[@class="product_box"]/div[@class="list_sale"]/div[@class="product_box_desc"]/span[@class="list_price"]')); 

foreach($nodes as $node) { 
    echo $node->textContent; 
} 
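A self-contained version of the steps above might look like the sketch below. It assumes the markup is loaded with DOMDocument::loadHTML (the question does not show how $dom_xpath is built) and uses a trimmed copy of the corrected HTML; the names $html and $texts are only illustrative.

```php
<?php
// Sketch: load an HTML fragment and run the answer's XPath query against it.
// Assumption: the real script builds its DOMXPath from DOMDocument::loadHTML like this.
$html = <<<'HTML'
<div class='product_box'>
  <div class='list_sale'>
    <div class='product_box_title'>
      <a href='link'><strong>title here</strong></a>
    </div>
    <div class='product_box_desc'>
      some text here
      <strike>some text</strike>
      <br />
      <span class='list_price'>THIS IS THE NEEDED TEXT</span>
      <a href='link'><strong>some text</strong></a>
    </div>
  </div>
</div>
HTML;

$dom = new DOMDocument();
// Real-world HTML usually triggers libxml warnings; collect them instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xPath = new DOMXPath($dom);
$nodes = $xPath->query(
    '//div[@class="product_box"]/div[@class="list_sale"]'
    . '/div[@class="product_box_desc"]/span[@class="list_price"]'
);

// Collect the text of every matching span (one per product box).
$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->textContent);
}

echo implode(PHP_EOL, $texts), PHP_EOL; // THIS IS THE NEEDED TEXT
```

On the full site the same loop simply collects one entry per product box, which can then be written out with fputcsv.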

I have no control over the HTML, but that one was my mistake; thanks for the correction. The query still returns 'DOMNodeList Object([length] => 0)', though. – vikramaditya234 2013-03-21 14:23:53


Is it **exactly** the same DOM tree? Down to the last comma? – JoDev 2013-03-21 14:30:16


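The full HTML file is very large; needless to say, this is only part of it. I have removed the extra whitespace between the text. I don't think that should be a problem, but I could be wrong :) – vikramaditya234 2013-03-21 14:45:07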
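One possible cause worth ruling out: if the class attributes on the full page carry extra class names or stray spaces that the excerpt does not show, an exact @class='...' comparison matches nothing, while a token-style match still works. The sketch below assumes that situation (the class values in $html are invented to illustrate it) and uses the common XPath 1.0 concat(' ', normalize-space(@class), ' ') idiom; classTest() is a hypothetical helper, not part of PHP.

```php
<?php
// Sketch: match CSS classes as whitespace-separated tokens rather than exact @class strings.
// Assumption: the full page may carry extra classes or spacing that the excerpt does not show;
// the class values below are invented to illustrate that case.
$html = <<<'HTML'
<div class="product_box">
  <div class="list_sale featured">
    <div class="product_box_desc  clearfix">
      some text here
      <span class=" list_price ">THIS IS THE NEEDED TEXT</span>
    </div>
  </div>
</div>
HTML;

// Hypothetical helper: XPath 1.0 predicate that is true when @class contains $name as a token.
function classTest(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xPath = new DOMXPath($dom);

// The query from the question: exact string comparison on each class attribute.
$exact = $xPath->query(
    "//div[@class='product_box']/div/div[@class='product_box_desc']/span[@class='list_price']"
);

// Token-based version; '//' also stops depending on the exact nesting above the desc div.
$byToken = $xPath->query(
    '//div[' . classTest('product_box_desc') . ']/span[' . classTest('list_price') . ']'
);

echo 'exact: ', $exact->length, PHP_EOL;      // exact: 0
echo 'by token: ', $byToken->length, PHP_EOL; // by token: 1
foreach ($byToken as $node) {
    echo trim($node->textContent), PHP_EOL;   // THIS IS THE NEEDED TEXT
}
```

Running both queries against the real page and comparing $exact->length with $byToken->length is a quick way to see whether exact class matching is what breaks there.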