2014-09-07 90 views

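Scraping a web page with DOMDocument and DOMXPath

I'm very new to this. I want to use PHP to extract a table from a page and return its HTML after modifying the href values of all the anchors. Here is the table: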

<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1255"> 
    <link rel="stylesheet" type="text/css" href="../CssGraduateE.css"> 
    <title></title> 
</head> 
<body> 
    <div> 
     <br> 
     <table class="main" cellspacing="0" cellpadding="0"> 
      <tbody> 
       <tr> 
        <td> 
         <br><span class="MainHeader">Subjects in Faculty - Electrical Engineering</span><br><br> 
         <table cellpadding="2" cellspacing="0" border="1" width="100%"> 
          <tbody> 
           <tr> 
            <td><span class="SecondHeader"> Subject Number</span></td> 
            <td><span class="SecondHeader">Subject Name</span></td> 
            <td><span class="SecondHeader">Points</span></td> 
            <td><span class="SecondHeader">Semesters</span></td> 
            <td>Subject Site</td> 
           </tr> 
           <tr> 
            <td><a href="../Subjects/?SUB=46001">46001</a>&nbsp;</td> 
            <td nowrap="">Engineering of Distributed Software Sys</td> 
            <td>3</td> 
            <td><br></td> 
            <td><a target="_newtab" href="http://www.thislinkisok.com/courses/046001">www</a></td> 
           </tr> 
           <tr> 
            <td><a href="../Subjects/?SUB=46002">46002</a>&nbsp;</td> 
            <td nowrap="">Design and Analysis of Algorithms</td> 
            <td>3</td> 
            <td>B<br></td> 
            <td>&nbsp;<br></td> 
           </tr> 
          </tbody> 
         </table> 
        </td> 
       </tr> 
      </tbody> 
     </table> 
     <br> 
     <table border="0"> 
      <tbody> 
       <tr> 
        <td>Last Update on :</td> 
        <td>Wednesday ,9 April 2014</td> 
        <td></td> 
       </tr> 
      </tbody> 
     </table> 
    </div> 
</body> 
</html>  

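I know how to grab the table I want: $query = $xpath->query('//table[@class="main"]//table[1]'); But how do I loop over all the links that start with "../xxx" and change them to something like "www.mynewlink.com/xxx"? Finally, I want to return the extracted table as HTML. How can I do this using the native DOMDocument and DOMXPath?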

Thanks, everyone!

1 Answer

If $html is the string containing the HTML you fetched from the external site, you can do something like this:

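$dom = new DOMDocument();
// Collect parser warnings from malformed real-world HTML instead of hiding them with @
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Rewrite every link inside the main table whose href starts with "../"
foreach ($xpath->query('//table[@class="main"]//a[starts-with(@href, "../")]') as $link) {
    // Escape the dots so the pattern matches only a literal ".." prefix
    $link->setAttribute('href', preg_replace('#^\.\.#', 'http://www.mynewlink.com', $link->getAttribute('href')));
}

// Copy the table into a fresh document so that only the table is serialized
$container = new DOMDocument();
$container->appendChild($container->importNode($xpath->query('//table[@class="main"]')->item(0), true));

echo $container->saveHTML();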
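
The same steps can be wrapped in a function and tried on a small sample, which makes it easier to check the result before pointing it at the real page. This is only a sketch: the function name rewriteMainTableLinks, the $baseUrl parameter and the trimmed-down $sample markup are made up for illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): rewrites "../" links
// inside the table with class "main" and returns that table's HTML.
function rewriteMainTableLinks(string $html, string $baseUrl): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $xpath = new DOMXPath($dom);
    $table = $xpath->query('//table[@class="main"]')->item(0);
    if ($table === null) {
        return ''; // no matching table in the input
    }

    // Relative query: only anchors below $table whose href starts with "../"
    foreach ($xpath->query('.//a[starts-with(@href, "../")]', $table) as $link) {
        $href = $link->getAttribute('href');
        // Drop the leading ".." and prepend the new base, keeping the "/xxx" part
        $link->setAttribute('href', rtrim($baseUrl, '/') . substr($href, 2));
    }

    // Copy the table into its own document and serialize only that
    $container = new DOMDocument();
    $container->appendChild($container->importNode($table, true));
    return $container->saveHTML();
}

// Cut-down sample modelled on the table in the question
$sample = '<html><body><table class="main"><tr>'
    . '<td><a href="../Subjects/?SUB=46001">46001</a></td>'
    . '<td><a href="http://www.thislinkisok.com/courses/046001">www</a></td>'
    . '</tr></table></body></html>';

echo rewriteMainTableLinks($sample, 'http://www.mynewlink.com');
```

Passing $table as the second argument to query() keeps the link search scoped to that table, so the absolute link in the last column is left alone.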
Thanks! It works! – wpdev

Why do I need to create a new $container DOMDocument? Can't I just do: $table = $xpath->query('//table[@class="main"]//table[1]'); return $dom->saveHTML($table->item(0)); – wpdev

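@user3510841 If you don't do that, you'll only get the inner contents of the table, without the '<table>...</table>', so we need to create a main container that can hold the whole table, including its opening tag. Please click the check mark next to the question to accept my answer if it solved your problem. – silkfire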
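
One way to settle the question raised in these comments is to pass the node straight to saveHTML() and look at what comes back. The sketch below assumes PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts an optional node argument; the $html sample is made up for illustration. As far as I know, the result on those versions includes the node's own opening and closing tags, so it is worth running this on your PHP version before deciding whether the extra container document is needed.

```php
<?php
// Assumption: PHP 5.3.6+, where DOMDocument::saveHTML() accepts a node argument.
// $html is a made-up sample, not the page from the question.
$html = '<html><body><div><table class="main"><tr><td>cell</td></tr></table></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$table = $xpath->query('//table[@class="main"]')->item(0);

// Serialize just this node; inspect the output for the table's own tags
$fragment = $dom->saveHTML($table);
echo $fragment;
```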