2011-09-04

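Scrape HTML table data and create an XML document

I need to scrape some data from a table on a website and create an XML document that will be used by an application.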

The table looks like this:

<table id="results" class="results"> 
     <thead> 
      <tr> 
       <th scope="col" class="resRoute">Route</th> 
       <th scope="col" class="resDir">To</th> 
       <th scope="col" class="resDue sorted">Time</th> 
      </tr> 
     </thead> 
     <tbody> 
      <tr> 
       <td class="resRoute">263</td> 
       <td class="resDir">Route Name</td> 
       <td class="resDue">1 min</td> 
      </tr> 
      <tr> 
       <td class="resRoute">17</td> 
       <td class="resDir">Route Name</td> 
       <td class="resDue">2 min</td> 
      </tr> 
     </tbody> 
    </table> 

And I want to create an XML feed that looks like this:

<train> 
    <route>263</route> 
    <direction>Route Name</direction> 
    <due>2 Min</due> 
</train> 
<train> 
    <route>17</route> 
    <direction>Route Name</direction> 
    <due>12 Min</due> 
</train> 

Answers

1

Hack hackedy hack hack hack!

 $html = '<table id="results" class="results"> 
      <thead> 
       <tr> 
        <th scope="col" class="resRoute">Route</th> 
        <th scope="col" class="resDir">To</th> 
        <th scope="col" class="resDue sorted">Time</th> 
       </tr> 
      </thead> 
      <tbody> 
       <tr> 
        <td class="resRoute">263</td> 
        <td class="resDir">Route Name</td> 
        <td class="resDue">1 min</td> 
       </tr> 
       <tr> 
        <td class="resRoute">17</td> 
        <td class="resDir">Route Name</td> 
        <td class="resDue">2 min</td> 
       </tr> 
      </tbody> 
     </table> 
    '; 

    // Drop everything before <tbody> so the header row is ignored. 
    $body = explode('<tbody>', $html); 

    // Names starting with "xml" are reserved, so use a neutral root element. 
    $xml = simplexml_load_string("<?xml version='1.0' encoding='utf-8'?><trains />"); 

    foreach (array_slice(explode('<tr>', end($body)), 1) as $row) 
    { 
     // Skip any fragment that does not contain all three cells. 
     if (!preg_match('/resRoute">([0-9]+)<\/td>/', $row, $ids) 
      || !preg_match('/resDir">([^<]+)<\/td>/', $row, $dir) 
      || !preg_match('/resDue">([^<]+)<\/td>/', $row, $due)) 
     { 
      continue; 
     } 

     $node = $xml->addChild('train'); 

     $node->addChild('route', $ids[1]); 
     $node->addChild('direction', $dir[1]); 
     $node->addChild('due', $due[1]); 
    } 

    header('Content-Type: text/xml'); 
    echo $xml->asXML(); 
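
If the regex approach turns out to be too brittle (it depends on the exact `<tr>` and `class="..."` spelling), a parser-based variant is one alternative. The following is only a sketch, not part of the original answer: it swaps the regexes for PHP's `DOMDocument` and `DOMXPath`, and the function name `tableToTrainsXml` and the `<trains>` root element are made up for illustration.

```php
<?php
// Hypothetical helper: reads the results table and returns the <train> feed as a string.
function tableToTrainsXml(string $html): string
{
    // Collect parser warnings internally; real-world HTML is rarely clean.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    $out  = new DOMDocument('1.0', 'utf-8');
    $root = $out->appendChild($out->createElement('trains')); // assumed root name

    // Output element name => class of the table cell it is read from.
    $fields = ['route' => 'resRoute', 'direction' => 'resDir', 'due' => 'resDue'];

    foreach ($xpath->query('//table[@id="results"]/tbody/tr') as $row) {
        $train = $root->appendChild($out->createElement('train'));
        foreach ($fields as $name => $class) {
            // Match the class as a whole token, so extra classes on the cell do not matter.
            $expr  = 'td[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
            $cell  = $xpath->query($expr, $row)->item(0);
            $value = $cell ? trim($cell->textContent) : '';

            // A text node is escaped on output, so "&" or "<" in a route name is safe.
            $el = $train->appendChild($out->createElement($name));
            $el->appendChild($out->createTextNode($value));
        }
    }

    return $out->saveXML();
}
```

With the `$html` string from the code above, `echo tableToTrainsXml($html);` would print both rows as `<train>` elements under a single root. Note that `loadHTML()` assumes ISO-8859-1 unless the markup declares a charset, so pages with non-ASCII route names may need extra handling.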
2

Run it through an XSLT transformation (this assumes the table markup is well-formed XML):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:template match="/"> 
        <xsl:for-each select="table/tbody/tr"> 
            <train> 
                <route><xsl:value-of select="td[@class='resRoute']" /></route> 
                <direction><xsl:value-of select="td[@class='resDir']" /></direction> 
                <due><xsl:value-of select="td[@class='resDue']" /></due> 
            </train> 
        </xsl:for-each> 
    </xsl:template> 
</xsl:stylesheet>
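
To apply the stylesheet from PHP, something like the following might work. It is a rough sketch that assumes the `xsl` extension is enabled and that the table has already been isolated as well-formed XML; the function name `transformTable` is hypothetical.

```php
<?php
// Hypothetical helper: applies an XSLT stylesheet (given as a string) to the table markup.
function transformTable(string $tableXml, string $xslt): string
{
    $source = new DOMDocument();
    $source->loadXML($tableXml);    // input must be well-formed XML, not tag soup

    $style = new DOMDocument();
    $style->loadXML($xslt);

    $proc = new XSLTProcessor();    // provided by PHP's xsl extension
    $proc->importStylesheet($style);

    $result = $proc->transformToXml($source);
    if (!is_string($result)) {
        throw new RuntimeException('XSLT transformation failed');
    }

    return $result;
}
```

Because the stylesheet emits the `<train>` elements without a wrapping element, the result has several top-level elements; a consumer that expects a single document root would need the template to wrap them in one.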