2015-08-27 95 views

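I have an HTML file that contains a list of tracks. I want to create a PHP object for each track and store all of the objects in a PHP array. How can I use PHP to get elements and their child elements from the HTML by class name?

The HTML DOM in my test.html file: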

<ul> 
    <li class="track"> 
     <span id="primary-info"> 
      <span class="interpret">Lorem ipsum</span> 
      <span class="title">dolor sit amet</span> 
     </span> 
     <span class="secondary-info"> 
      <span class="playtime">6:00</span> 
      <span class="label">consetetur</span> 
     </span> 
    </li> 

    <li class="track"> 
     <span id="primary-info"> 
      <span class="interpret">sed diam</span> 
      <span class="title">nonumy eirmod</span> 
     </span> 
     <span class="secondary-info"> 
      <span class="playtime">7:00</span> 
      <span class="label">invidunt</span> 
     </span> 
    </li> 

</ul> 

My PHP code:

<?php 

    $lTracklistArr = []; 

    // get the html 
    $HTML = file_get_contents("http://localhost/test.html"); 

    // load the dom 
    $lDoc = new DOMDocument(); 
    $lDoc->loadHTML($HTML); 

    // create XPath obj 
    $XPath = new DOMXPath($lDoc); 

    // get all tracks 
    $lTracks = $XPath->query("//*[@class='track']"); 

    $i = 0; 
    while($lTracks->item($i)) 
    { 
     // How can I get the values from the sub-elements from the DOM? 
     $lInterpret = $lTracks->item($i)-> ? 
     $lTitle = $lTracks->item($i)-> ? 
     $lPlaytime = $lTracks->item($i)-> ? 
     $lLabel = $lTracks->item($i)-> ? 

     $lTracklistArr[] = new Track($lInterpret, $lTitle, $lPlaytime, $lLabel); 

     $i++; 
    } 

    // show tracklist 
    print_r($lTracklistArr); 

    // PHP class about one track 
    Class Track 
    { 
     var $m_Interpret; 
     var $m_Title; 
     var $m_Playtime; 
     var $m_Label; 

     public function __construct($pInterpret, $pTitle, $pPlaytime, $pLabel) 
     { 
      // assign to the object's properties, not to local variables
      $this->m_Interpret = $pInterpret; 
      $this->m_Title = $pTitle; 
      $this->m_Playtime = $pPlaytime; 
      $this->m_Label = $pLabel; 
     } 
    } 
?> 

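Getting the tracks themselves is no problem. But I cannot get the values of the child elements by class name.

Note: the order of the DOM inside a track may change, so the elements have to be fetched by class name.

Answers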


I can run XPath per track if I save each track's DOMElement as HTML and then load that HTML into a new DOMXPath:

$lTracklistArr = []; 

// get the html 
$HTML = file_get_contents("http://localhost/test.html"); 

$XPath = GetXPathByHTML($HTML); 

// get all tracks 
$lTracks = $XPath->query("//*[@class='track']"); 

$i = 0; 
while($lTracks->item($i)) 
{    
    //save DOMElement of the Track as HTML and Convert it back into DOMXPath 
    $XPathTrack = GetXPathByHTML($lTracks->item($i)->ownerDocument->saveHTML($lTracks->item($i))); 

    // read the values of the sub-elements by class name
    $lInterpret = $XPathTrack->query("//*[@class='interpret']")->item(0)->nodeValue; 
    $lTitle = $XPathTrack->query("//*[@class='title']")->item(0)->nodeValue; 
    $lPlaytime = $XPathTrack->query("//*[@class='playtime']")->item(0)->nodeValue; 
    $lLabel = $XPathTrack->query("//*[@class='label']")->item(0)->nodeValue; 

    $lTracklistArr[] = new Track($lInterpret, $lTitle, $lPlaytime, $lLabel); 

    $i++; 
} 

function GetXPathByHTML($pHTML) 
{ 
    // load the dom 
    $lDoc = new DOMDocument(); 
    libxml_use_internal_errors(true); // suppress warnings 
    $lDoc->loadHTML($pHTML); 

    // create XPath obj 
    return new DOMXPath($lDoc); 
} 

This works for me. A print_r($lTracklistArr) shows the correct result:

Array ([0] => Track Object ([m_Interpret] => Lorem ipsum [m_Title] => dolor sit amet [m_Playtime] => 6:00 [m_Label] => consetetur) [1] => Track Object ([m_Interpret] => sed diam [m_Title] => nonumy eirmod [m_Playtime] => 7:00 [m_Label] => invidunt)) 
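
The round trip through saveHTML() is probably not needed: DOMXPath::query() accepts a context node as its second argument, so a relative expression (one starting with `.//`) searches only inside the given track. Below is a minimal sketch of that idea, assuming markup like the one in the question. The inline sample HTML, the helper names classExpr() and textByClass(), and the plain result arrays are made up for illustration. The contains(concat(...)) test is a common XPath 1.0 idiom that also matches when an element carries several classes.

```php
<?php
// Hypothetical sample input standing in for the contents of test.html.
// The second track lists its fields in a different order on purpose.
$html = <<<'EOT'
<ul>
  <li class="track">
    <span class="primary-info">
      <span class="interpret">Lorem ipsum</span>
      <span class="title">dolor sit amet</span>
    </span>
    <span class="secondary-info">
      <span class="playtime">6:00</span>
      <span class="label">consetetur</span>
    </span>
  </li>
  <li class="track">
    <span class="secondary-info">
      <span class="label">invidunt</span>
      <span class="playtime">7:00</span>
    </span>
    <span class="primary-info">
      <span class="title">nonumy eirmod</span>
      <span class="interpret">sed diam</span>
    </span>
  </li>
</ul>
EOT;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: relative XPath expression matching any element
// whose class attribute contains $class as a whole word.
function classExpr(string $class): string
{
    return ".//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
}

// Hypothetical helper: text of the first descendant of $context with the
// given class, or null if there is none.
function textByClass(DOMXPath $xpath, DOMNode $context, string $class): ?string
{
    $nodes = $xpath->query(classExpr($class), $context);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$tracks = [];
foreach ($xpath->query(classExpr('track')) as $trackNode) {
    // The second argument limits each lookup to the current track.
    $tracks[] = [
        'interpret' => textByClass($xpath, $trackNode, 'interpret'),
        'title'     => textByClass($xpath, $trackNode, 'title'),
        'playtime'  => textByClass($xpath, $trackNode, 'playtime'),
        'label'     => textByClass($xpath, $trackNode, 'label'),
    ];
}

print_r($tracks);
```

Each array could be passed to the Track constructor instead of being stored as is. Because every lookup is by class name, the order of the spans inside a track does not matter.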

You can do this with SimpleXML:

<?php 


$lTracklistArr = []; 

// get the html 
$HTML = file_get_contents("http://localhost/test.html"); 


$classes = ["interpret", "title", "playtime", "label"]; 


$data = simplexml_load_string($HTML); 


foreach ($data->li as $e) { 

    $data = []; 

    $attr = (array) $e->attributes(); 

    if ( !isset($attr["@attributes"]["class"]) 
     || ("track" !== $attr["@attributes"]["class"]) 
    ) { 
     continue; 
    } 


    foreach ($e->span as $e2) { 
     foreach ($e2->span as $e3) { 
      $attr = (array) $e3->attributes(); 

      if (!isset($attr["@attributes"]["class"])) { 
       continue; 
      } 

      $class = $attr["@attributes"]["class"]; 

      if (!in_array($class, $classes)) { 
       continue; 
      } 

      $data[$class] = (string) $e3; 
     } 
    } 

    $lTracklistArr[] = new Track($data["interpret"], $data["title"], $data["playtime"], $data["label"]); 

} 


// show tracklist 
var_dump($lTracklistArr); 

// PHP class about one track 
Class Track 
{ 
    var $m_Interpret; 
    var $m_Title; 
    var $m_Playtime; 
    var $m_Label; 

    public function __construct($pInterpret, $pTitle, $pPlaytime, $pLabel) 
    { 
     $this->m_Interpret = $pInterpret; 
     $this->m_Title = $pTitle; 
     $this->m_Playtime = $pPlaytime; 
     $this->m_Label = $pLabel; 
    } 
} 
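
Note that simplexml_load_string() expects well-formed XML, so it tends to fail on ordinary HTML pages. One possible workaround, shown here only as a sketch, is to let DOMDocument::loadHTML() do the lenient parsing and then convert the result with simplexml_import_dom(). The sample markup and variable names below are assumptions for illustration.

```php
<?php
// Hypothetical HTML that is not well-formed XML (the <br> is never closed).
$html = '<html><body><ul>'
      . '<li class="track"><span class="title">dolor sit amet</span><br></li>'
      . '<li class="track"><span class="title">nonumy eirmod</span><br></li>'
      . '</ul></body></html>';

libxml_use_internal_errors(true); // keep HTML parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Convert the parsed DOM tree into a SimpleXMLElement.
$xml = simplexml_import_dom($doc);

// SimpleXMLElement::xpath() returns an array of matching elements.
$titles = [];
foreach ($xml->xpath("//li[@class='track']//span[@class='title']") as $span) {
    $titles[] = (string) $span;
}

print_r($titles);
```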

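Thank you very much for your effort. The DOM comes from a full HTML document (in my question I only posted the basic HTML code), so SimpleXML cannot be used. The function 'simplexml_load_string' raises several warnings, and using '$data->li' in the loop produces the notice 'Trying to get property of non-object'. – Simon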


If you cannot directly access the data behind the HTML, you could try regular expressions – mmm