2017-04-12

PHP DOM: parse a table into an array

I have this table structure:

<table class="c_order u_list">
    <thead>
        <tr>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>
                11.04.2017<br/>
                18:20
            </td>
            <td><a target="_blank" href="/personal/order/detail/457/">A-457</a></td>
            <td>+7 (917) 119-11-42</td>
            <td>1685.20</td>
            <td>
                <a target="_blank" href="/resn/i/zda_2_1/">УШКА</a><br/>с. холмский, ул. Фрунзе, д. 11<br/>3477740087
            </td>
            <td>Принят</td>
        </tr>
        <tr>
            <td>
                11.04.2017<br/>
                16:47
            </td>
            <td><a target="_blank" href="/personal/order/detail/47565/">A-47565</a></td>
            <td>+7 (909) 556-77-99</td>
            <td>2574.80</td>
            <td>
                <a target="_blank" href="/kir/a/an_10/">ООО &quot;План&quot;</a><br/>г. Омск, ул. 10-летия Победы, д. 3;<br/>8845701069
            </td>
            <td>Доставлен</td>
        </tr>
    </tbody>
</table>

I'm trying to turn it into an array with this PHP code:

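// $ch is a cURL handle set up earlier (not shown); CURLOPT_RETURNTRANSFER
// must be enabled for curl_exec() to return the page as a string
$page = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$data = array();
// get all table rows (note: '//tr' also matches the empty <thead> row)
$table_rows = $xpath->query('//tr');
foreach ($table_rows as $row => $tr) {
    $data[$row] = array(); // so array_filter() below always gets an array
    // childNodes holds the <td> elements plus the whitespace text nodes between them
    foreach ($tr->childNodes as $td) {
        echo $td->nodeValue;
        // nodeValue is the text content only, so the <a> markup is lost here
        $data[$row][] = preg_replace('~[\r\n]+~', '', trim($td->nodeValue));
    }
    // drop the empty strings left by the whitespace text nodes
    $data[$row] = array_values(array_filter($data[$row]));
}
print_r($data);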

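But I get the wrong result: an array without the href (link) tags. What I need is something like this, with all the tags inside each td element kept: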

Array
(
    [0] => Array
        (
            [0] => 11.04.2017 18:20
            [1] => <a target="_blank" href="/personal/order/detail/457/">A-457</a>
            [2] => +7 (917) 119-11-42
            [3] => 1685.20
            [4] => <a target="_blank" href="/resn/i/zda_2_1/">УШКА</a><br/>с. холмский, ул. Фрунзе, д. 11<br/>3477740087
            [5] => Принят
        )

    [1] => Array
        (
            [0] => 11.04.2017 16:47
            [1] => <a target="_blank" href="/personal/order/detail/47565/">A-47565</a>
            [2] => +7 (909) 556-77-99
            [3] => 2574.80
            [4] => <a target="_blank" href="/kir/a/an_10/">ООО &quot;План&quot;</a><br/>г. Омск, ул. 10-летия Победы, д. 3;<br/>8845701069
            [5] => Доставлен
        )

)
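
nodeValue returns only the text content of a node, which is why the links disappear from the result. A minimal sketch of one way to keep the markup, assuming PHP 7+ and a UTF-8 page: serialize every child of each <td> with DOMDocument::saveHTML(), which accepts a node argument. The helper names innerHtml() and parseRows() are hypothetical, and the <meta> prefix is an assumption about how to make libxml2 read the input as UTF-8.

```php
<?php
// Sketch: keep the markup inside each <td> instead of only its text.
// innerHtml() and parseRows() are hypothetical helper names.

// Concatenate the HTML of all child nodes (the "inner HTML" of $node)
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

function parseRows(string $page): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Assumption: the page is UTF-8; the <meta> tells libxml2 so
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $page);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    // tbody rows only, so the empty <thead> row is skipped
    foreach ($xpath->query('//table/tbody/tr') as $tr) {
        $cells = [];
        // './td' selects only the cells, not the whitespace text nodes
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim(innerHtml($td));
        }
        $rows[] = $cells;
    }
    return $rows;
}

$page = <<<'HTML'
<table class="c_order u_list">
  <thead><tr></tr></thead>
  <tbody>
    <tr>
      <td>11.04.2017<br/>18:20</td>
      <td><a target="_blank" href="/personal/order/detail/457/">A-457</a></td>
      <td>+7 (917) 119-11-42</td>
      <td>1685.20</td>
      <td><a target="_blank" href="/resn/i/zda_2_1/">УШКА</a><br/>3477740087</td>
      <td>Принят</td>
    </tr>
  </tbody>
</table>
HTML;

print_r(parseRows($page));
```

Two differences from the desired output are expected: saveHTML() writes void elements as <br> rather than <br/>, and the date cell keeps its <br>, so that one column may be easier to read as plain text.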

And how can I give the array named keys instead of numeric indexes? So that I get ['time'] rather than [0].
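
A minimal sketch of one way to get named keys, assuming each row has already been reduced to a list of six cell strings in column order (as in the array above): array_combine() pairs a list of key names with the values. Only 'time' comes from the question; the other key names and the helper nameColumns() are hypothetical.

```php
<?php
// Sketch: turn a numerically indexed row into one with named keys.
// Only 'time' is from the question; the other names are hypothetical
// placeholders, one per <td> in column order.
const COLUMN_KEYS = ['time', 'order', 'phone', 'price', 'customer', 'status'];

function nameColumns(array $row, array $keys = COLUMN_KEYS): array
{
    // array_combine() needs both arrays to have the same length
    if (count($row) !== count($keys)) {
        throw new InvalidArgumentException(
            sprintf('Expected %d cells, got %d', count($keys), count($row))
        );
    }
    return array_combine($keys, $row);
}

$row = ['11.04.2017 18:20', 'A-457', '+7 (917) 119-11-42', '1685.20', 'УШКА', 'Принят'];
print_r(nameColumns($row));
```

The explicit length check is a design choice: on a mismatch, PHP 8 makes array_combine() throw a ValueError while PHP 7 returns false with a warning, so checking first gives the same behavior on both.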

Make sure you receive the data in the correct encoding. If not, use header('Content-Type: text/plain; charset=utf-8'); in the source file. Also check the encoding of your PHP file. – lubart

Doesn't utf8_encode solve your problem? What encoding is your script file in? –

Changing the encoding doesn't help; the result is the same. –

Answers

// $xpath is the DOMXPath object built in the question's code
$table_rows = $xpath->query('//table/tbody/tr'); // body rows only, skips the <thead> row
$results = array();
foreach ($table_rows as $row) {
    $result = array();

    // date and time: collapse the line break between them into a single space
    $expression = './td[1]';
    $result['Name1'] = preg_replace('~\s+~u', ' ', trim($xpath->query($expression, $row)->item(0)->nodeValue));

    // order number (link text) and its URL
    $expression = './td[2]';
    $result['Name2'] = $xpath->query($expression, $row)->item(0)->nodeValue;
    $expression = './td[2]/a/@href';
    $result['NameURL'] = $xpath->query($expression, $row)->item(0)->nodeValue;

    $expression = './td[3]';
    $result['Phone'] = $xpath->query($expression, $row)->item(0)->nodeValue;
    $expression = './td[4]';
    $result['Price'] = $xpath->query($expression, $row)->item(0)->nodeValue;

    // customer link: URL first, then the link text
    // (item(0) is null if a row has no <a> in this cell)
    $expression = './td[5]/a/@href';
    $result['Name10'][] = $xpath->query($expression, $row)->item(0)->nodeValue;
    $expression = './td[5]/a';
    $result['Name10'][] = $xpath->query($expression, $row)->item(0)->nodeValue;

    $expression = './td[6]';
    $result['Name11'] = $xpath->query($expression, $row)->item(0)->nodeValue;

    $results[] = $result;
}

print_r($results);
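
A variation on this answer, sketched under the same assumption as the question (a UTF-8 page): DOMXPath::evaluate() with the XPath 1.0 functions string() and normalize-space() returns a plain string per column. A cell without a link then yields '' instead of reading ->nodeValue on a null item(0), and normalize-space() turns the date cell into "11.04.2017 18:20". The key names, the helper extractOrders(), and the link-less fifth cell in the sample row are made up for illustration.

```php
<?php
// Sketch: one XPath expression per named column, read with evaluate().
// Key names and the helper name are hypothetical.

function extractOrders(string $page): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Assumption: the page is UTF-8; the <meta> tells libxml2 so
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $page);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Expressions are evaluated relative to each <tr>
    $columns = [
        'time'         => 'normalize-space(./td[1])', // collapses the whitespace around <br/>
        'order'        => 'normalize-space(./td[2])',
        'order_url'    => 'string(./td[2]/a/@href)',  // '' when there is no link
        'phone'        => 'normalize-space(./td[3])',
        'price'        => 'normalize-space(./td[4])',
        'customer_url' => 'string(./td[5]/a/@href)',
        'status'       => 'normalize-space(./td[6])',
    ];

    $orders = [];
    foreach ($xpath->query('//table/tbody/tr') as $tr) {
        $order = [];
        foreach ($columns as $key => $expr) {
            // string()/normalize-space() make evaluate() return a PHP string
            $order[$key] = $xpath->evaluate($expr, $tr);
        }
        $orders[] = $order;
    }
    return $orders;
}

// Illustrative row: the fifth cell deliberately has no <a>
$page = <<<'HTML'
<table><tbody>
  <tr>
    <td>
      11.04.2017<br/>
      18:20   </td>
    <td><a target="_blank" href="/personal/order/detail/457/">A-457</a></td>
    <td>+7 (917) 119-11-42</td>
    <td>1685.20</td>
    <td>no link in this cell</td>
    <td>Принят</td>
  </tr>
</tbody></table>
HTML;

print_r(extractOrders($page));
```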

In the DOMDocument constructor, specify UTF-8 as the encoding:

$dom = new DOMDocument('1.0', 'UTF-8'); 
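
A hedged caveat: as far as I know, loadHTML() builds a fresh document, so the encoding given to the constructor does not decide how the fetched bytes are decoded. libxml2's HTML parser takes that from the markup itself (for example a charset <meta>) and has historically fallen back to ISO-8859-1 when none is declared. A commonly used workaround, sketched here under the assumption that the page is UTF-8 and does not declare its own charset, is to put such a declaration in front of the markup. loadUtf8Html() is a hypothetical helper name.

```php
<?php
// Sketch: make loadHTML() decode a UTF-8 fragment as UTF-8.
// Assumption: the HTML really is UTF-8 and has no charset declaration
// of its own. loadUtf8Html() is a hypothetical helper name.

function loadUtf8Html(string $html): DOMDocument
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // A charset <meta> in front of the markup tells libxml2's HTML
    // parser how to decode the bytes that follow
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    return $dom;
}

$dom = loadUtf8Html('<table><tbody><tr><td>Принят</td></tr></tbody></table>');
$status = (new DOMXPath($dom))->evaluate('string(//td)');
echo $status, PHP_EOL;
```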

For preg_replace() to work safely with UTF-8 strings, you must use the u modifier:

$data[$row][] = preg_replace('~[\r\n]+~u', '', trim($td->nodeValue));
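
A small demonstration of what the u modifier changes: without it PCRE treats the subject as bytes, with it as UTF-8 characters, and an invalid UTF-8 subject makes preg_replace() return null. The sample string is the status value from the question.

```php
<?php
// "Принят" is 6 Cyrillic letters, 2 bytes each in UTF-8
$status = 'Принят';

// Remove the first "character" with and without u
$bytes = preg_replace('~^.~', '', $status);   // removes 1 byte, leaving a broken sequence
$chars = preg_replace('~^.~u', '', $status);  // removes the whole letter 'П'

echo strlen($status), ' ', strlen($bytes), ' ', strlen($chars), PHP_EOL; // 12 11 10
echo $chars, PHP_EOL; // ринят

// With u, a subject that is not valid UTF-8 is rejected
var_dump(preg_replace('~^.~u', '', $bytes)); // NULL
```

For the exact pattern in this answer, [\r\n]+, the result is the same with or without u, because CR and LF are single ASCII bytes that never occur inside a multi-byte UTF-8 sequence. The modifier can change the result for patterns such as ., \s, or character classes containing non-ASCII letters.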