2017-02-23 115 views
0

Parsing an HTML page in PHP using curl and XPath

I need to parse the web page https://www.galliera.it/118 to get the numbers under the colored bars.

Here is my code (it doesn't work!) ...

<?php 
    ini_set('display_errors', 1); 

    $url = 'https://www.galliera.it/118'; 

    print "The url ... ".$url; 
    echo '<br>'; 
    echo '<br>'; 

    //#Set CURL parameters ... 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE); 
    curl_setopt($ch, CURLOPT_HEADER, 0); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); 
    curl_setopt($ch, CURLOPT_PROXY, ''); 
    $data = curl_exec($ch); 
    curl_close($ch); 

    //print "Data ... ".$data; 
    //echo '<br>'; 
    //echo '<br>'; 

    $dom = new DOMDocument(); 
    @$dom->loadHTML($data); 

    $xpath = new DOMXPath($dom); 

    // This is the xpath for a number under a bar .... 
    // /html/body/div[2]/div[1]/div/div/ul/li[6]/span 
    // How may I get it? 
    // The following code doesn't work, it's only to show my goals .. 

    $greenWaitingNumber = $xpath->query('/html/body/div[2]/div[1]/div/div/ul/li[6]/span'); 
    $theText = (string).$greenWaitingNumber; 

    print "Data ... ".$theText; 
    echo '<br>'; 
    echo '<br>'; 

?> 

Any suggestions, examples, or alternatives?

+2

"Doesn't work": can you be more specific? Also, `(string).$greenWaitingNumber` is bad syntax, and you can't echo a `DOMElement` like that (with SimpleXML you can echo a `SimpleXMLElement`) – Scuzzy

+0

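You're right... sorry. I get a blank page and the web console shows "Error 500". I think the problem is in the line `$theText = (string).$greenWaitingNumber;`, but I'm also not sure whether the `$xpath->query` call is correct (note that I obtained the XPath with the browser's "Inspect Element" feature) ... – Cesare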

+2

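Your XPath targets one specific value because of the index notation, but to get all of them you'll need to start with something more generic, e.g. `/html/body/div/div/div/div/ul/li[6]/span` – Scuzzy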

Answers

1

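Here is a PHP script that collects the requested data into a nicely ordered array. You can inspect the script's result and change the structure as needed. Cheers!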

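// Requires allow_url_fopen; otherwise fetch the page with curl as in the question.
$html = file_get_contents("https://www.galliera.it/118");

// Real-world HTML is rarely valid, so collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);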

// find all divs class row 
$rows = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' row ')]"); 

$data = array(); 
foreach ($rows as $row) {
    $groupName = $row->getElementsByTagName('h2')->item(0)->textContent;
    $data[$groupName] = array();

    // find all child elements of this row with class "box"
    $boxes = $finder->query("./*[contains(concat(' ', normalize-space(@class), ' '), ' box ')]", $row);
    foreach ($boxes as $box) {
        $subgroupName = $box->getElementsByTagName('h3')->item(0)->textContent;
        $data[$groupName][$subgroupName] = array();

        $listItems = $box->getElementsByTagName('li');
        foreach ($listItems as $li) {
            $class = $li->getAttribute('class');
            $text = $li->textContent;

            if (!strlen(trim($text))) {
                // an item without text should be the graph bar, so skip it
                continue;
            }

            // I see only integer numbers so I cast to int; otherwise you can change the type or even not cast it
            $data[$groupName][$subgroupName][] = array('type' => $class, 'value' => (int) $text);
        }
    }
}

echo '<pre>' . print_r($data, true) . '</pre>'; 

And the output is something like:

Array 
(
    [SAN MARTINO - 15:30] => Array 
     (
      [ATTESA: 22] => Array 
       (
        [0] => Array 
         (
          [type] => rosso 
          [value] => 1 
         ) 

        [1] => Array 
         (
          [type] => giallo 
          [value] => 12 
         ) 

        [2] => Array 
         (
          [type] => verde 
          [value] => 7 
         ) 

        [3] => Array 
         (
          [type] => bianco 
          [value] => 2 
         ) 

       ) 

      [VISITA: 45] => Array 
       (
        [0] => Array 
         (
          [type] => rosso 
          [value] => 5 
         ) 
... 
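
The keys of the array above appear to include values that change between requests (a time in the group name, a total in the subgroup name), so looking an entry up by exact key is fragile. Below is a minimal sketch of a prefix-based lookup. The `findValue` helper is hypothetical and not part of the answer, and the sample array is copied from the output shown above rather than fetched from the site.

```php
<?php
// Sample data copied from the output shown above (one subgroup only).
$data = array(
    'SAN MARTINO - 15:30' => array(
        'ATTESA: 22' => array(
            array('type' => 'rosso',  'value' => 1),
            array('type' => 'giallo', 'value' => 12),
            array('type' => 'verde',  'value' => 7),
            array('type' => 'bianco', 'value' => 2),
        ),
    ),
);

// Hypothetical helper: returns the value of the first entry of the given
// type inside the first subgroup whose key starts with $subgroupPrefix,
// or null when nothing matches.
function findValue(array $data, $subgroupPrefix, $type)
{
    foreach ($data as $subgroups) {
        foreach ($subgroups as $subgroupName => $entries) {
            if (strpos($subgroupName, $subgroupPrefix) !== 0) {
                continue; // key does not start with the prefix
            }
            foreach ($entries as $entry) {
                if ($entry['type'] === $type) {
                    return $entry['value'];
                }
            }
        }
    }
    return null;
}

$greenWaiting = findValue($data, 'ATTESA', 'verde');
echo $greenWaiting === null ? 'not found' : $greenWaiting; // prints 7
```

Matching on the prefix `ATTESA` keeps the lookup working when the total after the colon changes.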
2

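This might help simplify the XPath statement for this particular case.

This selects the span element under every li element whose class attribute is exactly "verde".

The // notation means "match at any level in the document", so you don't have to build the query from the root.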

/* @var $node DOMElement */ 
$greenWaitingNumber = $xpath->query('//li[@class="verde"]/span'); 
foreach($greenWaitingNumber as $node) 
{ 
    echo $node->nodeValue; 
} 

*Note: this will not handle class="verde foo bar"
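
If the class attribute may carry several classes, a token-based match is one way around that limitation. The sketch below compares the exact-match query with the `contains(concat(...))` idiom already used in the other answer. The inline markup is made up for illustration and is not the real page.

```php
<?php
// Made-up sample markup (an assumption, not the real page).
$html = '<ul>'
      . '<li class="verde"><span>7</span></li>'
      . '<li class="verde foo bar"><span>3</span></li>'
      . '<li class="verdello"><span>9</span></li>'
      . '</ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact match: only class="verde" qualifies.
$exact = $xpath->query('//li[@class="verde"]/span');

// Token match: pad the class list with spaces and look for " verde ",
// so "verde foo bar" matches but "verdello" does not.
$token = $xpath->query(
    "//li[contains(concat(' ', normalize-space(@class), ' '), ' verde ')]/span"
);

$values = array();
foreach ($token as $node) {
    $values[] = $node->nodeValue;
}

echo $exact->length, "\n";        // prints 1
echo implode(',', $values), "\n"; // prints 7,3
```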


If you are only interested in one specific value...

$greenWaitingNumber = $xpath->query('/html/body/div[2]/div[1]/div/div/ul/li[6]/span'); 
$theText = $greenWaitingNumber->item(0)->nodeValue; 

This gives "2"; use `echo $theText;` to print it.
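
Taking the first result directly fails when the query matches nothing, which can happen with an absolute path if the page layout changes. Below is a minimal sketch of a guarded read. The `firstText` helper is hypothetical, and the inline markup is a made-up stand-in for the real page.

```php
<?php
// Made-up stand-in for the real page (an assumption).
$html = '<html><body><ul><li class="verde"><span> 2 </span></li></ul></body></html>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: returns the trimmed text of the first node matched
// by $expression, or null when the query fails or matches nothing.
function firstText(DOMXPath $xpath, $expression)
{
    $nodes = $xpath->query($expression);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$found   = firstText($xpath, '//li[@class="verde"]/span');
$missing = firstText($xpath, '/html/body/div[2]/span');

echo $found, "\n";                                  // prints 2
echo $missing === null ? 'no match' : $missing, "\n"; // prints no match
```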