2015-12-14 29 views
2

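How do I properly return a value from a class in this example? (PHP)

I wrote a small crawler, and I would like to know how to properly assign the result to the instance I create.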

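My constructor sets some basic properties and calls the next method, which contains an if statement that may in turn run a foreach loop. When all the work is done, I echo my result.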

This works fine, but I don't want to echo my json_encode data. I would rather have the $crawler variable at the bottom contain the json_encode data.

Here is my code:

<?php 

class Crawler { 

    private $url; 
    private $class; 
    private $regex; 
    private $htmlStack; 
    private $pageNumber = 1; 
    private $elementsArray; 

    public function __construct($url, $class, $regex=null) { 
     $this->url = $url; 
     $this->class = $class; 
     $this->regex = $regex; 

     $this->curlGet($this->url); 
    } 

    private function curlGet($url) { 
     $curl = curl_init(); 

     curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE); 
     curl_setopt($curl, CURLOPT_URL, $url); 

     $this->htmlStack .= curl_exec($curl); 

     $response = curl_getinfo($curl, CURLINFO_HTTP_CODE); 

     $this->paginate($response); 
    } 

    private function paginate($response) { 
     if($response === 200) { 
      $this->pageNumber++; 
      $url = $this->url . '?page=' . $this->pageNumber; 

      $this->curlGet($url); 
     } else { 
      $this->CreateDomDocument(); 
     } 
    } 

    private function curlGetDeep($link) { 
     $curl = curl_init(); 

     curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE); 
     curl_setopt($curl, CURLOPT_URL, $link); 

     $product = curl_exec($curl); 

     $dom = new Domdocument(); 
     @$dom->loadHTML($product); 

     $xpath = new DomXpath($dom); 

     $descriptions = $xpath->query('//div[contains(@class, "description")]'); 

     foreach($descriptions as $description) { 
      return $description->nodeValue; 
     } 
    } 

    private function CreateDomDocument() { 
     $dom = new Domdocument(); 
     @$dom->loadHTML($this->htmlStack); 

     $xpath = new DomXpath($dom); 

     $elements = $xpath->query('//article[contains(@class, "' . $this->class . '")]'); 

     foreach($elements as $element) { 
      $title = $xpath->query('descendant::div[@class="title"]', $element); 
      $title = $title->item(0)->nodeValue; 

      $link = $xpath->query('descendant::a[@class="link-overlay"]', $element); 
      $link = $link->item(0)->getAttribute('href'); 
      $link = 'https://www.gall.nl' . $link; 

      $image = $xpath->query('descendant::div[@class="image"]/node()/node()', $element); 
      $image = $image->item(1)->getAttribute('src'); 

      $description = $this->curlGetDeep($link); 

      if($this->regex) { 
       $title = preg_replace($this->regex, '', $title); 
      } 

      if(!preg_match('/\dX(\d+)?/', $title)) { 
       $this->elementsArray[] = [ 
        'title' => $title, 
        'link' => $link, 
        'image' => $image, 
        'description' => $description 
       ]; 
      }  
     } 

     echo json_encode(['beers' => $this->elementsArray]); 
    } 
} 

$crawler = new Crawler('https://www.gall.nl/shop/speciaal-bier/', 'product-block', '/\d+\,?\d*CL/i'); 

For an overview, the code is also on GitHub: https://github.com/stephan-v/crawler/blob/master/ArticleCrawler.php

I hope someone can help me out, because I am a bit confused about how to get this working properly.

Answers

1

I was too slow, man. So I'll just extend ardabeyazoglu's answer with code here:

Change echo json_encode(['beers' => $this->elementsArray]); to

$this->json = json_encode(['beers' => $this->elementsArray]);

and then

$crawler = new Crawler(....); 
var_dump($crawler->json); 

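You could add an accessor method instead, but a public property works too (ideally declared on the class, e.g. public $json;, rather than created dynamically).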
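The accessor variant mentioned above could look roughly like the sketch below. The class name BeerList, the getJson() method and the sample data are assumptions made for illustration only. In the real Crawler the assignment would simply replace the echo at the end of CreateDomDocument().

```php
<?php

// Hypothetical stand-in for the Crawler class: no network calls,
// only the part that stores and exposes the encoded result.
class BeerList {

    private $elementsArray = [];
    private $json;

    public function __construct(array $elements) {
        $this->elementsArray = $elements;

        // Store the encoded result instead of echoing it.
        $this->json = json_encode(['beers' => $this->elementsArray]);
    }

    // Accessor: read-only access to the stored JSON string.
    public function getJson() {
        return $this->json;
    }
}

$list = new BeerList([['title' => 'Example beer']]);
$json = $list->getJson();
```

Keeping the property private and exposing it through a method means outside code can read the JSON but cannot overwrite it, which is the main difference from the public property shown above.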

3

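You can't do that in the constructor: new always evaluates to the object itself, so anything returned from __construct is discarded. You can, however, assign the JSON to a class property and return it from another method. That is the only logical option.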
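A small sketch of why the constructor cannot hand back the JSON, and how a second method can. The Demo class and its getResult() method are hypothetical names used only for this illustration and are not part of the original crawler.

```php
<?php

// Hypothetical demo class, not part of the original crawler.
class Demo {

    private $result;

    public function __construct() {
        $this->result = json_encode(['beers' => []]);

        // This return value is ignored when the class is instantiated with `new`.
        return $this->result;
    }

    // A regular method can return the stored value.
    public function getResult() {
        return $this->result;
    }
}

$demo = new Demo();

// $demo holds the instance, not the JSON string.
$isObject = $demo instanceof Demo;

// The JSON is only reachable through the separate method.
$json = $demo->getResult();
```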