2012-01-25 67 views

PHP curl_multi on a site that requires a password

I currently use the following (old) code to log in to a website:

public function login() { 
    $url1 = 'https://...'; /* Initial page load to collect initial session cookie data */ 
    $url2 = 'https://...'; /* The page to POST login data to */ 
    $url3 = 'https://...'; /* The page redirected to to test for success */ 
    $un = 'user'; 
    $pw = 'pass'; 

    $post_data = array(
     'authmethod' => 'on', 
     'username' => $un, 
     'password' => $pw, 
     'hrpwd'  => $pw 
    ); 

    $curlOpt1 = array(
     CURLOPT_URL   => $url1, 
     CURLOPT_COOKIEJAR  => self::COOKIEFILE, 
     CURLOPT_COOKIEFILE  => self::COOKIEFILE, 
     CURLOPT_FOLLOWLOCATION => TRUE, 
     CURLOPT_HEADER   => FALSE, 
     CURLOPT_RETURNTRANSFER => TRUE, 
     CURLOPT_SSL_VERIFYPEER => FALSE 
    ); 

    $curlOpt2 = array(
     CURLOPT_URL   => $url2, 
     CURLOPT_COOKIEJAR  => self::COOKIEFILE, 
     CURLOPT_COOKIEFILE  => self::COOKIEFILE, 
     CURLOPT_FOLLOWLOCATION => TRUE, 
     CURLOPT_POST   => TRUE, 
     CURLOPT_POSTFIELDS  => http_build_query($post_data) 
    ); 

    $this->ch = curl_init(); 
    if (!$this->ch) { 
     /* curl_init() failed, so there is no handle to pass to curl_error() */ 
     throw new Exception('Unable to init curl.'); 
    } 

    /* Load the login page once to get the session ID cookies */ 
    curl_setopt_array($this->ch, $curlOpt1); 
    if (curl_exec($this->ch) === false) { /* an empty body is not a failure */ 
     throw new Exception('Unable to retrieve initial auth cookie.'); 
    } 

    /* POST the login data to the login page */ 
    curl_setopt_array($this->ch, $curlOpt2); 
    if (curl_exec($this->ch) === false) { 
     throw new Exception('Unable to post login data.'); 
    } 

    /* Verify the login by checking the redirected url. */ 
    $header = curl_getinfo($this->ch); 
    $retUrl = $header['url']; 

    if ($retUrl == $url3) { 
     /* Reload the login page to get the auth cookies */ 
     curl_setopt_array($this->ch, $curlOpt1); 
     if (curl_exec($this->ch) !== false) { 
      return true; 
     } else { 
      throw new Exception('Unable to retrieve final auth cookie.'); 
     } 
    } else { 
     throw new Exception('Login validation failure.'); 
    } 

    return false; 
} 

Then I use...

public function getHtml($url) { 
    $html = FALSE; 

    try { 
     curl_setopt($this->ch, CURLOPT_URL, $url); 
     $page = curl_exec($this->ch); 
    } catch (Exception $e) { 
     ... 
    } 

    /* Remove all tabs and newlines from the HTML */ 
    $rmv = array("\n","\t"); 
    $html = str_replace($rmv, '', $page); 

    return $html; 
} 

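...for each page request. My question is: how can I convert this to use curl_multi_exec so that a few hundred lookups run faster? I can't find any examples of curl_multi WITH a login. Do I simply replace all the curl_exec calls with curl_multi_exec? Also, if you see any other glaring mistakes, comments are definitely welcome.

To be clear, I want to log in once with a single username/password and then reuse those credentials for multiple page requests.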

+2

Why does this look evil? – Michael

+0

Don't worry, you can't do any real damage with PHP. –

+1

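Because everything in cURL seems evil? We use it at work to collect our product data from our other departments without direct access to the database. I agree it doesn't make sense, but that's what they want. – Isius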

Answers

1

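It has been a while, but I wanted to post my final solution. I found a great multi-cURL library, rolling-curl, which helped. Basically, after collecting the login cookies (as shown in my original question), I pass the cookie file and the other options to each cURL instance of the multi-request and then execute the batch. Works like a charm.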

/* $url_prepend has no default: a default before the required $callback could never be used */ 
public function getMultiPage(array $urls, $url_prepend, $callback) { 
    $rc = new RollingCurl(array('Att_Screen_Scraper', $callback)); 
    $rc->window_size = 15; /* maximum number of simultaneous requests (not threads) */ 
    $rc->options = array(
     CURLOPT_COOKIEJAR  => self::COOKIEFILE, 
     CURLOPT_COOKIEFILE  => self::COOKIEFILE, 
     CURLOPT_FOLLOWLOCATION => TRUE, 
     CURLOPT_HEADER   => FALSE, 
     CURLOPT_RETURNTRANSFER => TRUE, 
     CURLOPT_SSL_VERIFYPEER => FALSE 
    ); 

    foreach ($urls as $i=>$url) { 
     $request = new RollingCurlRequest($url_prepend . $url); 
     echo $url_prepend . $url . "<br>\n"; 
     $rc->add($request); 
    } 

    if(!$rc->execute()) { 
     throw new Exception('RollingCurl execute failed'); 
    } 

    return TRUE; 
} 
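
For readers who would rather not depend on a library, the same idea (log in once, then reuse the saved cookie file on many parallel handles) can be sketched with PHP's built-in curl_multi_* functions. This is only a rough sketch, not what RollingCurl does internally: fetchAllWithCookies() and its parameters are hypothetical names, it runs the URLs in fixed batches rather than a true rolling window, and it only reads the cookie file (no CURLOPT_COOKIEJAR) on the assumption that the login step has already written it.

```php
<?php
/**
 * Fetch several URLs concurrently, sending the cookies saved by an earlier login.
 * Hypothetical helper for illustration; returns array(key => body string, or false on failure).
 */
function fetchAllWithCookies(array $urls, string $cookieFile, int $window = 15): array
{
    $results = array();
    $mh = curl_multi_init();

    // Work through the URLs in batches so at most $window transfers run at once.
    foreach (array_chunk($urls, max(1, $window), true) as $batch) {
        $handles = array();
        foreach ($batch as $key => $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, array(
                CURLOPT_COOKIEFILE     => $cookieFile, // read the login cookies
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_RETURNTRANSFER => true,
            ));
            curl_multi_add_handle($mh, $ch);
            $handles[$key] = $ch;
        }

        // Drive every transfer in the batch until none is still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running > 0) {
                curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
            }
        } while ($running > 0 && $status === CURLM_OK);

        // Per-transfer result codes are reported through curl_multi_info_read().
        $failed = array();
        while (($info = curl_multi_info_read($mh)) !== false) {
            if ($info['result'] !== CURLE_OK) {
                $failed[] = $info['handle'];
            }
        }

        foreach ($handles as $key => $ch) {
            $results[$key] = in_array($ch, $failed, true) ? false : curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

It would be called after login() has succeeded, for example fetchAllWithCookies($urls, self::COOKIEFILE). Leaving out CURLOPT_COOKIEJAR here is a deliberate choice so that parallel handles do not all rewrite the same cookie file; if the site rotates session cookies during browsing, that choice may need revisiting.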

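Note that the RollingCurl-based solution requires a callback to handle the response of each request. RollingCurl's documentation describes this well, so I won't repeat it here.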
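
As a rough illustration of what such a callback might look like, here is a minimal sketch. It assumes the three-argument form ($response, $info, $request) that, as far as I recall, the RollingCurl README uses, so check it against the version of the library you have. PageCollector and its members are hypothetical names, and the clean-up simply mirrors getHtml() from the question.

```php
<?php
/**
 * Hypothetical collector whose handle() method is meant to be passed to RollingCurl
 * as the callback. The ($response, $info, $request) signature is an assumption.
 */
class PageCollector
{
    /** @var array url => cleaned HTML string, or false when the request failed */
    public $pages = array();

    public function handle($response, $info, $request)
    {
        // $info is expected to look like the array returned by curl_getinfo().
        $url  = isset($info['url']) ? $info['url'] : '';
        $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;

        if (!is_string($response) || $code !== 200) {
            $this->pages[$url] = false; // record the failure instead of throwing
            return;
        }

        // Same clean-up as getHtml(): remove all tabs and newlines.
        $this->pages[$url] = str_replace(array("\n", "\t"), '', $response);
    }
}
```

With this, the callback would be given to the constructor as array($collector, 'handle'), where $collector is a PageCollector instance, and $collector->pages would be read once execute() returns.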