2014-01-11 98 views
0

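I want to create a script that logs in to a website of my choice (IPT) and then, using PHP, fetches content from the site after logging in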

I have managed to log in and fetch data, but I am not sure how to extract just the data I want, save it to a database, and eventually output it to WordPress.

Here is what I have so far:

<?php  

login("http://iptorrents.com/torrents/","username=userhere&password=passhere"); 
echo grab_page("http://iptorrents.com/torrents/"); 


function login($url,$data){ 
    $fp = fopen("cookie.txt", "w"); 
    fclose($fp); 
    $login = curl_init(); 
    curl_setopt($login, CURLOPT_COOKIEJAR, "cookie.txt"); 
    curl_setopt($login, CURLOPT_COOKIEFILE, "cookie.txt"); 
    curl_setopt($login, CURLOPT_TIMEOUT, 40000); 
    curl_setopt($login, CURLOPT_RETURNTRANSFER, TRUE); 
    curl_setopt($login, CURLOPT_URL, $url); 
    curl_setopt($login, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); 
    curl_setopt($login, CURLOPT_FOLLOWLOCATION, TRUE); 
    curl_setopt($login, CURLOPT_POST, TRUE); 
    curl_setopt($login, CURLOPT_POSTFIELDS, $data); 
    ob_start(); 
    return curl_exec ($login); 
    ob_end_clean(); 
    curl_close ($login); 
    unset($login);  
}     

function grab_page($site){ 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); 
    curl_setopt($ch, CURLOPT_TIMEOUT, 40); 
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt"); 
    curl_setopt($ch, CURLOPT_URL, $site); 
    ob_start(); 
    return curl_exec ($ch); 
    ob_end_clean(); 
    curl_close ($ch); 
} 

function get_data($url){ 
    $ch = curl_init(); 
    $timeout = 5; 
    curl_setopt($ch, CURLOPT_URL, $url); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); 
    $data = curl_exec($ch);  
    curl_close($ch); 
    return $data; 
} 

$returned_content = get_data('http://iptorrents.com/torrents'); 
echo($returned_content); 

function post_data($site,$data){ 
    $datapost = curl_init(); 
    $headers = array("Expect:"); 
    curl_setopt($datapost, CURLOPT_URL, $site); 
    curl_setopt($datapost, CURLOPT_TIMEOUT, 40000); 
    curl_setopt($datapost, CURLOPT_HEADER, TRUE); 
    curl_setopt($datapost, CURLOPT_HTTPHEADER, $headers); 
    curl_setopt($datapost, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']); 
    curl_setopt($datapost, CURLOPT_POST, TRUE); 
    curl_setopt($datapost, CURLOPT_POSTFIELDS, $data); 
    curl_setopt($datapost, CURLOPT_COOKIEFILE, "cookie.txt"); 
    ob_start(); 
    return curl_exec ($datapost); 
    ob_end_clean(); 
    curl_close ($datapost); 
    unset($datapost);  
} 

?> 
After all this, I get the page I requested as if logged in, but the login form also shows up below everything else.

I think I may simply not be logged in, so maybe I should use a session and store the login?

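I want to grab, say, every ID from the site every 10 minutes, then pull certain content for each of those and eventually output it in my own format.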

Any help is appreciated.

Answers

0

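Yes, you need to persist your session cookie after logging in and send it again on every later request. Rather than doing this in raw curl, I strongly recommend using a mature HTTP client library. I would recommend http://guzzlephp.org

Working with cookies: http://docs.guzzlephp.org/en/latest/plugins/cookie-plugin.html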
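
A minimal sketch of that idea, assuming Guzzle 6 or later installed through Composer (the link above documents the older cookie plugin; this sketch uses the newer `cookies` client option instead). The function name `fetch_after_login` is made up for illustration, and the form field names are simply the ones from the question, so check the site's real login form before relying on them.

```php
<?php
use GuzzleHttp\Client;

/**
 * Post the login form, then fetch another page with the same client.
 * The client must have been created with cookies enabled so the session
 * cookie set by the login response is sent with the second request.
 * (Hypothetical helper, not part of Guzzle.)
 */
function fetch_after_login(Client $client, string $loginUrl, array $credentials, string $targetUrl): string
{
    // 'form_params' sends the fields as application/x-www-form-urlencoded
    $client->request('POST', $loginUrl, ['form_params' => $credentials]);

    // Same client, same cookie jar: the session cookie is reused here
    return (string) $client->request('GET', $targetUrl)->getBody();
}

// Usage, once Composer's autoloader is available:
// require __DIR__ . '/vendor/autoload.php';
// $client = new Client(['cookies' => true]); // one shared cookie jar
// echo fetch_after_login(
//     $client,
//     'http://iptorrents.com/torrents/',
//     ['username' => 'userhere', 'password' => 'passhere'],
//     'http://iptorrents.com/torrents/'
// );
```

Passing `'cookies' => true` to the constructor keeps the jar in memory for the lifetime of the client, which is enough for a single script run.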

+0

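Thanks for the quick response. I will look at these and see what I can work out. I know a few people who have already done what I am trying to do, but they will not tell me how, because they do not want me to be able to fetch content and post it to a site as easily as they do. Quite annoying, since I like learning this kind of thing and this is a small project of mine.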

0

I think you can do this much more simply (you stay logged in through cookies, which needs two options: CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE):

function curl_request($action, $postfields = array(), $ref = "") { 
    $absolute_path = realpath('./'); 
    $timeout = 20; 
    $useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6'; 
    $referer = (empty($ref)) ? $action : $ref; 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_URL, $action); 
    if (!empty($postfields)) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
    }
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout); 
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent); 
    curl_setopt($ch, CURLOPT_REFERER, $referer); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 

    // these two lines store and resend the cookies so you stay logged in
    curl_setopt($ch, CURLOPT_COOKIEJAR, $absolute_path."/cookie.txt"); 
    curl_setopt($ch, CURLOPT_COOKIEFILE, $absolute_path."/cookie.txt"); 

    curl_setopt($ch, CURLOPT_HEADER, 0); 
    $contents = curl_exec($ch);
    // the status code is only available after the request has run
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch); 

    $data = array($contents, $status); 
    return $data; 
} 

$credentials = array(
    "username" => "userhere", 
    "password" => "passhere" 
); 

curl_request("http://iptorrents.com/torrents/", $credentials); // we save the cookie 

$data = curl_request('http://iptorrents.com/torrents'); 
$returned_content = $data[0]; 
echo($returned_content); 
+0

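Thanks for the suggestion, "Louis XIV", but it does not log in when I try it. My credentials do not log me in. I will edit some of the code and try to get it working, but I cannot seem to log in with this method.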

+0

Maybe the site sets a cookie to make sure you arrived through the login page, so you could try requesting http://iptorrents.com/torrents/ first.

+0

Make a dummy request and check that the file cookie.txt has been created and is not empty (I may also have made a typo).
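
That check can be scripted. The sketch below assumes cookie.txt is in the Netscape cookie file format that curl writes (seven tab-separated fields per cookie, with HttpOnly cookies stored behind a `#HttpOnly_` prefix). The helper name `cookie_jar_names` is made up for illustration.

```php
<?php
/**
 * Return the names of the cookies stored in a curl cookie jar file.
 * An empty array means the file is missing, unreadable, or holds no cookies.
 * (Hypothetical helper for debugging; not part of the curl extension.)
 */
function cookie_jar_names(string $path): array
{
    if (!is_readable($path)) {
        return [];
    }

    $names = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies look like comments but are real entries
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue; // ordinary comment line
        }

        // domain, subdomain flag, path, secure flag, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $names[] = $fields[5];
        }
    }
    return $names;
}

// Usage after the login request has run:
// var_dump(cookie_jar_names(realpath('./') . '/cookie.txt'));
```

If this returns an empty array right after the login request, the site never set a cookie or the script could not write the file, and the second request cannot be logged in.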

0

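I know this is a bit old, but for anyone who runs into this in the future: I had the same problem with curl, and after searching and trying a lot of code, the following served my purpose: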

<?php 
//Login page 
$loginURL = 'www.example.com/login.php'; 
//target page whose content we want to fetch
$targetURL = 'www.example.com/target.php';

//Initiate first curl request 
$ch = curl_init(); 
//Create a cookie file to save session state
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt"); 
//Set login url 
curl_setopt($ch, CURLOPT_URL, $loginURL); 
//login request is a post request 
curl_setopt($ch, CURLOPT_POST, TRUE); 
//set login page field names and their value 
curl_setopt($ch, CURLOPT_POSTFIELDS, 
array(
'login_username' => 'userName', 
'login_password' => 'password', 
'realm' => 'local', 
'action' => 'login' 
)); 

//Execute login request 
ob_start(); 
curl_exec ($ch); 
ob_end_clean(); 
curl_close ($ch); 
unset($ch); 

//Initiate second curl request 
$ch = curl_init(); 

//Return transfer and avoid printing the output 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 

//get session state 
curl_setopt($ch, CURLOPT_COOKIEFILE,"cookie.txt"); 
//request our target page 
curl_setopt($ch, CURLOPT_URL, $targetURL); 
//The line below has no effect since PHP 5.1.3 (CURLOPT_RETURNTRANSFER already returns the raw data) and can be omitted
curl_setopt($ch,CURLOPT_BINARYTRANSFER, true); 

//Execute curl request and get the content of the target page 
$content = curl_exec ($ch); 
//close the curl request 
curl_close ($ch); 

//display the content in the our page 
echo $content; 
?> 

Keep in mind that this code only fetches the HTML of the page; images, for example, will not show up in the output.
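
For the extraction step the question asks about, PHP's built-in DOM extension may be a starting point once the HTML is in `$content`. The sketch below assumes the listing links look like `details.php?id=123`, which is only a guess at the markup, and `extract_ids` is a made-up helper name; adjust the XPath expression to the real page.

```php
<?php
/**
 * Collect numeric ids from links whose href carries an "id" query
 * parameter, mapped to the link text. The URL shape is an assumption.
 */
function extract_ids(string $html): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];
    foreach ($xpath->query('//a[contains(@href, "id=")]') as $link) {
        // Pull the query string out of the href and split it into fields
        $queryString = (string) parse_url($link->getAttribute('href'), PHP_URL_QUERY);
        parse_str($queryString, $query);

        if (isset($query['id']) && is_string($query['id']) && ctype_digit($query['id'])) {
            $ids[(int) $query['id']] = trim($link->textContent);
        }
    }
    return $ids;
}

// Usage with the page fetched above:
// foreach (extract_ids($content) as $id => $title) {
//     echo $id, ': ', $title, PHP_EOL;
// }
```

The resulting array can then be written to a database table keyed by id, and a cron job can rerun the script every 10 minutes.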