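Here is some modified code that works.
It first requests the login page to get the initial cookies and to extract the values the login form requires. Next, it performs the POST to the login service. It then checks whether the response tries to redirect to the destination URL using JavaScript and a meta tag.
It looked like you already had code for grabbing the form fields, so I didn't post mine, but let me know if you need it. Just make sure $formFields is an associative array in which the keys are field names and the values are field values.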
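To make the shape of `$formFields` concrete, here is a minimal sketch. The field names and values are placeholders chosen for illustration, not necessarily what the real login form contains. The point is only that keys are field names, values are field values, and `http_build_query` turns the array into a urlencoded POST body.

```php
<?php
// Hypothetical field names and values, for illustration only.
$formFields = array(
    'continue' => 'http://www.google.com/alerts/manage',
    'service'  => 'alerts',
    'Email'    => 'user@example.com',
    'Passwd'   => 'secret',
);

// Build the urlencoded string that is later passed to CURLOPT_POSTFIELDS.
$postString = http_build_query($formFields);
echo $postString, "\n";
```

Passing a prebuilt string (rather than the array itself) to `CURLOPT_POSTFIELDS` makes curl send the body as `application/x-www-form-urlencoded`, which is what a regular HTML form submits.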
<?php
/**
* Log in to Google account and go to account page
*
*/
$USERNAME = '[email protected]';
$PASSWORD = 'password';
$COOKIEFILE = 'cookies.txt';
// initialize the curl handle used for all requests
$ch = curl_init();
// set options shared by every request on this handle
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
// NOTE: this disables TLS certificate verification; enable it once a CA bundle is configured
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow HTTP redirects
curl_setopt($ch, CURLOPT_COOKIEJAR, $COOKIEFILE);  // cookies are written here when the handle is closed
curl_setopt($ch, CURLOPT_COOKIEFILE, $COOKIEFILE); // cookies are read from here
curl_setopt($ch, CURLOPT_HEADER, false);           // do not include response headers in the body
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
// the first request fetches the account login page
curl_setopt($ch, CURLOPT_URL,
    'https://accounts.google.com/ServiceLogin?hl=en&service=alerts&continue=http://www.google.com/alerts/manage');
$data = curl_exec($ch);
if ($data === false) {
    die('Request for login page failed: ' . curl_error($ch));
}
// extract form fields from account login page
$formFields = getFormFields($data);
// inject email and password into form
$formFields['Email'] = $USERNAME;
$formFields['Passwd'] = $PASSWORD;
unset($formFields['PersistentCookie']);
$post_string = http_build_query($formFields); // build urlencoded POST string for login
// set url to login page as a POST request
curl_setopt($ch, CURLOPT_URL, 'https://accounts.google.com/ServiceLoginAuth');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
// execute login request
$result = curl_exec($ch);
// check for "Redirecting" message in title to indicate success
// based on your language - you may need to change this to match some other string
if (strpos($result, '<title>Redirecting') === false) {
    // dump the response before exiting; nothing after die() would run
    var_dump($result);
    die("Login failed");
}
// login likely succeeded - request the account page; switch the handle back to a regular GET
curl_setopt($ch, CURLOPT_URL, 'https://myaccount.google.com/?utm_source=OGB');
curl_setopt($ch, CURLOPT_HTTPGET, true);
// execute the request for the account page using our cookies
$result = curl_exec($ch);
echo $result;
// helper functions below
// find google "#gaia_loginform" for logging in
function getFormFields($data)
{
    if (preg_match('/(<form.*?id=.?gaia_loginform.*?<\/form>)/is', $data, $matches)) {
        $inputs = getInputs($matches[1]);
        return $inputs;
    } else {
        die('didnt find login form');
    }
}
// extract all <input> fields from a form as name => value pairs
function getInputs($form)
{
    $inputs = array();
    $elements = preg_match_all('/(<input[^>]+>)/is', $form, $matches);
    if ($elements > 0) {
        for ($i = 0; $i < $elements; $i++) {
            // collapse runs of whitespace inside the tag
            $el = preg_replace('/\s{2,}/', ' ', $matches[1][$i]);
            if (preg_match('/name=(?:["\'])?([^"\'\s]*)/i', $el, $nameMatch)) {
                $name = $nameMatch[1];
                $value = ''; // inputs without a value attribute default to an empty string
                if (preg_match('/value=(?:["\'])?([^"\'\s]*)/i', $el, $valueMatch)) {
                    $value = $valueMatch[1];
                }
                $inputs[$name] = $value;
            }
        }
    }
    return $inputs;
}
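The script above only looks for the "Redirecting" title. If you also want the URL the page is trying to send you to, the sketch below pulls it out of a meta refresh tag. `extractMetaRefreshUrl` is a hypothetical helper, and the sample markup is an assumption about what such a page looks like, not a copy of Google's actual response.

```php
<?php
// Hypothetical helper: return the target URL of a <meta http-equiv="refresh"> tag, or null.
// Assumes the content attribute is double-quoted and looks like "0; url=...".
function extractMetaRefreshUrl($html)
{
    $pattern = '/<meta[^>]*http-equiv\s*=\s*["\']?refresh["\']?[^>]*content\s*=\s*"\s*\d+\s*;\s*url\s*=\s*([^"]*)"/i';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // The URL may be wrapped in entity-encoded quotes (&#39;), so decode and strip them.
    $url = html_entity_decode($m[1], ENT_QUOTES);
    return trim($url, " '\"");
}

// Assumed sample markup, for illustration only.
$sample = '<html><head><title>Redirecting</title>'
        . '<meta http-equiv="refresh" content="0; url=&#39;http://www.google.com/alerts/manage&#39;">'
        . '</head></html>';

echo extractMetaRefreshUrl($sample), "\n"; // prints http://www.google.com/alerts/manage
```

A regex is enough for a quick check like this; for anything less predictable, parsing the page with `DOMDocument` would be the sturdier choice.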
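You should probably fetch the URL with cURL rather than the `file_get_html` function, because the page may set cookies that the authentication service looks for along with the form. Also, can you confirm that the file specified by `COOKIEJAR` is being created and contains cookies? – drew010
I checked the COOKIEJAR file and it does have some text in it. I also set the curl_init URL to the same URL as file_get_html, and it is still the same thing: no cookies for me. :( – kazuo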
I do get some headers back here, though. They are: HTTP/1.1 200 OK Set-Cookie: GoogleAccountsLocale_session=en; Secure Set-Cookie: GAPS=1:ZuuFm50cJM2_fiqQc38hkyuCjZXRRg:bMuhAssScKIBtI1L; Path=/; Expires=Thu, 23-Jan-2014 18:32:24 GMT; Secure; HttpOnly Content-Type: text/html; charset=UTF-8 Strict-Transport-Security: max-age=2592000; includeSubDomains Date: Tue, 24 Jan 2012 18:32:24 GMT Expires: Tue, 24 Jan 2012 18:32:24 GMT Cache-Control: private, max-age=0 X-Content-Type-Options: nosniff X-XSS-Protection: 1; mode=block Content-Length: 1848 Server: GSE – kazuo
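To answer the cookie jar question from code rather than by opening the file, a rough sketch follows. It assumes the jar is in the tab-separated Netscape format that curl writes (seven fields per line, with HttpOnly cookies prefixed by `#HttpOnly_`). `readCookieJar` is a hypothetical helper name, and `cookies.txt` matches `$COOKIEFILE` from the answer.

```php
<?php
// Hypothetical helper: read a curl cookie jar (Netscape format) into name => details.
function readCookieJar($path)
{
    $cookies = array();
    if (!is_readable($path)) {
        return $cookies; // a missing or unreadable jar yields an empty list
    }
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $line = rtrim($line, "\r");
        if ($line === '') {
            continue;
        }
        $httpOnly = false;
        if (strpos($line, '#HttpOnly_') === 0) {
            // curl marks HttpOnly cookies with this prefix; the rest is a normal entry
            $line = substr($line, strlen('#HttpOnly_'));
            $httpOnly = true;
        } elseif ($line[0] === '#') {
            continue; // ordinary comment line
        }
        // fields: domain, subdomain flag, path, secure, expiry, name, value
        $parts = explode("\t", $line);
        if (count($parts) !== 7) {
            continue;
        }
        $cookies[$parts[5]] = array(
            'domain'   => $parts[0],
            'value'    => $parts[6],
            'httpOnly' => $httpOnly,
        );
    }
    return $cookies;
}

$found = readCookieJar('cookies.txt');
if (count($found) === 0) {
    echo "no cookies stored\n";
}
foreach ($found as $name => $cookie) {
    echo $name, ' (', $cookie['domain'], ")\n";
}
```

Keep in mind that curl only writes the jar when the handle is closed, so a check like this should run after `curl_close($ch)` or in a separate request.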