2013-10-06 59 views
11

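Mimicking an ajax call with PHP curl

I am using curl (via PHP) to scrape a website. What I want is a list of products, of which only the first few are shown by default. The rest is delivered when the user clicks a button to get the full product list, which triggers an ajax call that returns that list.

Here, in a nutshell, is the JS they use: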

headers['__RequestVerificationToken'] = token;
$.ajax({
    type: "post",
    url: "/ajax/getProductList",
    dataType: 'html',
    data: JSON.stringify({ historyPageIndex: 1, displayPeriod: 0, productsType: All }),
    contentType: 'application/json; charset=utf-8',
    success: function (result) {
        $(target).html("");
        $(target).html(result);
    },
    beforeSend: function (XMLHttpRequest) {
        if (headers['__RequestVerificationToken']) {
            XMLHttpRequest.setRequestHeader("__RequestVerificationToken", headers['__RequestVerificationToken']);
        }
    }
});

Here is my PHP script:

curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieLocation); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieLocation); 
curl_setopt($ch, CURLOPT_POST, false); 
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/Applications/ViewProducts'); 
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com/'); 
$webpage = curl_exec($ch); 
$productsType = trim(find_by_pattren($webpage, '<input id="productsType" name="productsType" type="hidden" value="(.*?)"')); 
$token = trim(find_by_pattren($webpage, '<input name="__RequestVerificationToken" type="hidden" value="(.*?)"')); 

$postVariables = 'productsType='.$productsType. 
'&historyPageIndex=1 
&displayPeriod=0 
&__RequestVerificationToken='.$token; 
curl_setopt($ch, CURLOPT_POST, true); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $postVariables); 
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/ajax/getProductList'); 
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com/Applications/ViewProducts'); 
$webpage = curl_exec($ch); 

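This returns an error page from the website. I think the main reasons might be:

  • They check whether it is an Ajax request (I don't know how to get around this)

  • The token needs to be in the headers, not in the post variables

Any ideas?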

EDIT: Here is the working code:

curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieLocation); 
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieLocation); 
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/Applications/ViewProducts'); 
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com/'); 
$webpage = curl_exec($ch); 
$productsType = trim(find_by_pattren($webpage, '<input id="productsType" name="productsType" type="hidden" value="(.*?)"')); 
$token = trim(find_by_pattren($webpage, '<input name="__RequestVerificationToken" type="hidden" value="(.*?)"')); 

$postVariables = json_encode(array('productsType' => $productsType, 
'historyPageIndex' => 1, 
'displayPeriod' => 0)); 
curl_setopt($ch, CURLOPT_POST, true); 
curl_setopt($ch, CURLOPT_HTTPHEADER, array("X-Requested-With: XMLHttpRequest", "Content-Type: application/json; charset=utf-8", "__RequestVerificationToken: $token")); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $postVariables); 
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/ajax/getProductList'); 
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com/Applications/ViewProducts'); 
$webpage = curl_exec($ch); 
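
The find_by_pattren() helper used above is not shown in the question. As a rough sketch of what such a helper might do, assuming it returns the first capture group of a regular expression, something like the following would work. The name find_first_match and the sample HTML are hypothetical, not the original implementation.

```php
<?php
// Hypothetical stand-in for find_by_pattren(); the real helper is not shown.
function find_first_match(string $html, string $pattern): string
{
    // '~' is the delimiter so '/' in the HTML pattern needs no escaping
    // (assumes the pattern itself contains no '~').
    // The 's' modifier lets '.' match newlines if the tag spans several lines.
    if (preg_match('~' . $pattern . '~s', $html, $matches) === 1) {
        // Attribute values are HTML-escaped in the page source, so decode them.
        return html_entity_decode($matches[1] ?? '', ENT_QUOTES);
    }
    return ''; // no match: the caller should treat this as "value not found"
}

// Made-up page fragment for illustration only.
$sampleHtml = '<form><input name="__RequestVerificationToken" type="hidden" value="abc123" /></form>';
$token = find_first_match(
    $sampleHtml,
    '<input name="__RequestVerificationToken" type="hidden" value="(.*?)"'
);
echo $token, PHP_EOL; // prints abc123
```

Checking for an empty return value before sending the second request makes it easier to tell a failed scrape apart from a rejected token.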

Answer

11

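To send the request verification token as a header, mimic an AJAX request more closely, and set the content type to JSON, use CURLOPT_HTTPHEADER. (CURLOPT_HEADER is a different option: it only controls whether the response headers are included in the output.)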

curl_setopt($ch, CURLOPT_HTTPHEADER, array("X-Requested-With: XMLHttpRequest", "Content-Type: application/json; charset=utf-8", "__RequestVerificationToken: $token")); 
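
CURLOPT_HTTPHEADER takes an array of "Name: value" strings, so it can be convenient to build that array in one place. The sketch below is only an illustration: build_ajax_headers is a hypothetical helper and 'abc123' is a placeholder for the token scraped from the page.

```php
<?php
// Hypothetical helper: assembles the header lines for the ajax-style POST.
function build_ajax_headers(string $token): array
{
    return [
        // jQuery adds this header to same-origin ajax requests; servers often
        // use it to decide whether a request "is ajax".
        'X-Requested-With: XMLHttpRequest',
        // The body is JSON, matching the contentType used by the site's JS.
        'Content-Type: application/json; charset=utf-8',
        // The anti-forgery token goes in a header, not in the body.
        '__RequestVerificationToken: ' . $token,
    ];
}

$headers = build_ajax_headers('abc123'); // placeholder token
print_r($headers);

// With an existing cURL handle $ch this would be applied as:
// curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
```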

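I also notice that you redundantly set CURLOPT_POST to false on line 7 of your code, and that the post data you are sending is not in JSON format. You should have: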

$postVariables = '{"historyPageIndex":1,"displayPeriod":0,"productsType":"All"}'; 
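
The same body can also be produced with json_encode(), which takes care of quoting and escaping when a value comes from scraped input. This is a small sketch; the value 'All' is a placeholder for whatever the hidden productsType field contains.

```php
<?php
// Build the request body from an array instead of a hand-written string.
$payload = [
    'historyPageIndex' => 1,
    'displayPeriod'    => 0,
    'productsType'     => 'All', // placeholder; normally scraped from the page
];
$postVariables = json_encode($payload);
echo $postVariables, PHP_EOL;
// prints {"historyPageIndex":1,"displayPeriod":0,"productsType":"All"}

// Values containing quotes are escaped automatically.
$escaped = json_encode(['productsType' => 'A"B']);
echo $escaped, PHP_EOL;
// prints {"productsType":"A\"B"}
```
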
+0

Thanks - it worked with a slight change (building $postVariables with json_encode from an array, since your suggestion still caused some errors) – Davor

+0

@Davor: Could you show us the final code, including the changes you made? – pablofiumara

+1

@pablofiumara I edited my question to add the final working code – Davor