cURL scraping: error setting cookie
I'm trying to scrape some data from a website with cURL. Here is my code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.jstor.org/action/doBasicSearch?Query=Les+bourgeois');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());
$result7 = htmlspecialchars_decode(curl_exec ($ch));
curl_close($ch);
$html7 = new simple_html_dom();
$html7->load($result7);
But I get the following warning:
Warning: file_get_contents(<!DOCTYPE html> <html xmlns:mml=" http://www.w3.org/1998/Math/MathML" ; lang="en" > <head> <script type="text/javascript"> var JiffyParams = { jsStart: (new Date()).getTime()}; </script> <meta name="robots" content="noarchive,noindex,nofollow,NOODP" /> <meta name="MSSmartTagsPreventParsing" content="true"/> <title>JSTOR: An Error Occurred Setting Your User Cookie</title> <meta charset="UTF-8"/> <link rel="shortcut icon" href="/templates/jsp/favicon.ico" type="image/vnd.microsoft.icon" /> <link rel="stylesheet" type="text/css" media="screen" href="/jawrcss/N815843185/bundles/jstor.css" /> <link rel="stylesheet" type="text/css" href="//fonts.googleapis.com/css?family=Roboto:400,5 in C:\wamp\www\scrap_cairn\simple_html_dom.php on line 76
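I don't understand what I'm doing wrong; I'm a beginner with cURL. Maybe I have to set some cookies from JSTOR, but I don't know how to do that. Thanks for your help.
Edit:
I just added this and the error changed: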
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.jstor.org/action/doBasicSearch?Query=Les+bourgeois');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, random_user_agent());
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
$result7 = htmlspecialchars_decode(curl_exec ($ch));
curl_close($ch);
Error:
Warning: file_get_contents(<!DOCTYPE html> <!--[if IE 8]> <html class="no-js lt-ie9" lang="en"> <![endif]--> <!--[if gt IE 8]><!--> <html class="no-js" lang="en"> <!--<![endif]--> <head> <script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VwACUF9VGwsGXVRbAwA="};window.NREUM||(NREUM={}),function r(n){if(!e[n]){var o=e[n]={exports:{}};t[n][0].call(o.exports,function(e){var o=t[n][1][e];return r(o?o:e)},o,o.exports)}return e[n].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<n.length;o++)r(n[o]);return r}({function(t,e){function n(t){function e(e,n,a){t&&t(e,n,a),a||(a={});for(var c=s(e),f=c.length,u=i(a,o,r),d=0;f>d;d++)c[d].apply(u,n);return u}function a(t,e){f[t]=s(t).concat(e)}function s(t){return f[t]||[]}function c(){return n(e)}var f={};return{on:a,emit:e,create:c,listeners:s,_events: in C:\wamp\www\scrap_cairn\simple_html_dom.php on line 76
Here is the piece of code from simple_html_dom around line 76:
function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    // We DO force the tags to be terminated.
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    // For sourceforge users: uncomment the next line and comment the retreive_url_contents line 2 lines down if it is not already done.
    $contents = file_get_contents($url, $use_include_path, $context, $offset);
    // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
    //$contents = retrieve_url_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
    {
        return false;
    }
    // The second parameter can force the selectors to all be lowercase.
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}
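One reading of the two warnings above: both quote page markup as the argument of file_get_contents(), and the function just shown passes its first argument straight to file_get_contents() as a URL or path. That would suggest the string returned by curl_exec() is reaching file_get_html() (or a similar file-loading call) somewhere in the script. A minimal sketch of the alternative, assuming simple_html_dom.php sits next to the script and using an invented sample string in place of the real JSTOR response, parses the in-memory markup with the library's str_get_html() instead:

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Stand-in for the string returned by curl_exec(). This sample markup is
// invented for illustration; it is not the real JSTOR response.
$result7 = '<!DOCTYPE html><html><head><title>Sample page</title></head>'
         . '<body><a href="/stable/1">First result</a></body></html>';

// str_get_html() parses markup that is already in memory.
// file_get_html() expects a URL or file path and hands it to file_get_contents().
$html7 = str_get_html($result7);

if ($html7 === false) {
    // str_get_html() returns false for empty or oversized input.
    echo "Nothing to parse\n";
} else {
    // Print the page title, then every link with its text.
    echo $html7->find('title', 0)->plaintext, "\n";
    foreach ($html7->find('a') as $link) {
        echo $link->href, ' => ', $link->plaintext, "\n";
    }
}
```

In the script from the question, $result7 would be the string returned by curl_exec(). Since curl_exec() returns false on failure, checking for that before decoding or parsing keeps a failed request from looking like an empty page.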
Thanks, it works! ;) – AlphaNico 2014-12-27 23:00:45