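DOMXPath->evaluate isn't finding the div I need
Good day. I am scraping search results and have had some success, but I am now stuck.
The code below shows a DIV with class 'vsc' that contains an H3 with class 'r'. I can get the anchor inside the H3 with //h3[@class='r']//a.
My problem is that the table further down also contains an H3 with class 'r', and I don't want any of the links inside that table.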
<li class="g">
<div class="vsc" pved="0CD4QkgowAA" bved="0CD8QkQo" sig="m15">
<h3 class="r">
<a href="https://ameriloan.com/" class="l" onmousedown="return rwt(this,'','','','1','AFQjCNEazKuyTuAyYgnAT3MqI3aJoiAlZw','','0CDwQFjAA',null,event)">
</h3>
<div class="vspib" aria-label="Result details" role="button" tabindex="0">
<div class="s">
</div>
<table cellpadding="0" cellspacing="0" class="nrgt">
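Here is the script I am using to scrape the anchors. It is supposed to retrieve only the H3 anchors inside the 'vsc' DIV, but it is not working: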
function getURL($url)
{
    $ch = curl_init();
    // This allows the script to accept HTTPS certificates "blindly"
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_URL, $url);
    // Pass the constant itself, not its name as a string
    curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // Follows redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 6); // follows up to 6 redirects
    $ret = curl_exec($ch);
    return $ret;
}
$i = 0;
$rawKeyword = 'EXAMPLE';
// urlencode() turns spaces into '+' and also escapes other reserved characters
$keyword = urlencode($rawKeyword);
$url = "http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=".$keyword;
// get the HTML through the cURL function
$html = getURL($url);
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all data
$xpath = new DOMXPath($dom);
// XPath eval to get page links and titles
//$elementContent = $xpath->evaluate("//h3[@class='r']//a");
$elementContent = $xpath->evaluate("//div[@class='vsc']//h3[@class='r']//a");
// Print results
foreach ($elementContent as $content) {
    $i++;
    $href = $content->getAttribute('href');
    // trim() treats its second argument as a set of characters, so strip the literal prefix instead
    $clean = (strpos($href, '/url?q=') === 0) ? substr($href, strlen('/url?q=')) : $href;
    echo '<strong>'.$i.'</strong>: <h3 style=" clear:none !important; font-size:10px; letter-spacing:0.1em; line-height:2.6em; text-transform:uppercase;">'.$content->textContent.'</h3><br/>'.$clean.'<br /><br />';
}
What am I doing wrong in my evaluate query?
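One thing that may be worth checking is how the class test in the query behaves. In XPath 1.0, @class='vsc' compares the whole attribute string, so a DIV with class="sld vsc" does not match it, while a token-style test does. The sketch below illustrates the difference on a hand-written fragment modelled on the excerpts in this post. The fragment, the anchor text and the variable names are assumptions for illustration only, and the HTML that cURL actually receives may differ from what the browser shows.

```php
<?php
// Hypothetical fragment modelled on the markup quoted in this post (not real search output).
$html = '<ul><li class="g">'
      . '<div class="vsc"><h3 class="r"><a href="https://ameriloan.com/">A</a></h3></div>'
      . '<table class="nrgt"><tbody><tr><td>'
      . '<div class="sld vsc"><h3 class="r"><a href="https://example.com/?page=ent_cs_login">B</a></h3></div>'
      . '</td></tr></tbody></table>'
      . '</li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Exact string comparison: only class="vsc" matches, class="sld vsc" does not.
$exact = $xpath->query("//div[@class='vsc']//h3[@class='r']//a");

// Token comparison: matches any div whose class list contains the token "vsc".
$token = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' vsc ')]//h3[@class='r']//a"
);

echo $exact->length, "\n"; // number of anchors under an exact class="vsc" div
echo $token->length, "\n"; // number of anchors under any div carrying the vsc token
```

On this fragment the exact test finds one anchor and the token test finds two. If the real page yields zero matches for both, printing part of $html to confirm that the 'vsc' class is present in the fetched markup at all would be a reasonable next step.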
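@jdwilemo - You are right about what I am trying to do: get only the anchors inside a DIV of class 'vsc'. Here is more of the table code, which shows the other H3 elements of class 'r' as well...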
<table cellpadding="0" cellspacing="0" class="nrgt">
<tbody>
<tr class="mslg">
<td style="vertical-align: top; ">
<div class="sld vsc" pved="0CIYBEJIKMAE" bved="0CIcBEJEK" sig="Q_U">
<span class="tl">
<h3 class="r">
<a href="https://example.com/?page=ent_cs_login" class="l" onmousedown="return rwt(this,'','','','2','AFQjCNEyANjoolNXGFnLVKH3S1j4CO1qQw','','0CIQBEIwQMAE',null,event)">
</h3>
</span>
<div class="vspib" aria-label="Result details" role="button" tabindex="0">
<div class="s">
</div>
</li>
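Everything is wrapped in an 'li' tag. The table is the last element inside the 'li' tag. I want the anchors in the 'li' element, but without the anchors from the table at the end of it. I hope that clears it up...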
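To make the goal concrete, here is a minimal sketch of one way to express it in XPath: select the H3 anchors inside the 'li', but skip any H3 that has a table among its ancestors. The function name extractTopLevelLinks and the sample fragment are hypothetical, written only to mirror the structure described above, and this is not tested against the live page.

```php
<?php
// Hypothetical helper: returns the href of every h3.r anchor that is NOT nested in a table.
function extractTopLevelLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // not(ancestor::table) drops every h3 that sits anywhere inside a table element.
    $nodes = $xpath->query("//li[@class='g']//h3[@class='r'][not(ancestor::table)]//a");

    $links = [];
    foreach ($nodes as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}

// Hand-written sample mirroring the 'li' > div + table layout described in this post.
$sample = '<ul><li class="g">'
        . '<div class="vsc"><h3 class="r"><a href="https://ameriloan.com/">A</a></h3></div>'
        . '<table class="nrgt"><tbody><tr class="mslg"><td>'
        . '<div class="sld vsc"><span class="tl"><h3 class="r">'
        . '<a href="https://example.com/?page=ent_cs_login">B</a></h3></span></div>'
        . '</td></tr></tbody></table>'
        . '</li></ul>';

$links = extractTopLevelLinks($sample);
print_r($links);
```

On the sample, only the https://ameriloan.com/ link is returned. Filtering on the ancestor axis avoids depending on the exact class string of the wrapping DIV, which may help if those class values vary between results.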