2017-04-22

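PHP: extract the first occurrence of a link from source code

I am trying to extract the first occurrence of a link that starts like this: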

https://encrypted-tbn3.gstatic.com/images?... 

from the source code of a page. The link begins and ends with a double quote ("). This is what I have so far:

$search_query = $array[0]['Name']; 
$search_query = urlencode($search_query); 
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible'))); 
$response = file_get_contents("https://www.google.com/search?q=$search_query&tbm=isch", false, $context); 
$html = str_get_html($response); 
$url = explode('"',strstr($html, 'https://encrypted-tbn3.gstatic.com/images?'[0])) 

However, the output of $url is not the link I am trying to extract but something very different; I have attached an image of it.

Can anyone explain the output to me, and how would I get the desired link? Thanks.

Answers

1

It looks like you are using PHP Simple HTML DOM Parser.
I usually use DOMDocument, which is one of PHP's built-in classes.
Here is a working example of what you need:

$search_query = $array[0]['Name']; 
$search_query = urlencode($search_query); 
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible'))); 
$response = file_get_contents("https://www.google.com/search?q=$search_query&tbm=isch", false, $context); 

libxml_use_internal_errors(true); 
$dom = new DOMDocument(); 
$dom->loadHTML($response); 

foreach ($dom->getElementsByTagName('img') as $item) { 
    $img_src = $item->getAttribute('src'); 
    if (strpos($img_src, 'https://encrypted') !== false) { 
        print $img_src . "\n";
    } 
} 

Output:

https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSumjp6e37O_86nc36mlktuWpbFuCI4nkkkocoBCYW3qCOicqdu_KEK-MY 
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcR_ttK8NlBgui_JndBj349UxZx0kHn0Z-Essswci-_5UQCmUOruY1PNl3M 
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSydaTpSDw2mvU2JRBGEYUOstTUl4R1VhRevv1Sdinf0fxRvU26l3pTuqo 
... 
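
As for the output in the question: `'https://encrypted-tbn3.gstatic.com/images?'[0]` indexes the string literal and yields only its first character, `h`. So `strstr()` returns everything from the first `h` in the page onwards, and `explode()` then gives back an array of pieces rather than one URL. Below is a minimal sketch of that effect, followed by two ways to keep only the first matching link. The inline `$html` sample and the variable names are assumptions made for illustration; the real response from Google will look different.

```php
<?php
// Hypothetical stand-in for the fetched page; the real response is much larger.
$html = '<html><body><p>hello</p>'
      . '<img src="https://example.com/logo.png">'
      . '<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:AAA">'
      . '<img src="https://encrypted-tbn2.gstatic.com/images?q=tbn:BBB">'
      . '</body></html>';

$prefix = 'https://encrypted-tbn3.gstatic.com/images?';

// What the question's code effectively does: $prefix[0] is just "h",
// so the search starts at the first "h" found anywhere in the page.
$needle = $prefix[0];
$pieces = explode('"', strstr($html, $needle));

// String-based fix: search for the whole prefix, then take the text
// up to the closing double quote (index 0 of the exploded result).
$rest = strstr($html, $prefix);
$first_by_string = ($rest === false) ? '' : explode('"', $rest)[0];

// DOM-based variant: stop at the first <img> whose src starts with
// "https://encrypted" (any tbn host, as in the loop above).
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$first_by_dom = '';
foreach ($dom->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    if (strpos($src, 'https://encrypted') === 0) {
        $first_by_dom = $src;
        break; // only the first occurrence is wanted
    }
}

echo $pieces[0], "\n";
echo $first_by_string, "\n";
echo $first_by_dom, "\n";
```

The string-based version depends on the URL being wrapped in double quotes in the markup; the DOM version does not, because the parser has already extracted the attribute value.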
0
$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?'; 
if(preg_match('/\"(https\:\/\/encrypted\-tbn3\.gstatic\.com\/images\?.+?)\"/ui',$html, $matches)) 
    $url = $matches[1]; 
else 
    $url = ''; 

Try using preg_match; a regular expression is better suited to this kind of parsing.

Also, this example assumes that the URL in your HTML is enclosed in double quotes.

UPD: a slightly adjusted version that works with any URL beginning:

$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?'; 
// Escape regex metacharacters and the "/" delimiter in the prefix;
// preg_quote() is used here in place of a hand-written escaping pattern.
$url_beginning = preg_quote($url_beginning, '/'); 
if(preg_match('/\"('.$url_beginning.'.+?)\"/ui',$html, $matches)) 
    $url = $matches[1]; 
else 
    $url = '';
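
The prefix-based match can also be wrapped in a small function, which makes it easy to try against a test string before running it on a live page. This is only a sketch: the function name `first_url_with_prefix` and the sample markup are hypothetical, and it loosens the double-quote assumption slightly by accepting single-quoted attributes as well.

```php
<?php
/**
 * Return the first quoted URL in $html that starts with $prefix,
 * or an empty string if there is none.
 * Hypothetical helper for illustration only.
 */
function first_url_with_prefix(string $html, string $prefix): string
{
    // preg_quote() escapes regex metacharacters; '/' is passed
    // because it is the pattern delimiter used below.
    $quoted = preg_quote($prefix, '/');

    // Group 1 captures the opening quote (single or double) and \1
    // requires the same character to close it; group 2 is the URL.
    if (preg_match('/(["\'])(' . $quoted . '.*?)\1/ui', $html, $matches)) {
        return $matches[2];
    }
    return '';
}

// Made-up markup mixing both quote styles.
$sample = "<img src='https://example.com/a.png'>"
        . '<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:XYZ" alt="x">';

$url = first_url_with_prefix($sample, 'https://encrypted-tbn3.gstatic.com/images?');
echo $url, "\n";
```

Because of the lazy `.*?`, the match ends at the first closing quote after the prefix, so only the first occurrence is returned even when the page contains many such links.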