2015-07-01 321 views
1

I have my own external website and I want to fetch some data from it. I use cURL to get the site's content, but I only need certain parts of it.

Edit: To be very frank, I want to fetch the timestamps from a Facebook page. If you use Inspect Element on the page, you will see code like this:

<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:02pm" data-utime="1435663828" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:03pm" data-utime="1435663829" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:04pm" data-utime="1435663830" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
</span> 

I just want to display the value of "data-utime", which is 1435663826. Here is my code, which fetches the content. What should I use after this?

// cURL's cookie options expect a file path, so create a named temporary file
// (tmpfile() returns a stream resource, not a path).
$cookie = tempnam(sys_get_temp_dir(), 'cookie');
$userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';

$ch = curl_init("https://www.mywebsite.com");

$options = array(
    CURLOPT_CONNECTTIMEOUT => 20,
    CURLOPT_USERAGENT      => $userAgent,
    CURLOPT_AUTOREFERER    => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    CURLOPT_COOKIEFILE     => $cookie,
    CURLOPT_COOKIEJAR      => $cookie,
    CURLOPT_SSL_VERIFYPEER => 0,    // certificate checks are switched off here
    CURLOPT_SSL_VERIFYHOST => 0
);

curl_setopt_array($ch, $options);
$kl = curl_exec($ch);
curl_close($ch);

echo $kl; // Final output after fetching
+0

Hi Jeff, can you post the whole PHP script? I can help you solve it. –

+0

This is the complete PHP! – Jeff

Answers

1

You can use PHP's DOM extension to load and parse the HTML document, then use an instance of DOMXPath to query for specific elements.
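A minimal sketch of that approach, assuming the markup has already been fetched into a string. The function name `extractUtimes` and the shortened `$sample` markup are made up for illustration; only `DOMDocument`, `DOMXPath` and the libxml error functions come from PHP itself.

```php
<?php
// Hypothetical helper: returns the data-utime value of every <abbr> that has one.
function extractUtimes(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();

    // Real-world pages are rarely valid HTML; buffer the parser warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only <abbr> elements that carry a data-utime attribute.
    $xpath = new DOMXPath($dom);
    $utimes = [];
    foreach ($xpath->query('//abbr[@data-utime]') as $abbr) {
        $utimes[] = $abbr->getAttribute('data-utime');
    }

    return $utimes;
}

// Shortened sample modelled on the markup in the question (an assumption,
// not the live page).
$sample = '<span class="fsm fwn fcg"><a class="_5pcq">'
    . '<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a></span>'
    . '<span class="fsm fwn fcg"><a class="_5pcq">'
    . '<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a></span>';

print_r(extractUtimes($sample));
```

The XPath expression only finds elements that are present in the HTML string passed in. If the server returns a different page from the one the browser shows, the result is simply an empty array.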

+0

I tried many of those, but they don't work for me. Is it just because I'm using cURL to fetch the page? – Jeff

0

If you already have the HTML markup, you can try this:

<?php 

$curl = curl_init('https://www.facebook.com/Rajnikant.Vs.CIDJokez'); 


curl_setopt($curl, CURLOPT_FAILONERROR, true); 
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false); 
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); 
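$result = curl_exec($curl);
if ($result === false) {
    // Nothing to parse if the request failed, so report the cURL error and stop.
    die('cURL error: ' . curl_error($curl));
}
curl_close($curl);
//echo $result;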

/* $result = 
'<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:00pm" data-utime="1435663826" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:01pm" data-utime="1435663827" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:02pm" data-utime="1435663828" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:03pm" data-utime="1435663829" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
<span class="fsm fwn fcg"><a class="_5pcq"> 
<abbr title="Tuesday, June 30, 2015 at 5:04pm" data-utime="1435663830" data-shorten="1" class="_5ptz timestamp livetimestamp">5 hrs</abbr></a> 
</span>'; 
*/ 
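$html = $result;
$dom = new DOMDocument();

// Buffer libxml warnings about malformed markup instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Each timestamp is stored on an <abbr> element.
$a = $dom->getElementsByTagName('abbr');

$data = array();

foreach ($a as $abbr) {
    // Skip <abbr> elements that have no data-utime attribute.
    if ($abbr->hasAttribute('data-utime')) {
        $data[] = $abbr->getAttribute('data-utime');
    }
}

echo '<pre>';
print_r($data);
echo '</pre>';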
+0

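Well, to put it bluntly, I want to scrape a Facebook page and get the timestamps from its first page. The code you showed doesn't work. The page is: https://www.facebook.com/Rajnikant.Vs.CIDJokez If you use Inspect Element, you can see the posts' timestamps – Jeff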

+0

@Jeff I've updated my answer. –

+0

Still not working! Does it work for you? If you run your code, it shows the error "Update your browser" – Jeff