Scraping a page for specific information using PHP

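For example, I want to scrape https://stackoverflow.com/privileges/user/3 and get the data in the div <div class="summarycount al">6,525</div>, so that I can add the reputation, together with the user number, to a local database. I think I can use file_get_contents: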

$data = file_get_contents('https://stackoverflow.com/privileges/user/3');

How do I extract the data I need, i.e. 6,525 in the example above?

Source

2010-10-07 abel

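You need to log in (via PHP) to see the relevant information. That is not entirely straightforward and takes some work.
You can use *shrugs* regex to parse the data, or use a DOM parser such as PHP Simple HTML DOM Parser. Using regex...: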
```
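// $data holds the HTML returned by file_get_contents() in the question
preg_match('!<div class="summarycount al">(.+?)</div>!', $data, $matches);
$rep = $matches[1] ?? null; // null when the div is not found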
```
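For the parser route mentioned above, here is a minimal sketch. It assumes PHP's built-in DOM extension (DOMDocument and DOMXPath) as a stand-in for PHP Simple HTML DOM Parser, so it is not that library's API. The helper name extractReputation() is hypothetical, and the sample markup is the div quoted in the question.

```php
<?php
// Hypothetical helper: find the "summarycount" div and return its number.
function extractReputation(string $html): ?int
{
    if ($html === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid markup, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match any div whose class list contains the token "summarycount".
    $nodes = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " summarycount ")]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null; // element not present in this page
    }

    // "6,525" -> 6525: keep digits only before casting.
    $digits = preg_replace('/\D/', '', $nodes->item(0)->textContent);
    return $digits === '' ? null : (int) $digits;
}

$sample = '<html><body><div class="summarycount al">6,525</div></body></html>';
var_dump(extractReputation($sample)); // int(6525)
```

Stripping the thousands separator before casting matters if the value goes into an integer column: (int) '6,525' on its own would yield 6.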
If you are scraping SO, you can use the SO API instead.

Code:

$url = 'http://api.stackoverflow.com/1.0/users/3'; 

$tuCurl = curl_init(); 
curl_setopt($tuCurl, CURLOPT_URL, $url); 
curl_setopt($tuCurl, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($tuCurl, CURLOPT_ENCODING, 'gzip'); 

$data = curl_exec($tuCurl); 
$parse = json_decode($data, true); 
$rep = $parse['users'][0]['reputation']; 

echo $rep;
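The snippet above assumes the request succeeds and that the response contains at least one user. A slightly more defensive way to read the same 'users' → 0 → 'reputation' path might look like the sketch below. The helper name reputationFromApiJson() is hypothetical, and the sample payload is made up for illustration: it mirrors only the keys the code above reads, not the full API response.

```php
<?php
// Hypothetical helper: return the reputation from an API response body,
// or null when the request failed or the expected keys are missing.
function reputationFromApiJson($json): ?int
{
    if (!is_string($json)) {
        return null; // curl_exec() returns false on failure
    }

    $parsed = json_decode($json, true);
    if (!is_array($parsed) || !isset($parsed['users'][0]['reputation'])) {
        return null; // invalid JSON or unexpected structure
    }

    return (int) $parsed['users'][0]['reputation'];
}

// Illustrative payload only; the real response carries many more fields.
$sample = '{"users":[{"reputation":6525}]}';
var_dump(reputationFromApiJson($sample)); // int(6525)
var_dump(reputationFromApiJson(false));   // NULL
```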

Source

2010-10-07 16:32:49 999999

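Thanks for trying. Regex really sucks, so I'll pass on it. The page doesn't require a login, so no worries there. This is a general question that uses SO as an example. The code works! Thanks – abel 2010-10-07 16:36:01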

Time taken: 2.11 seconds. Getting 10,000 users would take 5.6 hours. Can I do the whole thing in one script without timing out? – abel 2010-10-07 16:42:32

@abel Yes, you can change the 'max_execution_time' setting. I strongly recommend using the SO API, or downloading the [data-dump](http://blog.stackoverflow.com/2010/10/creative-commons-data-dump-oct-10/) and getting the information from there. – 999999 2010-10-07 16:46:05
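As a rough sketch of the long-running approach discussed in the comments: set_time_limit(0) lifts the max_execution_time limit for the current script, after which a single loop can walk all user IDs. The names collectReputations() and $fetch are hypothetical. $fetch stands for whatever function retrieves one user's reputation (by scraping or through the API), and the pause parameter is an assumption added so the loop can be throttled.

```php
<?php
// Hypothetical batch helper: call $fetch for every ID and keep the results.
function collectReputations(array $userIds, callable $fetch, int $pauseMicros = 0): array
{
    // 0 removes the execution time limit for this script run.
    set_time_limit(0);

    $results = [];
    foreach ($userIds as $id) {
        $rep = $fetch($id);
        if ($rep !== null) {
            $results[$id] = $rep; // skip users that could not be fetched
        }
        if ($pauseMicros > 0) {
            usleep($pauseMicros); // optional throttle between requests
        }
    }
    return $results;
}

// Stub fetcher so the example runs without network access.
$fake = function (int $id): ?int {
    return $id === 2 ? null : $id * 100;
};
print_r(collectReputations([1, 2, 3], $fake));
```

Running such a script from the command line rather than through a web request is likely the safer choice, since a web server may enforce its own timeouts regardless of the PHP setting.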
