Get HTML with cURL and strip the HTML with preg_replace

I'm trying to get the statistics from The Pirate Bay (TPB). The stats can be found in the following div:

<div id="stats">5.695.184 registered users Last updated 14:46:05.<br />35.339.741 peers (25.796.820 seeders + 9.542.921 leechers) in 4.549.473 torrents.<br /> </div>

This is my code:

<?php 
    $ch = curl_init(); 
    $timeout = 5; 
    curl_setopt($ch, CURLOPT_URL,"http://thepiratebay.se"); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout); 
    curl_setopt($ch,CURLOPT_COOKIE,"language=nl_NL; c[thepiratebay.se][/][language]=nl_NL"); 
    $data=curl_exec($ch); 
    $data = preg_replace('/(.*?)(<div id="stats">)(.*?)(<\/div>)(.*?)/','$2',$data); 
    echo $data; 
    curl_close($ch); 
    exit; 
?>

As you can see, I use the following preg_replace pattern to strip the HTML:

$data = preg_replace('/(.*?)(<div id="stats">)(.*?)(<\/div>)(.*?)/','$2',$data);

But that doesn't work. I get the entire TPB page instead of just the stats. Does anyone have an answer?
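A minimal sketch of why the pattern misbehaves, run against a hard-coded stand-in for the page. The `$data` string is hypothetical, and the sketch assumes, as the snippet above suggests, that the stats div sits on a single line. Without the `s` modifier `.` does not match newlines, so the pattern can only match inside the stats line; the replacement `$2` then keeps just the opening `<div id="stats">` tag. The rest of the page comes back untouched and only the statistics themselves are removed. Extracting with `preg_match()` plus the `s` modifier avoids both problems:

```php
<?php
// Hypothetical stand-in for the fetched page: several lines of markup, with
// the stats div on a line of its own.
$data = "<html>\n<body>\n"
      . "<div id=\"stats\">5.695.184 registered users Last updated 14:46:05.<br />"
      . "35.339.741 peers in 4.549.473 torrents.<br /> </div>\n"
      . "<p>rest of the page</p>\n</body>\n</html>";

// The question's pattern: without /s, "." cannot cross newlines, so the match
// stays inside the stats line, and '$2' keeps only the opening tag.
$broken = preg_replace('/(.*?)(<div id="stats">)(.*?)(<\/div>)(.*?)/', '$2', $data);
echo $broken, "\n"; // the whole page comes back, minus the statistics text

// Extracting instead of replacing: with /s, ".*?" may cross newlines, and
// capture group 1 holds the inner HTML of the div.
$inner = '';
if (preg_match('/<div id="stats">(.*?)<\/div>/s', $data, $m)) {
    $inner = $m[1];
}
echo $inner, "\n";
```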

Thanks in advance.

2012-05-04 Ton Hoekstra

Forget about regex for screen scraping, it's overkill. Use DOMDocument instead and see how simple it is:

<?php 
function curl_get($url){ 
    $useragent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)'; 
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_URL,$url); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,5); 
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent); 
    curl_setopt($ch,CURLOPT_COOKIE,"language=nl_NL; c[thepiratebay.se][/][language]=nl_NL"); 
    $data=curl_exec($ch); 
    curl_close($ch); 
    return $data; 
} 

function get_pb_stats(){ 
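    $html = curl_get("http://thepiratebay.se"); 
    if ($html === false || $html === '') {
        return ''; // request failed; loadHTML() rejects empty input
    }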
    // Create a new DOM Document 
    $xml = new DOMDocument(); 

    // Load the html contents into the DOM 
    @$xml->loadHTML($html); 

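    $stats = $xml->getElementById('stats');
    if ($stats === null) {
        return ''; // stats div not found in the page
    }
    $return = trim($stats->nodeValue);
    // Regex to add a <br /> tag after the time, e.g. "15:04:05."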
    $return = preg_replace('/\d{2}[:]\d{2}[:]\d{2}[.]/','${0}<br />',$return); 
    return $return; 
} 

echo get_pb_stats(); 

/* 
5.695.213 geregistreerde gebruikers Laatste update 15:04:05.<br />35.505.322 peers (25.948.185 seeders + 9.557.137 leechers) in 4.546.560 torrents. 
*/ 
?>
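The same DOMDocument approach can be tried offline. This sketch parses a hard-coded copy of the stats div from the question instead of fetching the live site, and swaps the `@` error suppression for `libxml_use_internal_errors()`; both choices are assumptions of the sketch, not part of the answer above:

```php
<?php
// Offline sketch: parse a hard-coded copy of the stats div (taken from the
// question) instead of fetching the live page.
$html = '<html><body><div id="stats">5.695.184 registered users Last updated 14:46:05.<br />'
      . '35.339.741 peers (25.796.820 seeders + 9.542.921 leechers) in 4.549.473 torrents.<br /> </div></body></html>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of silencing them with "@".
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$div = $doc->getElementById('stats');
// nodeValue concatenates the text nodes, so the <br /> separators disappear.
$text = $div === null ? '' : trim($div->nodeValue);
// Re-insert a line break after the "hh:mm:ss." timestamp, as the answer does.
$text = preg_replace('/\d{2}:\d{2}:\d{2}\./', '${0}<br />', $text);
echo $text, "\n";
```

Because `nodeValue` joins the text nodes, the `<br />` between the two lines is gone, which is why the answer re-inserts one after the timestamp.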

2012-05-04 13:08:03

Why don't you just `return strip_tags(trim($xml->getElementById('stats')->nodeValue));`? – DaveRandom

Interesting, I just assumed it would... having a play with it now. – DaveRandom

Actually your snippet works ;p –
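On the `strip_tags()` question: `nodeValue` is already plain text, so `strip_tags()` has nothing to remove, and the `<br />` boundaries are lost either way. The second half of this sketch is an alternative not taken from the thread: it walks the div's child nodes and keeps each non-empty text node as its own line, which preserves the line structure without relying on the timestamp regex. The sample markup is the div from the question:

```php
<?php
// Sketch on the div from the question (hard-coded, no network access).
$html = '<div id="stats">5.695.184 registered users Last updated 14:46:05.<br />'
      . '35.339.741 peers (25.796.820 seeders + 9.542.921 leechers) in 4.549.473 torrents.<br /> </div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$div = $doc->getElementById('stats');
if ($div === null) {
    exit("stats div not found\n");
}

// nodeValue is plain text, so strip_tags() returns it unchanged.
$plain = trim($div->nodeValue);
var_dump(strip_tags($plain) === $plain); // bool(true)

// Alternative (not from the thread): keep each non-empty text node as its own
// line, which preserves the <br /> structure without a timestamp regex.
$lines = [];
foreach ($div->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
        $lines[] = trim($node->nodeValue);
    }
}
echo implode('<br />', $lines), "\n";
```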

Why don't you use preg_match()?

if (preg_match('/<div id="stats">(.*)<br \/>/Usi', $data, $m)) {
    $stats = $m[1];
}
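A sketch of what this pattern returns on the div from the question, used here as a hard-coded string. The `U` modifier makes `.*` ungreedy, so the capture stops at the first `<br />` and yields only the first line. The second `preg_match()` is a hedged variant, not from the answer, that captures everything up to `</div>` and splits it on the `<br />` tags:

```php
<?php
// Sketch run against the div from the question instead of the live page.
$data = '<div id="stats">5.695.184 registered users Last updated 14:46:05.<br />'
      . '35.339.741 peers (25.796.820 seeders + 9.542.921 leechers) in 4.549.473 torrents.<br /> </div>';

// With /U the ".*" is ungreedy, so the match stops at the first "<br />".
$firstLine = '';
if (preg_match('/<div id="stats">(.*)<br \/>/Usi', $data, $m)) {
    $firstLine = $m[1];
}
echo $firstLine, "\n"; // 5.695.184 registered users Last updated 14:46:05.

// To get both lines: capture up to </div>, then split on the <br /> tags and
// drop the whitespace-only remainder.
$lines = [];
if (preg_match('/<div id="stats">(.*?)<\/div>/si', $data, $m)) {
    $parts = preg_split('/<br\s*\/?>/i', $m[1]);
    $lines = array_values(array_filter(array_map('trim', $parts), 'strlen'));
}
print_r($lines);
```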

2012-05-04 13:10:01 oddtwelve
