Scraping a web page and retrieving a JavaScript variable

I need to scrape a web page for data that sits in inline JavaScript code, for example an inline JavaScript array:

<script> 
    var videos = new Array(); 
    videos[0] = 'http://myvideos.com/video1.mov'; 
    videos[1] = .... 
    .... 
</script>

What is the simplest way to approach this and end up with a PHP array of these video URLs?

Edit: all of the videos have the .mov extension.

2012-01-12 Nacho

I have a few lines that use file_get_contents and I have tried several regular expressions. I'm not good at regular expressions. – Nacho 2012-01-12 23:17:33

This is a bit more complicated, but it only picks up links that really have the form videos[0] = 'http://myvideos.com/video1.mov';

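// $original holds the page source, e.g. from file_get_contents($url).
// Strip line breaks so the pattern can match across the whole script block.
$tmp=str_replace(array("\r","\n"),'',$original);
// Group 1 captures the run of consecutive videos[N] = 'http://...'; assignments.
$pattern='/\<script\>\s+var\ videos.*?((\s*videos\[\d+\]\ \=\ .http\:\/\/.*?\;\s*?)+)(.*?)\<\/script\>/';
$a=preg_match_all($pattern,$tmp,$matches);
unset($tmp);

if (!$a) die("no matches");

// Split the captured run on "videos[N] = " so each piece is one quoted URL plus ";".
$pattern="/videos\[\d+\]\ \=\ /";
$matches=preg_split($pattern,$matches[1][0]);

$final=array();
while(sizeof($matches)>0) {
    $match=trim(array_shift($matches));
    if ($match=='') continue;
    // Drop the opening quote and the trailing quote and semicolon.
    $final[]=substr($match,1,-2);
}
unset($matches);

print_r($final);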
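
A shorter route to the same array, offered as a sketch rather than a drop-in replacement: let preg_match_all capture the index and the URL directly, so no splitting or substr is needed afterwards. The function name extractVideoAssignments and the sample $html string (including the second URL) are made up for illustration, and the sketch assumes every assignment is written as videos[N] = '...'; with single or double quotes.

```php
<?php
// Hypothetical helper: collect every videos[N] = 'URL'; assignment from the page source.
function extractVideoAssignments(string $html): array
{
    // Group 1: the array index, group 2: the quote character, group 3: the URL.
    $pattern = '/videos\[(\d+)\]\s*=\s*([\'"])(https?:\/\/.*?)\2\s*;/';
    $final = array();
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Keep the JavaScript index as the PHP array key.
            $final[(int) $m[1]] = $m[3];
        }
    }
    return $final;
}

// Sample input modelled on the question (assumed, not the real page).
$html = "<script>\n    var videos = new Array();\n"
      . "    videos[0] = 'http://myvideos.com/video1.mov';\n"
      . "    videos[1] = 'http://myvideos.com/video2.mov';\n</script>";
print_r(extractVideoAssignments($html));
```

Because the pattern does not depend on the surrounding script tag, it keeps working if the page adds attributes to the tag or puts other statements between the assignments.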

After feedback from the OP, here is a simplified version:

$original=file_get_contents($url); 
$pattern='/http\:\/\/.*?\.mov/'; 
$a=preg_match_all($pattern,$original,$matches); 
if (!$a) die("no matches"); 
print_r($matches[0]);

2012-01-12 23:35:47

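Thanks, I'll check this out. I thought it could be made easier, since all the videos are always .mov – Nacho 2012-01-12 23:37:25

So what you actually want is to scrape all links to .mov files from that page? – 2012-01-12 23:40:08

Exactly. [dummytext] – Nacho 2012-01-12 23:42:05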

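You can read the page with file_get_contents and then use a regular expression to retrieve the URLs. This is the simplest way I know of, especially if you know the file extension of the videos. Example: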

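<?php
// Fetch the page source as a string.
$file = file_get_contents('http://google.com');
// (?:fr|com) is an alternation; [fr|com] would be a character class matching single letters.
$pattern = '/http:\/\/([a-zA-Z0-9\-\.]+\.(?:fr|com))/i';
preg_match_all($pattern, $file, $matches);
var_dump($matches);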
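
Adapted to the question, where every video ends in .mov, the same idea might look like the sketch below. The function name extractMovUrls and the sample string are assumptions for illustration. The character class stops a match at quotes, whitespace and angle brackets, so one match cannot run across two URLs that share a line.

```php
<?php
// Hypothetical helper: return the unique .mov URLs found in a page's source.
function extractMovUrls(string $html): array
{
    // [^\s'"<>]+? keeps each match inside a single quoted URL.
    $pattern = '/https?:\/\/[^\s\'"<>]+?\.mov\b/i';
    preg_match_all($pattern, $html, $matches);
    // array_unique drops repeats; array_values reindexes from 0.
    return array_values(array_unique($matches[0]));
}

// In real use $html would come from file_get_contents($url).
$html = "videos[0] = 'http://myvideos.com/video1.mov'; poster = 'http://myvideos.com/p.jpg';";
print_r(extractMovUrls($html));
```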

2012-01-12 23:13:40

That is exactly what my first approach was. I guess there aren't many other options, are there? – Nacho 2012-01-12 23:18:15
