2013-12-16 46 views
0

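PHP parser - find a string in HTML

I want to find a string on another website. I have been looking at parsers, and I don't know the best way to go about it. I looked at an HTML DOM parser, but I only need a simple one-line output. I just want to get the link "url: 'http://s2.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06'" into a variable.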

<script> 
       flowplayer("player", "http://www.example.com/flowplayer-3.2.16.swf", { 
        canvas: { 
         backgroundGradient: "none", 
         backgroundColor: "#000000" 
        }, 
        clip: { 
         provider: 'lighttpd', 
         url: 'http://s1.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06', 
         scaling: 'fit' 
        }, 
        plugins: { 
         lighttpd: { 
          url: 'http://www.example.com/flowplayer.pseudostreaming-3.2.12.swf' 
         } 
        } 
       }); 
      </script> 
+2

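Take a look at phpQuery or QueryPath for one-liners. Such DOM traversal frontends (or plain, long-winded DOMDocument) will still only give you the Javascript blob, though. You need a regular expression and/or a JSON/L parser to extract the URL. – mario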

Answers

0

Here is a handy function for getting the text between two delimiters:

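<?php
// Return the trimmed text between the first $start and the next $end
// (case-insensitive), or null when either delimiter is missing.
function extract_unit($string, $start, $end)
{
    $pos = stripos($string, $start);
    if ($pos === false) {
        return null;
    }
    $pos += strlen($start); // skip past the start delimiter
    $end_pos = stripos($string, $end, $pos);
    if ($end_pos === false) {
        return null;
    }
    return trim(substr($string, $pos, $end_pos - $pos)); // remove whitespace
}

// $webpageSource holds the page's HTML source.
// With these delimiters the result is the player .swf URL.
echo extract_unit($webpageSource, 'flowplayer("player", "', '", {');
?>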
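
To pull out the clip URL the question asks about, rather than the player .swf, the same delimiter idea can be applied with `url: '` as the start marker and `'` as the end marker. This is only a minimal sketch, assuming the page source is already in a string; `text_between` and `$sample` are hypothetical names introduced here for illustration, and it relies on the clip URL being the first `url:` entry in the source.

```php
<?php
// Hypothetical helper: text between the first $start and the next $end, or null.
function text_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start); // move past the start delimiter
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

// Shortened sample of the page source from the question (assumption).
$sample = <<<'HTML'
<script>
flowplayer("player", "http://www.example.com/flowplayer-3.2.16.swf", {
    clip: {
        provider: 'lighttpd',
        url: 'http://s1.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06',
        scaling: 'fit'
    }
});
</script>
HTML;

$clipUrl = text_between($sample, "url: '", "'");
echo $clipUrl, "\n";
```

If the page ever lists another `url:` entry before the clip, the start delimiter would have to be made more specific.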
+0

I got it, I re-read it. Thank you very much! :) – BluGex

0

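I would use DOMDocument.

To get the links from the anchor tags, it is:

// Placeholder (assumption): the URL of the page that embeds the player.
$pageUrl = 'http://www.example.com/';
$links = array();
$dd = new DOMDocument;
@$dd->loadHTMLFile($pageUrl); // @ hides warnings about malformed HTML
foreach ($dd->getElementsByTagName('a') as $t) {
    $links[] = $t->getAttribute('href');
}

Now $links is an array of every href; it is empty if there were no results.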

To get the JSON out of the script code:

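// Placeholder (assumption): the URL of the page that embeds the player.
$pageUrl = 'http://www.example.com/';
$dd = new DOMDocument;
@$dd->loadHTMLFile($pageUrl);
$s = $dd->getElementsByTagName('script');
$c = '';
if ($s->length > 0) {
    $c = $dd->saveHTML($s->item(0)); // markup of the first <script> element
}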

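Change item(0) to the index at which the script tag appears on their page. Now $c is a string. So:

preg_match_all("/url:\s*'([^']+)'/", $c, $results);

$results[0] is an array that should contain each full match, url: 'whatever', and $results[1] holds just the captured URLs without the url: prefix and the quotes. So:

$urls = $results[1];

$urls is the array of results.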
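
Putting the steps above together, the following is a rough end-to-end sketch rather than a definitive implementation. It assumes the page markup has already been downloaded into a string, so `$html` is a stand-in sample and `$doc`, `$found`, and `$m` are names introduced here. It reads the JavaScript through `textContent` instead of `saveHTML`, which is a small variation on the approach described above.

```php
<?php
// Sample markup standing in for the downloaded page (assumption).
$html = <<<'HTML'
<html><body>
<script>
flowplayer("player", "http://www.example.com/flowplayer-3.2.16.swf", {
    clip: {
        provider: 'lighttpd',
        url: 'http://s1.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06',
        scaling: 'fit'
    },
    plugins: {
        lighttpd: {
            url: 'http://www.example.com/flowplayer.pseudostreaming-3.2.12.swf'
        }
    }
});
</script>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

$found = array();
foreach ($doc->getElementsByTagName('script') as $script) {
    // textContent is the raw JavaScript inside the <script> element
    if (preg_match_all("/url:\s*'([^']+)'/", $script->textContent, $m)) {
        $found = array_merge($found, $m[1]); // $m[1] holds the captured URLs
    }
}

print_r($found);
```

With this sample, the clip URL is the first entry in `$found` and the plugin .swf URL is the second, so the caller still has to decide which entry it wants.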

+0

I don't want to get information from that site; I'm trying to extract that link from the source code. – BluGex

+0

Hold on, I'll do that. – PHPglue

0

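Most of the time RegExp is the best way to parse a string, but it is not recommended for handling JSON.

Here is an example (the string is my base64 encoding of your original HTML):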

<?php 
$data = base64_decode("PHNjcmlwdD4KICAgICAgICAgICAgICAgIGZsb3dwbGF5ZXIoInBsYXllciIsICJodHRwOi8vd3d3LmV4YW1wbGUuY29tL2Zsb3dwbGF5ZXItMy4yLjE2LnN3ZiIsICB7CiAgICAgICAgICAgICAgICAgICAgY2FudmFzOiB7CiAgICAgICAgICAgICAgICAgICAgICAgIGJhY2tncm91bmRHcmFkaWVudDogIm5vbmUiLAogICAgICAgICAgICAgICAgICAgICAgICBiYWNrZ3JvdW5kQ29sb3I6ICIjMDAwMDAwIgogICAgICAgICAgICAgICAgICAgIH0sCiAgICAgICAgICAgICAgICAgICAgY2xpcDogewogICAgICAgICAgICAgICAgICAgICAgICBwcm92aWRlcjogJ2xpZ2h0dHBkJywKICAgICAgICAgICAgICAgICAgICAgICAgdXJsOiAnaHR0cDovL3MxLmV4YW1wbGUuY29tL3N0cmVhbXMvaTIzMzc0Lm1wND9rPTEyZjM0NTg4Y2YxNzFmM2JiZjNkMzVkYTRkYjQzYjA2JywKICAgICAgICAgICAgICAgICAgICAgICAgc2NhbGluZzogJ2ZpdCcKICAgICAgICAgICAgICAgICAgICB9LAogICAgICAgICAgICAgICAgICAgIHBsdWdpbnM6IHsKICAgICAgICAgICAgICAgICAgICAgICAgbGlnaHR0cGQ6IHsKICAgICAgICAgICAgICAgICAgICAgICAgICAgIHVybDogJ2h0dHA6Ly93d3cuZXhhbXBsZS5jb20vZmxvd3BsYXllci5wc2V1ZG9zdHJlYW1pbmctMy4yLjEyLnN3ZicKICAgICAgICAgICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgIH0pOwogICAgICAgICAgICA8L3NjcmlwdD4="); 

// Lazy quantifiers stop at the first url inside the clip block.
if (preg_match('/clip:\s*\{[\s\S]+?url:\s*\'(\S+?)\',\s*scaling/', $data, $match) === 1) {
    echo $match[1];
}

?> 

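Although the data is written in a JSON-like form, it cannot be parsed by PHP's json_decode, because PHP's JSON parser is strict: property names must be wrapped in double quotes, and string values must be double-quoted as well.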
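
The strictness mentioned above can be seen with a small experiment. This is a minimal sketch; the two literals are shortened stand-ins for the clip object in the question, and the variable names are introduced here for illustration.

```php
<?php
// JavaScript-style object literal: unquoted keys, single-quoted strings.
$jsLiteral = "{ provider: 'lighttpd', scaling: 'fit' }";
// The same data written as strict JSON.
$strictJson = '{ "provider": "lighttpd", "scaling": "fit" }';

$fromLiteral = json_decode($jsLiteral, true);
$literalError = json_last_error(); // error state left by the first call
$fromJson = json_decode($strictJson, true);

var_dump($fromLiteral);                       // null signals that decoding failed
var_dump($literalError !== JSON_ERROR_NONE);  // an error code was recorded
var_dump($fromJson['provider']);              // the strict form decodes to an array
```

This is why a regular expression or plain string search is used on the script blob instead of json_decode.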