PHP parser - finding a string in HTML

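I want to find a string on another website. I've been looking at parsers, and I'm not sure what the best approach is. I looked at an HTML DOM parser, but I only need a simple one-line output. I just want to get the link "url: 'http://s2.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06'" into a variable.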

<script> 
       flowplayer("player", "http://www.example.com/flowplayer-3.2.16.swf", { 
        canvas: { 
         backgroundGradient: "none", 
         backgroundColor: "#000000" 
        }, 
        clip: { 
         provider: 'lighttpd', 
         url: 'http://s1.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06', 
         scaling: 'fit' 
        }, 
        plugins: { 
         lighttpd: { 
          url: 'http://www.example.com/flowplayer.pseudostreaming-3.2.12.swf' 
         } 
        } 
       }); 
      </script>


2013-12-16 BluGex

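Check out phpQuery or QueryPath for one-liners. Such DOM traversal front ends (or the plain, long-winded DOMDocument) will still only give you the JavaScript blob. You'll need a regular expression and/or a JSON (or JavaScript object literal) parser to extract the URL. – mario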

Here is a handy function for getting the text between two delimiters:

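<?php
// Returns the trimmed text between the first occurrence of $start and the
// next occurrence of $end after it (the search is case-insensitive).
function extract_unit($string, $start, $end)
{
    $pos = stripos($string, $start);
    if ($pos === false) {
        return ''; // start delimiter not found
    }
    $str = substr($string, $pos);
    $str_two = substr($str, strlen($start)); // drop the start delimiter itself
    $second_pos = stripos($str_two, $end);
    if ($second_pos === false) {
        return ''; // end delimiter not found
    }
    $str_three = substr($str_two, 0, $second_pos);
    $unit = trim($str_three); // remove whitespace
    return $unit;
}

// Note: with these delimiters the result is the Flowplayer .swf URL, not the .mp4 clip URL.
echo extract_unit($webpageSource, 'flowplayer("player", "', '", {');
?>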
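Applied to the question's markup, the clip (.mp4) URL sits between `url: '` and the next `'`, whereas the delimiters `flowplayer("player", "` and `", {` return the Flowplayer .swf URL. A minimal sketch, assuming the page source is already in a string (here a trimmed copy of the question's `<script>`); a condensed copy of the helper is included so the snippet runs on its own:

```php
<?php
// Condensed copy of extract_unit(): trimmed text between $start and the next $end.
function extract_unit($string, $start, $end)
{
    $pos = stripos($string, $start);
    if ($pos === false) {
        return ''; // start delimiter not found
    }
    $rest = substr($string, $pos + strlen($start));
    $endPos = stripos($rest, $end);
    return $endPos === false ? '' : trim(substr($rest, 0, $endPos));
}

// Trimmed copy of the question's markup (assumed to be already fetched).
$webpageSource = <<<'HTML'
<script>
flowplayer("player", "http://www.example.com/flowplayer-3.2.16.swf", {
    clip: {
        provider: 'lighttpd',
        url: 'http://s1.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06',
        scaling: 'fit'
    },
    plugins: {
        lighttpd: {
            url: 'http://www.example.com/flowplayer.pseudostreaming-3.2.12.swf'
        }
    }
});
</script>
HTML;

// The clip block comes before the plugins block, so the first "url: '" is the clip URL.
$clipUrl = extract_unit($webpageSource, "url: '", "'");
echo $clipUrl, "\n";
```

Because the helper stops at the first match, this relies on the clip block appearing before any other `url:` key in the page.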


2013-12-16 00:29:44

Got it, I re-read it. Thank you very much! :) – BluGex

I would use DOMDocument.

To get the links out of the anchor (<a>) tags:

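$dd = new DOMDocument;
// Load the HTML page that embeds the player (hypothetical URL), not the .mp4 file itself;
// @ suppresses warnings about malformed HTML.
@$dd->loadHTMLFile('http://www.example.com/player-page.html');
foreach ($dd->getElementsByTagName('a') as $t) {
    $links[] = $t->getAttribute('href'); // collect each anchor's href
}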

Now $links is an array of every href; if !isset($links), there were no results.
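Since the asker already has the page source, `DOMDocument::loadHTML()` can parse a string instead of fetching a URL. A minimal sketch with a made-up HTML fragment (the hrefs are hypothetical); here `$links` is initialized up front, so "no results" shows up as an empty array rather than an unset variable:

```php
<?php
// Hypothetical page source; in practice this is the HTML you already have.
$html = '<p><a href="http://www.example.com/a.html">A</a> '
      . '<a href="http://www.example.com/b.html">B</a></p>';

$dd = new DOMDocument;
// @ hides warnings about malformed markup.
@$dd->loadHTML($html);

$links = [];
foreach ($dd->getElementsByTagName('a') as $t) {
    $links[] = $t->getAttribute('href'); // one entry per <a> tag
}

if (empty($links)) {
    echo "no links found\n";
} else {
    print_r($links);
}
```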

To get the JSON-like object out of the script code:

$dd = new DOMDocument;
@$dd->loadHTMLFile('http://www.example.com/player-page.html'); // hypothetical page URL
$s = $dd->getElementsByTagName('script');
if ($s->length > 0) {
    // saveHTML() with a node argument serializes only that node
    $c = $dd->saveHTML($s->item(0));
}

Change the index in item(0) to match the position of the script tag on their page. Now $c is a string, so:

preg_match_all("/url: '[^']+'/", $c, $results); // [^']+ stops at the closing quote

$results[0] is an array that should contain matches like url: 'whatever'. So:

foreach ($results[0] as $v) {
    // strip the "url: " prefix, then the surrounding single quotes
    $a[] = trim(preg_replace('/^url: /', '', $v), "'");
}

$a is the array of extracted URLs.
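The same steps can be run end to end on the question's markup without fetching anything. A sketch, assuming the source is already in a string; it swaps in two small changes from the answer's approach: it reads the script body with `textContent` instead of `saveHTML()`, and it uses a capture group so no second `preg_replace` pass is needed:

```php
<?php
// The question's <script> block, assumed to be already in a string.
$html = <<<'HTML'
<script>
flowplayer("player", "http://www.example.com/flowplayer-3.2.16.swf", {
    clip: {
        provider: 'lighttpd',
        url: 'http://s1.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06',
        scaling: 'fit'
    },
    plugins: {
        lighttpd: {
            url: 'http://www.example.com/flowplayer.pseudostreaming-3.2.12.swf'
        }
    }
});
</script>
HTML;

$dd = new DOMDocument;
@$dd->loadHTML($html); // @ hides warnings about malformed markup

$scripts = $dd->getElementsByTagName('script');
$urls = [];
if ($scripts->length > 0) {
    // textContent is the raw JavaScript inside the first <script> tag.
    $c = $scripts->item(0)->textContent;
    // Group 1 captures everything between the quotes after "url:".
    preg_match_all("/url:\s*'([^']+)'/", $c, $results);
    $urls = $results[1];
}

print_r($urls); // clip URL first, then the plugin's .swf URL
```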


2013-12-16 00:36:15 PHPglue

I don't want to fetch information from that website; I'm trying to extract that link from the source code. – BluGex

Hold on, I'll do that. – PHPglue

In most cases a RegExp is the best way to parse a string like this, though it isn't recommended for processing JSON.

Here's an example (I base64-encoded the string; it is your original HTML):

<?php 
$data = base64_decode("PHNjcmlwdD4KICAgICAgICAgICAgICAgIGZsb3dwbGF5ZXIoInBsYXllciIsICJodHRwOi8vd3d3LmV4YW1wbGUuY29tL2Zsb3dwbGF5ZXItMy4yLjE2LnN3ZiIsICB7CiAgICAgICAgICAgICAgICAgICAgY2FudmFzOiB7CiAgICAgICAgICAgICAgICAgICAgICAgIGJhY2tncm91bmRHcmFkaWVudDogIm5vbmUiLAogICAgICAgICAgICAgICAgICAgICAgICBiYWNrZ3JvdW5kQ29sb3I6ICIjMDAwMDAwIgogICAgICAgICAgICAgICAgICAgIH0sCiAgICAgICAgICAgICAgICAgICAgY2xpcDogewogICAgICAgICAgICAgICAgICAgICAgICBwcm92aWRlcjogJ2xpZ2h0dHBkJywKICAgICAgICAgICAgICAgICAgICAgICAgdXJsOiAnaHR0cDovL3MxLmV4YW1wbGUuY29tL3N0cmVhbXMvaTIzMzc0Lm1wND9rPTEyZjM0NTg4Y2YxNzFmM2JiZjNkMzVkYTRkYjQzYjA2JywKICAgICAgICAgICAgICAgICAgICAgICAgc2NhbGluZzogJ2ZpdCcKICAgICAgICAgICAgICAgICAgICB9LAogICAgICAgICAgICAgICAgICAgIHBsdWdpbnM6IHsKICAgICAgICAgICAgICAgICAgICAgICAgbGlnaHR0cGQ6IHsKICAgICAgICAgICAgICAgICAgICAgICAgICAgIHVybDogJ2h0dHA6Ly93d3cuZXhhbXBsZS5jb20vZmxvd3BsYXllci5wc2V1ZG9zdHJlYW1pbmctMy4yLjEyLnN3ZicKICAgICAgICAgICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgIH0pOwogICAgICAgICAgICA8L3NjcmlwdD4="); 

if (preg_match('/clip:\s*\{[\s\S]+url:\s*\'(\S+)\',\s*scaling/', $data, $match) === 1) {
    echo $match[1]; // group 1 holds the clip URL
}

?>

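Although it looks like JSON, it can't be parsed by PHP's json_decode: it is a JavaScript object literal, and JSON is stricter (property names must be wrapped in double quotes, and strings must use double quotes too).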
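A quick check of that claim, as a sketch using a shortened, hypothetical copy of the `clip` block's values: `json_decode()` returns `null` for the object-literal form and succeeds once keys and strings use double quotes.

```php
<?php
// Object-literal form as it appears in the page: unquoted keys, single-quoted strings.
$literal = "{provider: 'lighttpd', scaling: 'fit'}";
// The same data written as strict JSON.
$json = '{"provider": "lighttpd", "scaling": "fit"}';

$fromLiteral = json_decode($literal, true); // null: not valid JSON
$literalError = json_last_error_msg();      // read the error before the next decode resets it
$fromJson = json_decode($json, true);       // decodes to an associative array

var_dump($fromLiteral);
echo $literalError, "\n";
print_r($fromJson);
```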


2013-12-16 00:59:32 CodeColorist
