Extracting the full URL with a regular expression

OK, I'm using (PHP) file_get_contents to read some websites, and each of these sites has only one link to Facebook. Once I have the whole page, I'd like to find the full Facebook URL.

So somewhere in the page there will be:

<a href="http://facebook.com/username" >

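I want to get http://facebook.com/username, meaning everything from the first (") to the last ("). The username is variable (it could be username.somethingelse), and there may be other attributes before or after the href.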

In case I wasn't clear enough:

<a href="http://facebook.com/username" > //I want http://facebook.com/username 
<a href="http://www.facebook.com/username" > //I want http://www.facebook.com/username 
<a class="value" href="http://facebook.com/username.some" attr="value" > //I want http://facebook.com/username.some

In all of the examples above, single quotes may be used instead:

<a href='http://facebook.com/username' > //I want http://facebook.com/username

Thanks, everyone.

Source

2011-07-14 Richard Pérez

Don't use regular expressions on HTML. It's a shotgun that will blow your leg off at some point. Use DOM instead:

$dom = new DOMDocument; 
$dom->loadHTML(...); 
$xp = new DOMXPath($dom); 

$a_tags = $xp->query("//a"); 
foreach($a_tags as $a) { 
    echo $a->getAttribute('href'); 
}
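The loop above prints every href on the page. As a minimal sketch of narrowing it down, the XPath expression itself can filter on the attribute value. The sample markup in $html is hypothetical and only stands in for the fetched page. The libxml_use_internal_errors call is there because real-world HTML is rarely valid and loadHTML would otherwise emit warnings.

```php
<?php
// Hypothetical sample markup standing in for the page fetched with file_get_contents.
$html = <<<'HTML'
<div>
  <a href="http://example.com/about">About</a>
  <a class="value" href="http://facebook.com/username.some" attr="value">FB</a>
  <a href='http://www.facebook.com/username'>FB2</a>
</div>
HTML;

$dom = new DOMDocument;
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// Select only <a> elements whose href contains the substring "facebook.com".
$facebookLinks = [];
foreach ($xp->query("//a[contains(@href, 'facebook.com')]") as $a) {
    $facebookLinks[] = $a->getAttribute('href');
}

print_r($facebookLinks);
```

Note that contains() is a plain substring match, so it would also accept a URL that merely mentions facebook.com in its path or query string. Comparing the parsed host is stricter.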

Source

2011-07-14 15:10:51

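I wanted to give the OP a good resource explaining why you shouldn't use regexp for this, but I can't find the one I'm looking for. Any chance you have such a resource at hand, Marc? – rzetterberg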

This is the canonical answer whenever html + regex comes up on this site: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

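Well, the thing is, I'm already using DOM to get the part of the document where I know the Facebook link is, but in that part I get 1 to 6 links. How can I get only the one that points to Facebook?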
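One possible way to handle the follow-up above, as a rough sketch rather than a definitive answer: walk the links inside the section you already have and keep only those whose host is facebook.com or a subdomain of it. The helper name facebookLinks, the sample markup, and the choice of the first div as the section are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the hrefs found under $context whose host is
// facebook.com or one of its subdomains (for example www.facebook.com).
function facebookLinks(DOMNode $context): array
{
    $doc = $context instanceof DOMDocument ? $context : $context->ownerDocument;
    $xp = new DOMXPath($doc);
    $suffix = '.facebook.com';
    $links = [];

    // ".//a[@href]" is relative to $context, so only that section is searched.
    foreach ($xp->query('.//a[@href]', $context) as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative or malformed URL, no host to compare
        }
        $host = strtolower($host);
        if ($host === 'facebook.com' || substr($host, -strlen($suffix)) === $suffix) {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical section containing several links, only one of them to Facebook.
$html = '<div id="social">'
      . '<a href="http://twitter.com/username">T</a>'
      . '<a href="http://www.facebook.com/username">F</a>'
      . '<a href="http://example.com/?ref=facebook.com">X</a>'
      . '</div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // tolerate imperfect markup
$dom->loadHTML($html);
libxml_clear_errors();

$section = $dom->getElementsByTagName('div')->item(0);
$found = facebookLinks($section);
print_r($found);
```

Checking the host from parse_url instead of searching the whole href avoids false matches such as the third link, which only mentions facebook.com in its query string.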

I would suggest using DOMDocument for this rather than a regular expression. Here is a quick code sample for your case:

$dom = new DOMDocument(); 
$dom->loadHTML($content); 

// To hold all your links... 
$links = array(); 

$hrefTags = $dom->getElementsByTagName("a"); 
foreach ($hrefTags as $hrefTag) {
    $links[] = $hrefTag->getAttribute("href");
}

print_r($links); // dump all links

Source

2011-07-14 15:12:38 anubhava
