Matching href URLs


Possible duplicate:
Grabbing the href attribute of an A element

I am trying to match the following in a page's source:

<a href="/download/blahbal.html">

I looked at another question on this site and used this regular expression:

'/<a href=["\']?(\/download\/[^"\'\s>]+)["\'\s>]?/i'

It returns all the href links on the page, but it leaves off the .html on some of the links.

Any help would be greatly appreciated.

Thanks

Source

2011-09-01 Jamesmiller

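Maybe the regex is missing those hrefs. In any case, I suggest you use a parser (DOMDocument) and retrieve all the "a" tags with it. – CaNNaDaRk

"leaves off the .html on some of the links": can you give an example where the .html is missing? – FrankS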

Use the XPath '/html/body//a[starts-with(@href, "/download")]' – Gordon
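The XPath suggestion above can be tried with PHP's DOMXPath. This is a minimal sketch, not the commenter's own code: the sample markup in $html is hypothetical, and the query selects the href attribute nodes directly so no second filtering pass is needed.

```php
<?php
// Hypothetical sample markup; replace with the real page source.
$html = '<html><body>'
      . '<a href="/download/blahbal.html">one</a>'
      . '<a href="/about.html">two</a>'
      . '<a href="/download/other.html">three</a>'
      . '</body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$hrefs = array();
// Select the href attribute of every <a> whose href starts with /download/
foreach ($xpath->query("//a[starts-with(@href, '/download/')]/@href") as $attr) {
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```

Because the parser reads the attribute value as a whole, the file extension cannot be cut off the way it can with a hand-written pattern.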

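First retrieve all the hrefs using the method described here; then you can use a regex or strpos to "filter out" those that do not start with /download/.
Many other posts on Stack Overflow (see this) discuss why you should use a parser instead of a regex. Once you have parsed the document and collected the hrefs, you can filter them with simple string functions.

Some code: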

$dom = new DOMDocument;
// $html contains the page's HTML as a string
$dom->loadHTML($html);
// after the loop, this holds the filtered hrefs
$hrefs = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    // skip anchors that have no href attribute
    if ($node->hasAttribute('href')) {
        $href = $node->getAttribute('href');
        // keep only hrefs that start with /download/
        if (strpos($href, '/download/') === 0) {
            $hrefs[] = $href; // store href
        }
    }
}

Source

2011-09-01 10:07:16 CaNNaDaRk

Tested, works. If necessary, strpos can easily be replaced with a regex (preg_match). – CaNNaDaRk
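As a rough illustration of swapping strpos for preg_match, the filter step might look like the sketch below. The $all array is a hypothetical stand-in for the hrefs collected by the parser, and the case-insensitive flag is an assumption carried over from the pattern in the question.

```php
<?php
// Hypothetical hrefs, as they might come out of the DOM loop.
$all = array(
    '/download/blahbal.html',
    '/about.html',
    '/download/a.b.html',
    'http://example.com/download/x.html',
);

$hrefs = array();
foreach ($all as $href) {
    // '#' as the delimiter avoids escaping the slashes;
    // '^' anchors the match at the start of the href.
    if (preg_match('#^/download/#i', $href)) {
        $hrefs[] = $href;
    }
}

print_r($hrefs);
```

The regex only tests the prefix here; the href itself still comes from the parser, so nothing after /download/ is lost.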

Thanks. I'm still curious whether it can be done with a regex, though. – Jamesmiller

It depends on which links are missing from the match; maybe the regex only needs a slight tweak. – CaNNaDaRk
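To find out which links get truncated, it can help to run the pattern from the question against a small sample. The sketch below uses invented markup; the second link, with a space in its href, is only one assumed way the character class [^"\'\s>] can stop early, not a confirmed cause of the problem described above.

```php
<?php
// Pattern copied from the question.
$pattern = '/<a href=["\']?(\/download\/[^"\'\s>]+)["\'\s>]?/i';

// Hypothetical markup: one plain link, one with a space in the href.
$html = '<a href="/download/blahbal.html">a</a> '
      . '<a href="/download/my file.html">b</a>';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured paths, one per matched link.
print_r($matches[1]);
```

On this sample the first href is captured in full, while the second stops at the space, so comparing the captured values with the raw markup shows which links the pattern handles badly.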
