How to extract URLs from this page

I'm trying to fetch some data from the web with cURL. What I have is a URL like somewebsite.com. On that site there are a bunch of <div>s with class="control-element" that contain this markup:

<div class="control-element"> 
    <a href="http://someurl.com/and/some/path">Anchor Text</a> 
</div>

How should I extract the URL and the anchor text of each of these links? Should I use regular expressions, or what is the best way?

2011-08-02 sameold

I think in this particular case you can simply use file_get_contents() instead of cURL.
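A minimal sketch of fetching the page with file_get_contents() alone, assuming allow_url_fopen is enabled (the PHP default) so that http:// URLs can be opened. The helper name fetch_html() and the 10-second timeout are illustrative choices, not part of PHP or the original answer.

```php
<?php
// Hypothetical helper: fetch a page with the built-in file_get_contents()
// instead of cURL. For http:// URLs this assumes allow_url_fopen is on.
function fetch_html(string $url, int $timeoutSeconds = 10): string
{
    // The "http" context options also apply to https:// URLs;
    // they are simply ignored for local file paths.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);

    // @ suppresses the warning; failure is reported via the exception below.
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}
```

Usage would be something like `$doc = fetch_html("http://someurl.com/");`. Throwing on failure avoids passing `false` into the parsing step later.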

For HTML parsing, take a look at Simple HTML DOM.
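Simple HTML DOM is a third-party library. If installing it is not an option, PHP's bundled DOM extension (DOMDocument plus DOMXPath) can do the same parsing; this is a swapped-in technique, not Simple HTML DOM's API. The sketch below assumes the markup from the question; extract_control_links() is a hypothetical helper name.

```php
<?php
// Sketch using PHP's built-in DOM extension (not Simple HTML DOM):
// return URL + anchor text of every <a href> inside <div class="control-element">.
function extract_control_links(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "control-element" as a whole class name, so class="control-element x" also matches.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " control-element ")]//a[@href]';

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'url'  => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$html = <<<HTML
<div class="control-element">
    <a href="http://someurl.com/and/some/path">Anchor Text</a>
</div>
HTML;

foreach (extract_control_links($html) as $link) {
    echo "URL: {$link['url']} Anchor: {$link['text']}\n";
}
```

Unlike a regex, this copes with extra attributes, single quotes, and nested tags inside the link. One caveat: without a charset declaration in the page, loadHTML() treats the input as ISO-8859-1, so non-ASCII anchor text may need attention.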

If you don't want to use any third-party library, here is an example using regular expressions:

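$doc = file_get_contents("http://someurl.com/");
// Grab the inner HTML of each <div class="control-element">
// (i = case-insensitive, s = "." also matches newlines, U = ungreedy)
preg_match_all('/<div class="control-element">(.*)<\/div>/isU', $doc, $matches);
foreach ($matches[1] as $block)
{
    // First link in the block: [1] = URL, [2] = anchor text; skip blocks without a link
    if (preg_match('/<a href="(.*)">(.*)<\/a>/isU', $block, $link)) {
        echo "URL: " . $link[1] . " Anchor: " . $link[2] . "<br>";
    }
}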
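The pattern in the answer only matches links written exactly as `<a href="...">`: href must be the only attribute and use double quotes. A slightly more tolerant sketch follows, assuming the other attributes and the anchor text contain no stray `>` characters; it is still a regex and still breaks on unusual markup. links_in_block() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return every link in an HTML fragment as ['url' => ..., 'text' => ...].
function links_in_block(string $block): array
{
    // <a ...whitespace... href = "..." or '...' ...>text</a>
    //   group 1 = the quote character, group 2 = URL, group 3 = anchor text.
    // Requiring whitespace right before "href" avoids matching e.g. data-href.
    $pattern = '/<a\s(?:[^>]*?\s)?href\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';
    preg_match_all($pattern, $block, $m, PREG_SET_ORDER);

    $links = [];
    foreach ($m as $match) {
        // strip_tags() drops inline markup such as <b> inside the anchor text.
        $links[] = ['url' => $match[2], 'text' => trim(strip_tags($match[3]))];
    }
    return $links;
}
```

It can be applied to each `$block` from `$matches[1]` in the loop above to collect all links per div rather than only the first.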

2011-08-02 09:07:07 technology

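I'm not sure I want to install and use an external library for this. – sameold

file_get_contents() is not an external library; click the link develroot posted. It's a native PHP function. – Chamilyan

I edited my post and added an example that uses only PHP's built-in functions. Check it out. – technology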
