2013-04-27 57 views
3

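How do I get URLs from a text string?

I have a string that contains URLs mixed with other text. I want to collect all of the URLs in the $matches array, but the code below does not capture all of them: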

$matches = array(); 
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL"; 
preg_match_all('$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i', $text, $matches); 
print_r($matches); 

The code above returns:

http://tinyurl.com/9uxdwc 
http://google.com 
http://tinyurl.com/787988 

but it misses the URLs that have no scheme, such as these four:

schoollife.edu 
hello.net 
news.yahoo.com 
en.wikipedia.org/wiki/Country_music 

Could you show me, with an example, how to modify the code above so that it gets all of the URLs?

+1

Your regular expression requires an http/https/ftp/file scheme. Make it optional. – sevenseacat 2013-04-27 08:11:50

+1

@sevenseacat I have a similar problem. Could you demonstrate with an example of the modified regular expression? – 2013-04-27 08:45:00

+0

See my updated answer – 2013-04-27 08:57:51

Answer

1

Is this what you need?

$matches = array(); 
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL"; 
preg_match_all('$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i', $text, $matches); 
print_r($matches); 

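I made the scheme part optional, added a literal dot to separate the domain from the TLD, and used "+" after the dot to capture the rest of the string (the TLD plus any extra information such as a path).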

The result (the full matches, in $matches[0]) is:

[0] => soundfly.us 
[1] => schoollife.edu 
[2] => hello.net 
[3] => news.yahoo.com 
[4] => http://tinyurl.com/9uxdwc 
[5] => http://google.com 
[6] => http://tinyurl.com/787988 
[7] => en.wikipedia.org/wiki/Country_music 
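To make the three changes easier to see, here is a sketch that builds the same pattern from named pieces. The variable names ($scheme, $host, $rest, $pattern, $urls) are only illustrative and are not part of the answer; the assembled pattern is character-for-character the one used above. preg_match_all fills $matches with one sub-array per capture group, so the complete URLs are read from $matches[0].

```php
<?php
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL";

// Scheme followed by "://"; the trailing "?" makes the whole group optional.
$scheme = '((https?|ftp|file)://)?';
// Leading part of the host; this class includes "." so sub-domains are allowed.
$host = '[-A-Z0-9+&@#/%?=~_|!:,.;]*';
// After the required dot: the TLD plus any path. "+" demands at least one character.
$rest = '[-A-Z0-9+&@#/%=~_|]+';

// "$" is the delimiter, as in the original code; "i" makes A-Z match lowercase too.
$pattern = '$\b' . $scheme . $host . '\.' . $rest . '$i';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches; $matches[1] and $matches[2] hold the
// scheme groups (empty strings where no scheme was present).
$urls = $matches[0];
print_r($urls);
```

Reading $matches[0] instead of printing the whole $matches array gives a flat list like the one shown above.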

It also works with IP addresses, because a dot is required. Tested with the strings "192.168.0.1" and "192.168.0.1/test/index.php".
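The IP address check can be reproduced with a short sketch like the one below. The $samples and $found names are hypothetical, and the third sample ("3.14") is my own addition to show a side effect: since the dot is the only hard requirement, the pattern accepts any dotted token, not just hosts.

```php
<?php
// Same pattern as in the answer.
$pattern = '$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i';

// The first two strings come from the answer; "3.14" is an extra illustrative case.
$samples = array('192.168.0.1', '192.168.0.1/test/index.php', '3.14');

$found = array();
foreach ($samples as $sample) {
    preg_match_all($pattern, $sample, $m);
    // Keep only the full matches for each sample.
    $found[$sample] = $m[0];
}
print_r($found);
```

If tokens such as version numbers or decimals appear in the input, the matches may need a further filter (for example, checking the last label against a list of known TLDs), which is outside what the answer covers.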