2011-08-18 135 views
1

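Regular expression for parsing a URL in PHP

I need to determine whether a given URL is valid or invalid. A URL should be accepted if its top-level domain is one of the following:

1. Generic top-level domains
2. Country code top-level domains

For reference, see http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains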

I need to do this in PHP. This is what I am currently doing:

 
    $regexUrl = "((https?|ftp)\:\/\/)?"; // SCHEME 
    $regexUrl .= "([a-zA-Z0-9+!*(),;?&=\$_.-]+(\:[a-zA-Z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass 
    $regexUrl .= "([a-zA-Z0-9-]+)\.([a-zA-Z]{2,3})"; // Host or IP 
    $regexUrl .= "(\:[0-9]{2,5})?"; // Port 
    $regexUrl .= "(\/([a-zA-Z0-9+\$_-]\.?)+)*\/?"; // Path 
    $regexUrl .= "(\?[a-zA-Z+&\$_.-][a-zA-Z0-9;:@&%=+\/\$_.-]*)?"; // GET Query 
    $regexUrl .= "(#[a-zA-Z_.-][a-zA-Z0-9+\$_.-]*)?"; // Anchor 
    //if(preg_match_all("#\bhttps?://[^\s()]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#", $message, $matches1, PREG_PATTERN_ORDER)) 
    //$pattern = '/((https?|ftp)\:(\/\/)|(file\:\/{2,3}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)?)+)(\.)(com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|[a-z]{2}))([\/][\/a-zA-Z0-9\.]*)*([\/]?(([\?][a-zA-Z0-9]+[\=][a-zA-Z0-9\%\(\)]*)([\&][a-zA-Z0-9]+[\=][a-zA-Z0-9\%\(\)]*)*))?/'; 
    if (preg_match_all("/$regexUrl/", $urlMessage, $matches1, PREG_PATTERN_ORDER)) 
    { 
        try 
        { 
            // $matches1[0] holds every full-pattern match found in $urlMessage 
            foreach ($matches1[0] as $urlToTrim1) 
            { 
                $url = $urlToTrim1; 
                echo $url; 
            } 
        } 
        catch (Exception $e) 
        { 
            $url = "-1"; 
        } 
    } 
+0

And what is the problem? It may also be easier to use PHP's built-in functions - http://php.net/manual/en/function.parse-url.php – arunkumar

Answers

6

To check whether it is a generally valid URL:

filter_var($url, FILTER_VALIDATE_URL) 

http://www.php.net/manual/en/function.filter-var.php
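As a rough sketch of how that call might be used: `filter_var()` returns the URL string on success and `false` on failure, so the result should be compared strictly against `false`. The wrapper name `is_valid_url` is made up for illustration. Note that, unlike the regex in the question, `FILTER_VALIDATE_URL` requires a scheme.

```php
<?php
// Hypothetical helper name; it only wraps PHP's built-in filter_var().
function is_valid_url(string $url): bool
{
    // filter_var() returns the filtered URL on success, false on failure.
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(is_valid_url('http://example.com/path?x=1')); // scheme and host present
var_dump(is_valid_url('example.com'));                 // rejected: no scheme
var_dump(is_valid_url('not a url'));                   // rejected: not URL syntax
```

If scheme-less input such as `example.com` has to be accepted, one option is to prepend `http://` before validating.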

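If you want to confirm that the TLD is in your list of approved TLDs (I don't know whether filter_var goes further and checks whether the TLD actually exists):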

$host = parse_url($url, PHP_URL_HOST); 
$tld = substr($host, strrpos($host, '.') + 1); 

// check if $tld is in a list of allowed TLDs 
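The allow-list check left as a comment above could be filled in roughly like this. The function name `has_allowed_tld` and the short `ALLOWED_TLDS` array are assumptions for the example; a real list would be built from the full set of generic and country-code TLDs.

```php
<?php
// Hypothetical, deliberately short allow-list for illustration only.
const ALLOWED_TLDS = ['com', 'org', 'net', 'info', 'museum', 'uk', 'in'];

function has_allowed_tld(string $url, array $allowed = ALLOWED_TLDS): bool
{
    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() gives null when there is no host (e.g. no scheme in the
    // input) and false for seriously malformed URLs; dot-less hosts such as
    // "localhost" have no TLD to check.
    if (!is_string($host) || strpos($host, '.') === false) {
        return false;
    }

    // Host names are case-insensitive, so normalise before comparing.
    $tld = strtolower(substr($host, strrpos($host, '.') + 1));

    return in_array($tld, $allowed, true);
}

var_dump(has_allowed_tld('http://example.museum/'));
var_dump(has_allowed_tld('http://example.xyz/'));
```

Checking the extracted TLD against a list avoids the `{2,3}` length limit in the question's regex, which cannot match longer TLDs such as `info` or `museum`.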

Or just try using gethostbyname to look up the DNS record for the domain. If one exists, it is a valid domain.*


*Unless you are being DNS-spoofed, if that kind of thing matters to you...
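A minimal sketch of the DNS approach, relying on the fact that `gethostbyname()` returns its input unchanged when the lookup fails. The function name `domain_resolves` and the optional `$resolver` parameter are assumptions added here so the logic can be tried without network access.

```php
<?php
// Hypothetical helper; $resolver defaults to PHP's gethostbyname() and can
// be swapped for a stub when no network is available.
function domain_resolves(string $host, ?callable $resolver = null): bool
{
    $resolver = $resolver ?? 'gethostbyname';
    $result = $resolver($host);

    // gethostbyname() returns the IPv4 address on success and the unmodified
    // host name on failure. A host that is already an IP address also comes
    // back unchanged, so this sketch reports it as "not resolved".
    return $result !== $host;
}

// Stubbed resolvers standing in for a successful and a failed lookup.
var_dump(domain_resolves('example.com', function ($h) { return '192.0.2.1'; }));
var_dump(domain_resolves('no-such-host.invalid', function ($h) { return $h; }));
```

In real use the host would come from `parse_url($url, PHP_URL_HOST)`, and the call would be `domain_resolves($host)` with the default resolver. This performs a blocking network lookup and only considers IPv4 addresses.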