NSRegularExpression to validate URL
2011-01-19

I found this regular expression on a website. It is said to be the best URL validation expression, and I agree. Diego Perini created it.

My problem comes when I try to use it in Objective-C to detect URLs in a string. I have tried other options such as NSRegularExpressionAnchorsMatchLines and NSRegularExpressionIgnoreMetacharacters, but I still have no luck.

Is the expression formatted incorrectly for Objective-C? Am I missing something? Any ideas?

I also tried John Gruber's regular expression, but it fails on some invalid URLs.

 Regular Expression         Explanation of expression      

^             match at the beginning 
//Protocol identifier 
(?: 
    (?:https?|ftp         http, https or ftp 
    ):\\/\\/          :// 
)?             optional 
// User:Pass authentication 
(?: 
    \\S+           non-whitespace characters, 1 or more times 
    (?: 
     :\\S*          : followed by non-whitespace characters, 0 or more times 
    )?@            this group is optional; then @ 
)?             optional 
//Private IP Addresses        (?! is a negative lookahead: fail if any of the following comes next 
(?: 
    (?!10           10               10.0.0.0 - 10.999.999.999 
     (?: 
      \\.\\d{1,3}        . 1 to 3 digits, three times 
     ){3} 
    ) 
    (?!127           127               127.0.0.0 - 127.999.999.999 
     (?: 
      \\.\\d{1,3}        . 1 to 3 digits, three times 
     ){3} 
    ) 
    (?!169\\.254         169.254              169.254.0.0 - 169.254.999.999 
     (?: 
      \\.\\d{1,3}        . 1 to 3 digits, two times 
     ){2} 
    ) 
    (?!192\\.168         192.168              192.168.0.0 - 192.168.999.999 
     (?: 
      \\.\\d{1,3}        . 1 to 3 digits, two times 
     ){2} 
    ) 
    (?!172\\.          172.              172.16.0.0 - 172.31.999.999 
     (?:                            
      1[6-9]         1 followed by any number between 6 and 9 
      |          or 
      2\\d         2 and any digit 
      |          or 
      3[0-1]         3 followed by a 0 or 1 
     ) 
     (?: 
      \\.\\d{1,3}        . 1 to 3 digits, two times 
     ){2} 
    ) 
    //First Octet IPv4        // match these: any IPv4 address that is not a network or broadcast address 
    (?: 
     [1-9]\\d?         any number from 1 to 9 followed by an optional digit  1 - 99 
     |           or 
     1\\d\\d          1 followed by any two digits        100 - 199 
     |           or 
     2[01]\\d         2 followed by any 0 or 1, followed by a digit    200 - 219 
     |           or 
     22[0-3]          22 followed by any number between 0 and 3     220 - 223 
    ) 
    //Second and Third Octet IPv4 
    (?: 
     \\.           . 
     (?: 
      1?\\d{1,2}        optional 1 followed by one or two digits      0 - 199 
      |          or 
      2[0-4]\\d        2 followed by any number between 0 and 4, and any digit  200 - 249 
      |          or 
      25[0-5]         25 followed by any numbers between 0 and 5     250 - 255 
     ) 
    ){2}           two times 
    //Fourth Octet IPv4 
    (?: 
     \\.           . 
     (?: 
      [1-9]\\d?        any number between 1 and 9 followed by an optional digit 1 - 99 
      |          or 
      1\\d\\d         1 followed by any two digits        100 - 199 
      |          or 
      2[0-4]\\d        2 followed by any number between 0 and 4, and any digit  200 - 249 
      |          or 
      25[0-4]         25 followed by any number between 0 and 4     250 - 254 
     ) 
    ) 
    //Host name 
    |            or     
    (?: 
     (?: 
      [a-z\u00a1-\uffff0-9]+-?    any letter, digit or non-ASCII character, one or more times, then an optional - 
     )*           zero or more times 
     [a-z\u00a1-\uffff0-9]+      any letter, digit or non-ASCII character, one or more times 
    ) 
    //Domain name 
    (?: 
     \\.           . 
     (?: 
      [a-z\u00a1-\uffff0-9]+-?    any letter, digit or non-ASCII character, one or more times, then an optional - 
     )*           zero or more times 
     [a-z\u00a1-\uffff0-9]+      any letter, digit or non-ASCII character, one or more times 
    )*            zero or more times 
    //TLD identifier 
    (?: 
     \\.           . 
     (?: 
      [a-z\u00a1-\uffff]{2,}     any letter or non-ASCII character, two or more times 
     ) 
    ) 
) 
//Port number 
(?: 
    :\\d{2,5}          : followed by any digit, two to five times, optionally 
)?    
//Resource path 
(?: 
    \\/[^\\s]*         / followed by any non-whitespace characters, zero or more times 
)?             optional 
$             match at the end 
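The annotated breakdown above can also be kept as a real, commented pattern. With NSRegularExpressionAllowCommentsAndWhitespace, the ICU engine behind NSRegularExpression ignores unescaped whitespace and treats # as the start of a comment. The sketch below is a reduced version, assuming you can live without the IPv4 branch and the private-address lookaheads. kCommentedURLPattern and MatchesCommentedURLPattern are hypothetical names, not part of any library.

```objc
#import <Foundation/Foundation.h>

// Hypothetical, reduced and commented form of the pattern: the IPv4 branch
// and the private-address lookaheads are omitted. Under
// NSRegularExpressionAllowCommentsAndWhitespace, unescaped whitespace is
// ignored and "#" starts a comment that runs to the next "\n".
static NSString *const kCommentedURLPattern =
    @"^\n"
    @"(?:(?:https?|ftp)://)?                 # protocol identifier, optional\n"
    @"(?:\\S+(?::\\S*)?@)?                   # user:pass authentication, optional\n"
    @"(?:[a-z\\u00a1-\\uffff0-9]+-?)*        # host name labels, optional -\n"
    @"[a-z\\u00a1-\\uffff0-9]+\n"
    @"(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*   # domain name\n"
    @"\\.[a-z\\u00a1-\\uffff]{2,}            # TLD identifier\n"
    @"(?::\\d{2,5})?                         # port number, optional\n"
    @"(?:/[^\\s]*)?                          # resource path, optional\n"
    @"$";

// Hypothetical helper: YES if the whole of `candidate` matches the pattern.
static BOOL MatchesCommentedURLPattern(NSString *candidate) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:kCommentedURLPattern
                             options:(NSRegularExpressionCaseInsensitive |
                                      NSRegularExpressionAllowCommentsAndWhitespace)
                               error:&error];
    if (regex == nil) {
        // An invalid pattern yields nil and fills in `error`.
        NSLog(@"Bad pattern: %@", error);
        return NO;
    }
    NSUInteger count = [regex numberOfMatchesInString:candidate
                                              options:0
                                                range:NSMakeRange(0, [candidate length])];
    return count > 0;
}
```

Like the original, this pattern is anchored, so it checks a whole candidate string rather than searching inside a sentence. The nested quantifier in the host-name part, (?:[...]+-?)*[...]+, can backtrack heavily on long inputs that almost match. That may be one source of the hangs mentioned further down.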

Edit: I think I forgot to mention that I am using the expression with the following code (partial):

NSError *error = NULL; 
NSRegularExpression *detector = [NSRegularExpression regularExpressionWithPattern:[self theRegularExpression] options:0 error:&error]; 
NSArray *links = [detector matchesInString:theText options:0 range:NSMakeRange(0, theText.length)]; 
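One likely reason the code above finds nothing in running text is that the pattern begins with ^ and ends with $. With options:0, those anchors match only at the very start and end of the string. With NSRegularExpressionAnchorsMatchLines, they match only at line boundaries. Either way, a URL in the middle of a sentence cannot match. The sketch below shows the difference and how to read each NSTextCheckingResult back into a string. MatchedSubstrings is a hypothetical helper, and the test patterns are deliberately simplified stand-ins, not Perini's expression.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
// Sketch only: on an invalid pattern it logs the error and returns an empty array.
static NSArray *MatchedSubstrings(NSString *pattern, NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        // Check the return value; `error` explains why compilation failed.
        NSLog(@"Bad pattern: %@", error);
        return [NSArray array];
    }
    NSMutableArray *found = [NSMutableArray array];
    NSArray *links = [regex matchesInString:text
                                    options:0
                                      range:NSMakeRange(0, [text length])];
    for (NSTextCheckingResult *match in links) {
        // Each result's range indexes into the original string.
        [found addObject:[text substringWithRange:[match range]]];
    }
    return found;
}
```

For example, MatchedSubstrings(@"^https?://\\S+$", @"Visit http://example.com/page today") returns an empty array. The same pattern without the anchors returns http://example.com/page. The pattern in the second snippet of the first answer is written without ^ and $, which is consistent with this.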

Answers

^(?i)(?:(?:https?|ftp):\\/\\/)?(?:\\S+(?::\\S*)?@)?(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:\\/[^\\s]*)?$ 

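This is the best URL validation regular expression I have found; it is the one explained in my question. It is already formatted to work in Objective-C. However, using it with NSRegularExpression gave me all sorts of problems, including crashing my app. RegexKitLite has no problem handling it. I don't know whether this is a size limit or some flag I haven't set. My final code looks like this: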

//First I take the string and put every word in an array, then I match every word with the regular expression 
NSArray *splitIntoWordsArray = [textToMatch componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]; 
NSMutableString *htmlString = [NSMutableString stringWithString:textToMatch]; 
for (NSString *theText in splitIntoWordsArray){ 
    NSEnumerator *matchEnumerator = [theText matchEnumeratorWithRegex:theRegularExpressionString]; 
    for (NSString *temp in matchEnumerator){ 
     [htmlString replaceOccurrencesOfString:temp withString:[NSString stringWithFormat:@"<a href=\"%@\">%@</a>", temp, temp] options:NSLiteralSearch range:NSMakeRange(0, [htmlString length])]; 
    } 
} 
[htmlString replaceOccurrencesOfString:@"\n" withString:@"<br />" options:NSLiteralSearch range:NSMakeRange(0, htmlString.length)]; 
//embed the text on a webView as HTML 
[webView loadHTMLString:[NSString stringWithFormat:embedHTML, [mainFont fontName], [mainFont pointSize], htmlString] baseURL:nil]; 

The result is HTML embedded in a UIWebView, in which URLs and email addresses are clickable. Don't forget to set dataDetectorTypes = UIDataDetectorTypeNone.
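The loop above relies on RegexKitLite's matchEnumeratorWithRegex:. To stay with Foundation instead, a roughly equivalent per-word check can be written with NSRegularExpression's firstMatchInString:options:range:. This is only a sketch. WordsMatchingPattern is a hypothetical helper, the test pattern is a simplified stand-in, and nothing here has been checked against the crashes described above. One design choice: the expression is compiled once, outside the loop, so only the input changes per word.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the whitespace-separated words of `text` that an
// anchored pattern (starting with ^ and ending with $) matches in full.
static NSArray *WordsMatchingPattern(NSString *text, NSString *anchoredPattern) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:anchoredPattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return [NSArray array];
    }
    NSArray *words = [text componentsSeparatedByCharactersInSet:
                              [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSMutableArray *matches = [NSMutableArray array];
    for (NSString *word in words) {
        if ([word length] == 0) {
            continue; // consecutive separators produce empty components
        }
        NSTextCheckingResult *result =
            [regex firstMatchInString:word options:0 range:NSMakeRange(0, [word length])];
        if (result != nil) {
            [matches addObject:word];
        }
    }
    return matches;
}
```

Because the text is split only on whitespace, trailing punctuation stays attached to a word. With a stand-in pattern like the one in the test, "example.com." at the end of a sentence would not match.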

You can also try:

NSError *error = nil; 
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)(?:(?:https?):\\/\\/)?(?:\\S+(?::\\S*)?@)?(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:\\/[^\\s]*)?" options:NSRegularExpressionCaseInsensitive error:&error]; 
if (expression == nil) { 
    // Check the return value; error describes why the pattern failed to compile 
    NSLog(@"Invalid pattern: %@", error); 
} else { 
    NSString *someString = @"This is a sample of a sentence with a URL http://. http://.. http://../ http://? http://?? http://??/ http://# http://-error-.invalid/ http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com within it."; 
    // NSMatchingCompleted is an NSMatchingFlags value reported to enumeration blocks, not an NSMatchingOptions value, so pass 0 
    NSRange range = [expression rangeOfFirstMatchInString:someString options:0 range:NSMakeRange(0, [someString length])]; 
    if (range.location != NSNotFound) { 
        NSString *match = [someString substringWithRange:range]; 
        NSLog(@"%@", match); 
    } else { 
        NSLog(@"no match"); 
    } 
} 

Hope it helps someone in the future.

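The regular expression sometimes caused the app to hang, so I decided to use Gruber's regular expression, modified to recognize URLs without the protocol or the www part: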

(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/?)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))*(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’])*) 

What am I missing?

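You're missing the built-in class that does this for you. There's a handy object called NSDataDetector. You create it to look for certain data "types" (for example, NSTextCheckingTypeLink) and then ask it for its -matchesInString:options:range:.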

Here's an earlier answer of mine showing how to use it
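Below is a minimal sketch of the NSDataDetector approach. DetectedLinks is a hypothetical helper name. The sketch uses only +dataDetectorWithTypes:error:, -matchesInString:options:range:, and the URL property of NSTextCheckingResult.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the links NSDataDetector finds in `text`,
// as absolute URL strings.
static NSArray *DetectedLinks(NSString *text) {
    NSError *error = nil;
    NSDataDetector *detector =
        [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:&error];
    if (detector == nil) {
        NSLog(@"Could not create detector: %@", error);
        return [NSArray array];
    }
    NSMutableArray *links = [NSMutableArray array];
    NSArray *results = [detector matchesInString:text
                                         options:0
                                           range:NSMakeRange(0, [text length])];
    for (NSTextCheckingResult *result in results) {
        // For link results, URL holds the parsed NSURL. The detector may
        // normalise it (for example, by adding a scheme), so use
        // [text substringWithRange:[result range]] if you need the raw text.
        NSURL *url = [result URL];
        if (url != nil) {
            [links addObject:[url absoluteString]];
        }
    }
    return links;
}
```

Whether a scheme-less host such as healthyhomes.asia is recognised depends on the detector's built-in heuristics. That is the limitation raised in the comments.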


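Thanks, Dave, for your quick answer. I had tried it, but it doesn't recognize some URLs, for example .asia, .info, etc. This happens when the URL isn't as well structured as http://healthyhomes.asia, which is why I am using a regular expression. In an online tester, the regular expression detects healthyhomes.asia or info.info without the protocol part. – GianPac 2011-01-19 18:43:02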


@Dave DeLong www.google.c – JAHelia 2016-07-27 11:04:50