2013-02-03 45 views

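Gobbledygook from a regular expression with capture groups

In the code below, I am trying to use a regular expression to extract parts of the text file shown further down.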

- (void)connectionDidFinishLoading:(NSURLConnection *)connection 
{ 
    NSLog(@"Succeeded! Received %d bytes of data",[receivedData length]); 
    NSString *string = [[NSString alloc] initWithData:receivedData encoding:NSISOLatin1StringEncoding]; 
    NSLog(@"string length: %d", [string length]); 
    NSError *error = nil; 
    NSString *toMatch = @"\[Board\\t\"([0-9]?)\"]*\[Dealer\\t\"([NEWS])\"]*"; 
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:toMatch 
     options:0 error:&error]; 
    NSLog(@"length: %d", [toMatch length]); 
    NSUInteger numberOfMatches = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])]; 
    NSLog(@" %ud", numberOfMatches); 
    for (NSTextCheckingResult* match in [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])]){ 
     // cannot make this work: NSRange trange =[match range]; 
     // cannot make this work: NSLog(@"range %i,%i", trange); 
     NSString* tstring=[string substringWithRange:trange]; 
     NSLog(@" %@", tstring);} 
} 

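I am using NSRegularExpression to pick out information from the text excerpted below. More specifically, I need the Board number and the Dealer value for each board (there are about 40 boards, and I have removed a few irrelevant lines from the listing).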

[Board "1"] 
[Dealer "N"] 
[Vulnerable "None"] 
[Deal "N:Q952.652.KJT4.95 T.KQT84.A865.J73 K8763.A7.Q.KQT84 AJ4.J93.9732.A62"] 
[Scoring ""] 
[Declarer ""] 
[Contract ""] 
[Board "2"] 
[Dealer "E"] 
[Vulnerable "NS"] 
[Deal "E:K8542.3.4.AT7532 J76.K7.AT85.KQJ8 QT3.AJ84.KJ963.4 A9.QT9652.Q72.96"] 
[Scoring ""] 
[Declarer ""] 
[Contract ""] 

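My for loop prints gobbledygook. There are at least two possible reasons for this: my regular expression is wrong, or my for loop is wrong.

The console output, including the gobbledygook, follows.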

2013-02-03 11:00:14.161 BridgeDuplicate[51867:11303] the window: <UIWindow: 0x956eac0; frame = (0 0; 768 1024); hidden = YES; layer = <UIWindowLayer: 0x956ebc0>> 
2013-02-03 11:00:14.163 BridgeDuplicate[51867:11303] the rootViewController: <BSViewController: 0x7188220> 
2013-02-03 11:00:14.166 BridgeDuplicate[51867:11303] viewDidLoad 
2013-02-03 11:00:27.156 BridgeDuplicate[51867:11303] Succeeded! Received 303896 bytes of data 
2013-02-03 11:00:27.158 BridgeDuplicate[51867:11303] string length: 303896 
2013-02-03 11:00:27.164 BridgeDuplicate[51867:11303] length: 41 
2013-02-03 11:00:27.205 BridgeDuplicate[51867:11303] 264765d 
2013-02-03 11:00:27.205 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] l 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.206 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] ea 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] d 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.207 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] e 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] a 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.208 BridgeDuplicate[51867:11303] e 
2013-02-03 11:00:27.209 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.209 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.209 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.209 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.209 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.209 BridgeDuplicate[51867:11303] " 
2013-02-03 11:00:27.228 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.228 BridgeDuplicate[51867:11303] o 
2013-02-03 11:00:27.228 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.228 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.228 BridgeDuplicate[51867:11303] e 
2013-02-03 11:00:27.228 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.229 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.229 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.229 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.229 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.229 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.230 BridgeDuplicate[51867:11303] e" 
2013-02-03 11:00:27.230 BridgeDuplicate[51867:11303] 
2013-02-03 11:00:27.230 BridgeDuplicate[51867:11303] 

Answers

1

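I suspect you are misunderstanding how NSTextCheckingResult works but, perhaps more importantly, there are also some problems with your pattern. The following code should be illustrative: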

NSString *string = @"[Board\t\"1\"]\n[Dealer\t\"N\"]\n"; 
NSLog(@"string length: %lu", (unsigned long)[string length]); 
NSError *error = nil; 
NSString *toMatch = @"\\[Board\\t\"([0-9]?)\"\\].*\\n\\[Dealer\\t\"([NEWS])\"\\].*"; 
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:toMatch options:0 error:&error]; 
NSLog(@"length: %lu", (unsigned long)[toMatch length]); 
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])]; 
NSLog(@"number of matches: %lu", (unsigned long)numberOfMatches); 
for (NSTextCheckingResult* match in [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])]) 
{ 
    NSLog(@"Number of ranges in match: %lu", (unsigned long)match.numberOfRanges); 
    for (NSUInteger i = 0; i < match.numberOfRanges; ++i) 
    { 
     NSRange matchedRange = [match rangeAtIndex: i]; 
     NSString* tstring = [string substringWithRange: matchedRange]; 
     NSLog(@"range %lu string: %@", (unsigned long)i, tstring); 
    } 
} 

What you will get out of this is:

2013-02-03 12:16:41.112 RegExTest[72290:303] string length: 25 
2013-02-03 12:16:43.889 RegExTest[72290:303] length: 49 
2013-02-03 12:16:43.889 RegExTest[72290:303] number of matches: 1 
2013-02-03 12:16:43.890 RegExTest[72290:303] Number of ranges in match: 3 
2013-02-03 12:16:43.890 RegExTest[72290:303] range 0 string: [Board "1"] 
[Dealer "N"] 
2013-02-03 12:16:43.890 RegExTest[72290:303] range 1 string: 1 
2013-02-03 12:16:43.890 RegExTest[72290:303] range 2 string: N 

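The thing to understand here is that you get a single match, and that match has multiple ranges. Every successful match has at least one range: the range of the whole substring matched by the entire pattern (which is not what you are interested in here). The parenthesized capture groups appear at indexes other than 0, as this code shows.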
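
To make the range indexes concrete, here is a minimal sketch that skips range 0 and keeps only the capture groups. The helper name `CaptureGroups` is hypothetical (it does not appear in the original code), and the sketch assumes ARC and a compiler that supports Objective-C literals.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer): returns, for each
// match, an array holding only the capture groups (indexes 1 and up).
static NSArray *CaptureGroups(NSString *text, NSString *pattern)
{
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Could not compile pattern: %@", error);
        return @[];
    }

    NSMutableArray *rows = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, [text length]);
    for (NSTextCheckingResult *match in
         [regex matchesInString:text options:0 range:whole]) {
        NSMutableArray *groups = [NSMutableArray array];
        // Index 0 is the whole match, so start at 1.
        for (NSUInteger i = 1; i < match.numberOfRanges; ++i) {
            NSRange r = [match rangeAtIndex:i];
            // A group that took no part in the match has location NSNotFound.
            if (r.location == NSNotFound) {
                [groups addObject:@""];
            } else {
                [groups addObject:[text substringWithRange:r]];
            }
        }
        [rows addObject:groups];
    }
    return rows;
}
```

Called with the test string and pattern from the listing above, this should return a single row holding the strings 1 and N, which correspond to ranges 1 and 2 in the logged output.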

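The escaping rules are a pain: there are the escaping rules for NSString literals, and then there are the escaping rules for regular expressions. How the two interact may not be obvious, but the pattern I have given here appears to do what you are after.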
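
The two layers can be seen by comparing what is typed in the source with what the regex engine receives. In an Objective-C string literal, `\\[` becomes the two characters `\[`, which the regex engine reads as a literal bracket. A single `\[` is not a recognized string escape, so the compiler warns and the engine receives a bare `[`, which opens a character class. That is consistent with the logged pattern length of 41 in the question. The sketch below builds a pattern for one tag line; the helper name `TagPattern` and its parameters are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a pattern for one tag line such as [Board "12"].
// valuePattern is inserted as-is and becomes capture group 1.
static NSString *TagPattern(NSString *tagName, NSString *valuePattern)
{
    // Let Foundation escape any regex metacharacters in the tag name.
    NSString *name = [NSRegularExpression escapedPatternForString:tagName];

    // Layer 1 (string literal)   ->  layer 2 (what the regex engine sees)
    //   \\[                      ->  \[     a literal opening bracket
    //   \\s+                     ->  \s+    one or more whitespace characters
    //   \"                       ->  "      a plain double quote
    //   \\]                      ->  \]     a literal closing bracket
    return [NSString stringWithFormat:@"\\[%@\\s+\"(%@)\"\\]",
                                      name, valuePattern];
}
```

Logging the result of `TagPattern(@"Board", @"[0-9]+")` with NSLog prints the pattern exactly as the regex engine will receive it, which is usually the quickest way to check the second layer.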

Edit:

Here is another version that pulls the file directly from your URL and matches successfully:

NSError* error = nil; 
NSString* string = [NSString stringWithContentsOfURL: [NSURL URLWithString: @"http://www.atlantaduplicatebridgeclub.com/scorepost/2013/01/20130126ana.pbn"] 
              encoding: NSUTF8StringEncoding error: &error]; 
NSLog(@"string length: %lu", (unsigned long)[string length]); 
NSString *toMatch = @"\\[Board\\s*\"([0-9]?)\"\\].*\\[Dealer\\s*\"([NEWS])\"\\]"; 
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:toMatch options:NSRegularExpressionDotMatchesLineSeparators error:&error]; 
NSLog(@"pattern length: %lu", (unsigned long)[toMatch length]); 
NSUInteger numberOfMatches = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])]; 
NSLog(@"number of matches: %lu", (unsigned long)numberOfMatches); 
for (NSTextCheckingResult* match in [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])]) 
{ 
    NSLog(@"Number of ranges in match: %lu", (unsigned long)match.numberOfRanges); 
    for (NSUInteger i = 0; i < match.numberOfRanges; ++i) 
    { 
     NSRange matchedRange = [match rangeAtIndex: i]; 
     NSString* tstring = [string substringWithRange: matchedRange]; 
     NSLog(@"range %lu string: %@", (unsigned long)i, tstring); 
    } 
} 
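
One caveat worth checking when `.` is allowed to match line separators: a greedy `.*` can run from the first Board tag to the last Dealer tag in the file, which gives one large match instead of one match per board. The sketch below compares a greedy gap with a lazy one (`.*?`). It also uses `[0-9]+` so that board numbers of two digits can match. Both changes are assumptions about the desired behavior, not part of the original answer, and the helper name is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts Board/Dealer matches using either a greedy
// or a lazy gap between the two tags.
static NSUInteger CountBoardDealerMatches(NSString *text, BOOL lazy)
{
    NSString *gap = lazy ? @".*?" : @".*";
    NSString *pattern = [NSString stringWithFormat:
        @"\\[Board\\s*\"([0-9]+)\"\\]%@\\[Dealer\\s*\"([NEWS])\"\\]", gap];

    // The dot-matches-newline flag is a pattern option, so it is passed
    // when the regex is created; the matching call itself takes
    // NSMatchingOptions, and 0 is fine there.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:NULL];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:text
                                  options:0
                                    range:NSMakeRange(0, [text length])];
}
```

On a two-board excerpt such as the one in the question, the greedy form should report 1 match and the lazy form 2.
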
+0

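The problem is that when I enter your 'toMatch' string, I get a result with a length of 0. So maybe your string is slightly different from mine, or maybe my string is not what I think it is. How can I print the first 300 characters of my string as hex or some other code? – zerowords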

+0

Try a more "forgiving" expression, for example: `NSString *toMatch = @"\\[Board\\s*\"([0-9]?)\"\\]\\s*\\[Dealer\\s*\"([NEWS])\"\\]";` This replaces the explicit tab and newline matches with generic, greedy whitespace matching. – ipmcc

+0

Still, I get a length of 0. The file starts with some lines like this: %PBN 2.1 %EXPORT %Content-type: text/x-pbn; charset=ISO-8859-1 – zerowords
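
One possible explanation for the zero length, offered as an assumption that the thread does not confirm: the file declares charset ISO-8859-1, and decoding it as UTF-8 returns nil if the data contains a byte sequence that is not valid UTF-8. Sending `length` to nil yields 0. The sketch below falls back to Latin-1 and also addresses the earlier request for a hex view of the first characters. Both helper names are hypothetical, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tries UTF-8 first, then ISO-8859-1 (Latin-1),
// which accepts every possible byte value.
static NSString *DecodePBNData(NSData *data)
{
    NSString *text = [[NSString alloc] initWithData:data
                                           encoding:NSUTF8StringEncoding];
    if (text == nil) {
        text = [[NSString alloc] initWithData:data
                                     encoding:NSISOLatin1StringEncoding];
    }
    return text;
}

// Hypothetical helper: renders the first `limit` UTF-16 code units as
// four-digit hex values, so that tabs (0009) and newlines (000a) are visible.
static NSString *HexPrefix(NSString *text, NSUInteger limit)
{
    NSUInteger count = MIN(limit, [text length]);
    NSMutableString *out = [NSMutableString string];
    for (NSUInteger i = 0; i < count; ++i) {
        [out appendFormat:@"%04x ", (unsigned)[text characterAtIndex:i]];
    }
    return out;
}
```

For example, `NSLog(@"%@", HexPrefix(string, 300));` would show whether the separator after Board is a tab (0009) or a space (0020).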