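Efficiently find the first of many keywords in a large NSString

I need to find all keywords in a large NSString (used for parsing source code). My current implementation is too slow, but I'm not sure how to improve it.

I'm using NSRegularExpression, on the assumption that it is better optimised than anything I could write myself, but performance is slower than I expected. Does anyone know a faster way to implement this?

The target string will contain UTF-8 characters, but the keywords themselves will always be plain alphanumeric ASCII. I guess that could be used to optimise things somewhat?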
@implementation MyClass

// I'm storing the regular expression in a static variable, since it never changes and I need to re-use it often
static NSRegularExpression *keywordsExpression;

+ (void)initialize
{
    [super initialize];

    NSArray *keywords = [NSArray arrayWithObjects:@"accumsan", @"adipiscing", @"aliquam", @"aliquet", @"amet", @"ante", @"arcu", @"at", @"commodo", @"congue", @"consectetur", @"consequat", @"convallis", @"cras", @"curabitur", @"cursus", @"dapibus", @"diam", @"dolor", @"dui", @"elit", @"enim", @"erat", @"eros", @"est", @"et", @"eu", @"felis", @"fermentum", @"gravida", @"iaculis", @"id", @"imperdiet", @"integer", @"ipsum", @"lacinia", @"lectus", @"leo", nil];

    NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b", [keywords componentsJoinedByString:@"|"]]; // \b(accumsan|adipiscing|aliquam|…)\b

    keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:NULL];
}

// This method will be called in quick succession; I need it to be able to run tens
// of thousands of times per second. The target string is big (50KB or so), but the
// search range is short, rarely more than 30 characters.
- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
    return [keywordsExpression rangeOfFirstMatchInString:string options:0 range:range];
}

@end
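Since the keywords are plain alphanumeric ASCII, one possible direction is to skip the regular expression for the short search range and scan the UTF-16 units directly. The following is only a rough sketch of that idea, not a measured improvement: `IsWordUnit` and `FindFirstKeywordASCII` are hypothetical helper names, `lowercaseKeywords` is assumed to be an NSSet of lowercase ASCII keywords built once, and the word-boundary rule (ASCII letters, digits and `_`, plus any non-ASCII unit treated conservatively as a word character) is an approximation of `\b`, with the edges of the range treated as boundaries.

```objective-c
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: ASCII letters, digits and '_' count as word characters.
// Any non-ASCII UTF-16 unit is also treated as one (an assumption), so a
// keyword glued to an accented letter is not reported as a match.
static inline BOOL IsWordUnit(unichar c)
{
    return c >= 128 || c == '_' ||
           (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

// Returns the range of the first keyword inside `range`, or {NSNotFound, 0}.
// `lowercaseKeywords` is assumed to contain lowercase ASCII strings only.
static NSRange FindFirstKeywordASCII(NSString *string, NSRange range, NSSet *lowercaseKeywords)
{
    NSRange result = NSMakeRange(NSNotFound, 0);
    if (range.length == 0 || NSMaxRange(range) > string.length) {
        return result;
    }

    // Copy the (short) search range once so the scan works on a plain C buffer.
    unichar *buffer = malloc(range.length * sizeof(unichar));
    if (buffer == NULL) {
        return result;
    }
    [string getCharacters:buffer range:range];

    NSUInteger i = 0;
    while (i < range.length) {
        if (!IsWordUnit(buffer[i])) {
            i++;
            continue;
        }
        // Consume one whole word, lowercasing ASCII letters in place.
        NSUInteger start = i;
        BOOL asciiOnly = YES;
        while (i < range.length && IsWordUnit(buffer[i])) {
            if (buffer[i] >= 128) {
                asciiOnly = NO;
            } else if (buffer[i] >= 'A' && buffer[i] <= 'Z') {
                buffer[i] += 'a' - 'A';
            }
            i++;
        }
        if (!asciiOnly) {
            continue; // words containing non-ASCII units cannot be keywords
        }
        NSString *word = [NSString stringWithCharacters:buffer + start length:i - start];
        if ([lowercaseKeywords containsObject:word]) {
            result = NSMakeRange(range.location + start, i - start);
            break;
        }
    }

    free(buffer);
    return result;
}
```

This still allocates one NSString per candidate word, so whether it actually beats the regular expression would need to be checked in Instruments; a stack buffer and a hand-rolled hash lookup would be the obvious next steps if it does not.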
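Edit: following @CodeBrickie's answer, I've updated my code to run the regular expression over the whole string once and cache the matches in an NSIndexSet. Each time the method is called, it now searches the NSIndexSet for keyword ranges instead of searching the string. The result is roughly an order of magnitude faster: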
@implementation MyClass

static NSRegularExpression *keywordsExpression;
static NSIndexSet *keywordIndexes = nil;

+ (void)initialize
{
    [super initialize];

    NSArray *keywords = [NSArray arrayWithObjects:@"accumsan", @"adipiscing", @"aliquam", @"aliquet", @"amet", @"ante", @"arcu", @"at", @"commodo", @"congue", @"consectetur", @"consequat", @"convallis", @"cras", @"curabitur", @"cursus", @"dapibus", @"diam", @"dolor", @"dui", @"elit", @"enim", @"erat", @"eros", @"est", @"et", @"eu", @"felis", @"fermentum", @"gravida", @"iaculis", @"id", @"imperdiet", @"integer", @"ipsum", @"lacinia", @"lectus", @"leo", nil];

    NSString *pattern = [NSString stringWithFormat:@"\\b(%@)\\b", [keywords componentsJoinedByString:@"|"]]; // \b(accumsan|adipiscing|aliquam|…)\b

    keywordsExpression = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:NULL];
}

// Runs the regular expression over the whole string once and caches every
// matched character index in keywordIndexes.
- (void)prepareToFindKeywordsInString:(NSString *)string
{
    NSMutableIndexSet *keywordIndexesMutable = [NSMutableIndexSet indexSet];

    [keywordsExpression enumerateMatchesInString:string options:0 range:NSMakeRange(0, string.length) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop){
        [keywordIndexesMutable addIndexesInRange:match.range];
    }];

    keywordIndexes = [keywordIndexesMutable copy];
}

- (NSRange)findNextKeyword:(NSString *)string inRange:(NSRange)range
{
    // The posted snippet referenced `startingAt` and `foundCharacterSetRange`, which are
    // not defined here; the bounds of `range` (clamped to the string length) are assumed instead.
    NSUInteger foundKeywordMax = MIN(NSMaxRange(range), string.length);

    NSRange foundKeywordRange = NSMakeRange(NSNotFound, 0);
    for (NSUInteger index = range.location; index < foundKeywordMax; index++) {
        if ([keywordIndexes containsIndex:index]) {
            if (foundKeywordRange.location == NSNotFound) {
                // First cached keyword index inside the range: start a new run.
                foundKeywordRange.location = index;
                foundKeywordRange.length = 1;
            } else {
                foundKeywordRange.length++;
            }
        } else {
            if (foundKeywordRange.location != NSNotFound) {
                // The run of keyword indexes has ended.
                break;
            }
        }
    }

    return foundKeywordRange;
}

@end
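The lookup against the cached index set could probably be tightened a little further by letting NSIndexSet jump straight to the first cached index with `indexGreaterThanOrEqualToIndex:` rather than probing every position in the range. A minimal sketch, assuming the same `keywordIndexes` cache as above; `FirstKeywordRange` is a hypothetical helper name, not part of the original code:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first run of cached keyword indexes that
// falls inside `range`, or {NSNotFound, 0} if the range contains none.
static NSRange FirstKeywordRange(NSIndexSet *keywordIndexes, NSRange range)
{
    NSUInteger end = NSMaxRange(range);

    // Jump directly to the first cached index at or after the start of the range.
    NSUInteger start = [keywordIndexes indexGreaterThanOrEqualToIndex:range.location];
    if (start == NSNotFound || start >= end) {
        return NSMakeRange(NSNotFound, 0);
    }

    // Extend the run while the following indexes are cached and still in range.
    NSUInteger stop = start + 1;
    while (stop < end && [keywordIndexes containsIndex:stop]) {
        stop++;
    }
    return NSMakeRange(start, stop - start);
}
```

Inside `findNextKeyword:inRange:` this would be called as `FirstKeywordRange(keywordIndexes, range)`. Whether it is measurably faster for ranges of around 30 characters is something only profiling can tell.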
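This seems to work well, and performance is now in the range I want. I'd like to wait a little longer to see whether there are more suggestions before accepting this.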
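Just a side note: your 'findNextKeyword:inRange:' method shouldn't accept an *NSRange pointer*.
Thanks @JacobRelkin, I've just edited that. This isn't my actual code; it's only the part that Instruments tells me is too slow (95% of my CPU time).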