Stripping HTML tags with NSRegularExpression

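I am developing an ebook reader application. I have the .ePUB file for the whole ebook, in which each topic of the ebook is an HTML file. I want to implement a search feature in the app, and I am using the NSRegularExpression class for searching. Please consider the following HTML code: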

<temp> I am temp in tempo with temptation </temp>

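For example, in the HTML above I want to search only for the word temp. The string temp occurs 5 times in that code: <temp>, </temp>, temp, tempo and temptation. I am looking for a regular expression that extracts only the whole word "temp". I do not want to count the temp inside the HTML tags <temp> </temp>, and I do not want to match the words tempo and temptation either.

Thanks in advance.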

2011-02-09 Prazi

How about this?

[^<\/?\w*>]+(temp\s)

http://rubular.com/r/3PkdvNZSbr

NSString *evaluate_string = @"<temp> I am temp in tempo with temptation </temp>";
NSString *word = @"temp";
NSError *outError = nil;
// Match the word followed by whitespace, preceded by at least one character
// that is not '<', '/', '?', '*', '>' or a word character.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:[NSString stringWithFormat:@"[^<\\/?\\w*>]+(%@\\s)", word] options:0 error:&outError];

// If the pattern fails to compile, regex is nil and so is result.
NSTextCheckingResult *result = [regex firstMatchInString:evaluate_string options:0 range:NSMakeRange(0, [evaluate_string length])];

if (result) {
    NSLog(@"Found");
}
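
To see what the pattern actually matches, it helps to collect every match rather than only the first. The bracketed part is a negated character class, not a tag matcher: it requires at least one character before the word that is not <, /, ?, *, > or a word character, and the group then captures the word plus one trailing whitespace character. The sketch below is only an illustration; the helper name WordRangesOutsideTags is hypothetical and not part of the answer, and escaping the word with escapedPatternForString: is an added precaution.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns the range of
// capture group 1 for every match of the answer's pattern.
static NSArray<NSValue *> *WordRangesOutsideTags(NSString *text, NSString *word) {
    // Escape the word so regex metacharacters in it are matched literally.
    NSString *escaped = [NSRegularExpression escapedPatternForString:word];
    NSString *pattern = [NSString stringWithFormat:@"[^<\\/?\\w*>]+(%@\\s)", escaped];

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSValue *> *ranges = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, text.length);
    for (NSTextCheckingResult *match in [regex matchesInString:text options:0 range:whole]) {
        // Group 1 is the word together with its trailing whitespace character.
        [ranges addObject:[NSValue valueWithRange:[match rangeAtIndex:1]]];
    }
    return ranges;
}
```

On the sample string this yields a single range, the standalone temp. Because the pattern needs a preceding character and trailing whitespace, it will miss the word at the very start of the text, at the very end, or directly before punctuation such as "temp.".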

2011-02-09 07:42:49

Thanks, Jacob. Works like a charm. Just wondering... could you briefly explain how the regular expression above works? – Prazi 2011-02-09 09:03:22

How about this puppy:

</?[a-z][a-z0-9]*[^<>]*>

which I found in the RegexBuddy library :)
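
This pattern matches the tags themselves, so one way to use it for the search is to remove the tags first and then look for the whole word in what is left. A minimal sketch under that assumption follows; the helper names TextByStrippingTags and CountWholeWord are hypothetical, and the \b word-boundary search is an addition that this answer does not mention.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every tag matched by the answer's pattern
// with a single space so that adjacent words do not run together.
static NSString *TextByStrippingTags(NSString *html) {
    NSRegularExpression *tagRegex =
        [NSRegularExpression regularExpressionWithPattern:@"</?[a-z][a-z0-9]*[^<>]*>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    return [tagRegex stringByReplacingMatchesInString:html
                                              options:0
                                                range:NSMakeRange(0, html.length)
                                         withTemplate:@" "];
}

// Hypothetical helper: counts whole-word occurrences in the tag-free text.
static NSUInteger CountWholeWord(NSString *html, NSString *word) {
    NSString *text = TextByStrippingTags(html);
    // \b anchors the word at word boundaries, so "tempo" does not count as "temp".
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b",
                         [NSRegularExpression escapedPatternForString:word]];
    NSRegularExpression *wordRegex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    return [wordRegex numberOfMatchesInString:text
                                      options:0
                                        range:NSMakeRange(0, text.length)];
}
```

Note that match ranges found this way refer to the stripped text, not to the original HTML, so highlighting a hit in the rendered page would need a separate mapping back to the source.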

2011-02-09 07:22:42
