iOS/iPhone: how do I list all keywords in a UITextView by frequency of use?

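I have a UITextView with text of arbitrary length (up to 10,000 characters). I need to parse this text, extract all the keywords, and list them by frequency of use, with the most frequently used word at the top, then the next one, and so on. I will present a modal UITableView once the operation completes.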

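I am trying to think of an efficient and useful way to do this. I could split the string using separators such as [whitespace, punctuation, etc.], which gives me an array of character sequences. I could add each sequence as an NSMutableDictionary key and increment its count whenever I see another instance of that word. However, this may produce a list of 300-400 words, most of them with a frequency of 1.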

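Is there a good way to implement the logic I described? Should I sort the array alphabetically and try some kind of "fuzzy" matching? Is there an NSDataDetector or an NSString method that can do this kind of work for me?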

Another question: how would I pull out words like a, at, to, and so on, so that they are not listed among my keywords?

It would be great if I could look at a sample project that already does this.

Thanks!


2012-04-22 Alex Stone

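There is something I don't understand. You want to list all keywords by frequency, but a list of 300-400 words ordered by frequency is no good because most of them occur only once? – shein 2012-04-22 14:23:42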

I ended up going with CFStringTokenizer. I'm not sure whether the bridging below is correct, but it seems to work.

- (void)listAllKeywordsInString:(NSString *)text
{
    if (text == nil) {
        return;
    }

    NSMutableDictionary *keywordsDictionary = [[NSMutableDictionary alloc] initWithCapacity:1024];
    NSLog(@"%@", text);
    NSLog(@"Started parsing: %@", [NSDate date]);

    // Plain __bridge: no ownership transfer, so ARC keeps managing `text`.
    // (__bridge_retained would add a retain that is never balanced, i.e. a leak.)
    CFStringRef string = (__bridge CFStringRef)text;

    // CFLocaleCopyCurrent follows the Copy rule, so we own the locale and release it below.
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, string, CFRangeMake(0, CFStringGetLength(string)), kCFStringTokenizerUnitWord, locale);

    unsigned tokensFound = 0;

    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        CFStringRef tokenValue = CFStringCreateWithSubstring(kCFAllocatorDefault, string, tokenRange);

        // This is the found word
        NSString *key = (__bridge NSString *)tokenValue;

        // Increment its count. A missing key returns nil, and nil.intValue is 0,
        // so the first occurrence is stored as 1.
        NSNumber *count = [keywordsDictionary objectForKey:key];
        [keywordsDictionary setObject:[NSNumber numberWithInt:count.intValue + 1] forKey:key];

        // Balance the Create call; the dictionary holds its own copy of the key.
        CFRelease(tokenValue);

        ++tokensFound;
    }

    NSLog(@"Ended parsing. Tokens found: %u, %@", tokensFound, [NSDate date]);
    NSLog(@"%@", keywordsDictionary);

    // Clean up the Core Foundation objects we own
    CFRelease(tokenizer);
    CFRelease(locale);
}


2012-04-23 11:41:05

You can use CFStringTokenizer to get the word boundaries. For counting, you can use an NSMutableDictionary as you suggested, or an NSCountedSet, which may be slightly more efficient.

If you are not interested in words with a frequency of 1 (or some other threshold), you have to filter them out after counting all the words.
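For illustration, here is a minimal sketch of counting with NSCountedSet and filtering by a threshold afterwards. It is not from the original answer: the helper name `KeywordsByFrequency` and the `minCount` parameter are made up, words are lowercased before counting (an assumption about what you want), and it walks the words with NSString's `enumerateSubstringsInRange:options:usingBlock:` rather than CFStringTokenizer to keep the example short.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the distinct words of `text` that occur at least
// `minCount` times, ordered from most to least frequent.
static NSArray<NSString *> *KeywordsByFrequency(NSString *text, NSUInteger minCount) {
    NSCountedSet *counts = [[NSCountedSet alloc] init];

    // Walk the text word by word; lowercase so "The" and "the" count as one word.
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *word, NSRange wordRange,
                                       NSRange enclosingRange, BOOL *stop) {
        if (word != nil) {
            [counts addObject:word.lowercaseString];
        }
    }];

    // Filter only after everything has been counted.
    NSMutableArray<NSString *> *keywords = [NSMutableArray array];
    for (NSString *word in counts) {
        if ([counts countForObject:word] >= minCount) {
            [keywords addObject:word];
        }
    }

    // Highest count first; ties are broken alphabetically for a stable order.
    [keywords sortUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        NSUInteger countA = [counts countForObject:a];
        NSUInteger countB = [counts countForObject:b];
        if (countA != countB) {
            return countA > countB ? NSOrderedAscending : NSOrderedDescending;
        }
        return [a compare:b];
    }];
    return keywords;
}
```

Calling `KeywordsByFrequency(textView.text, 2)` would then give an array you could feed straight into the table view's data source, with the single-occurrence words already dropped.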

To ignore certain words (a, the, for, ...), you need a word list specific to the language of your text. The Wikipedia article on stop words contains a couple of links, e.g. this CSV file.
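A rough sketch of the stop-word filtering, assuming the stop words have already been loaded into an NSSet. The helper name `RemoveStopWords` is hypothetical, and the tiny set in the usage note is only a placeholder for a real language-specific list.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns `words` without any entry that appears in
// `stopWords`. The comparison lowercases each word, so the set is assumed
// to contain lowercase entries.
static NSArray<NSString *> *RemoveStopWords(NSArray<NSString *> *words,
                                            NSSet<NSString *> *stopWords) {
    NSMutableArray<NSString *> *kept = [NSMutableArray arrayWithCapacity:words.count];
    for (NSString *word in words) {
        // NSSet membership is a hash lookup, so this stays cheap for long lists.
        if (![stopWords containsObject:word.lowercaseString]) {
            [kept addObject:word];
        }
    }
    return kept;
}
```

For example, with `[NSSet setWithObjects:@"a", @"at", @"to", @"the", @"for", nil]` as the stop list, the words "at" and "The" would be dropped while everything else is kept in its original order. Filtering before counting also keeps the dictionary or counted set smaller.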


2012-04-22 14:54:24 omz

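There are many ways to do this.

You should add all of your keywords to an array (or some other collection object) and iterate over it, so that you search for those keywords and only those keywords (and avoid checking for occurrences of "of", "at", and so on).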

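NSArray *keywords = @[ /* add your keywords */ ];

NSString *textToSearchThrough = @" your text "; // or load your text file here
NSMutableDictionary *occurrenceCounts = [NSMutableDictionary dictionary];

for (NSString *keyword in keywords) {
    NSUInteger count = 0;
    NSRange searchRange = NSMakeRange(0, textToSearchThrough.length);
    NSRange range;
    // rangeOfString: only reports the first match, so keep searching past each hit.
    // Note: this also matches inside longer words ("cat" is found in "category").
    while ((range = [textToSearchThrough rangeOfString:keyword
                                               options:NSCaseInsensitiveSearch
                                                 range:searchRange]).location != NSNotFound) {
        count++;
        searchRange.location = NSMaxRange(range);
        searchRange.length = textToSearchThrough.length - searchRange.location;
    }
    if (count > 0) {
        // meaning, you did find it: record this keyword's occurrence count
        occurrenceCounts[keyword] = @(count);
    }
}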

Then you iterate over your results, check each keyword's occurrence count, drop those whose count is < minOccurrenceCount, and sort the rest from highest to lowest.
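That last step could look roughly like the sketch below. It assumes the counts ended up in a dictionary mapping each keyword to an NSNumber, which is my own choice of shape rather than something the answer prescribes; `SortedKeywords` is a hypothetical name as well.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps the keywords whose count is at least
// `minOccurrenceCount` and returns them ordered from highest to lowest count.
static NSArray<NSString *> *SortedKeywords(NSDictionary<NSString *, NSNumber *> *occurrenceCounts,
                                           NSUInteger minOccurrenceCount) {
    NSMutableArray<NSString *> *kept = [NSMutableArray array];
    [occurrenceCounts enumerateKeysAndObjectsUsingBlock:^(NSString *keyword,
                                                          NSNumber *count,
                                                          BOOL *stop) {
        if (count.unsignedIntegerValue >= minOccurrenceCount) {
            [kept addObject:keyword];
        }
    }];

    [kept sortUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        // Comparing b's count against a's puts larger counts first.
        NSComparisonResult byCount = [occurrenceCounts[b] compare:occurrenceCounts[a]];
        // Equal counts fall back to alphabetical order so the result is deterministic.
        return byCount != NSOrderedSame ? byCount : [a compare:b];
    }];
    return kept;
}
```

The returned array can back the UITableView directly, and the count for each row can be looked up in the same dictionary when configuring the cell.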


2012-04-22 15:11:42 sirab333
