Counting the occurrences of each word in a large document

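I would like to know which data structure to use to solve this problem. Can anyone explain it in detail? I am thinking of using a tree.

There is a large file containing millions of words. What is the best way to count the number of occurrences of each word?

This question was asked at Microsoft. Any suggestions would be appreciated.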

2011-09-25 ferhan

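Using a dictionary or another hash-based collection gives O(N) on average.

To solve it in O(N) worst case (N being the total length of the input), use a trie with one small change: add a counter to every trie node that marks the end of a word, and each time an inserted word already exists, increment its counter.

If you want to print all the counts at the end, you can keep the counters in a separate list and reference them from the trie, instead of storing the counters in the trie itself.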
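The trie variant described above, with the counters held in a separate list, might look roughly like the sketch below. The names `TrieWordCounter`, `Node`, `add`, `countOf` and `printAll` are made up for illustration. As an assumption made for brevity, each node keeps its children in a `HashMap`, so a single step is constant time only in expectation; a strict worst-case bound would need something like a fixed array indexed by character.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a trie whose word-ending nodes point into a separate list of counters.
class TrieWordCounter {
    // One trie node; wordIndex stays -1 unless some word ends at this node.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int wordIndex = -1;
    }

    private final Node root = new Node();
    // Words and their counters live outside the trie, in first-seen order.
    private final List<String> words = new ArrayList<>();
    private final List<Integer> counts = new ArrayList<>();

    // Walks (and extends) the trie one character at a time, then bumps the counter.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        if (node.wordIndex < 0) {
            // First occurrence: allocate a new counter slot.
            node.wordIndex = counts.size();
            words.add(word);
            counts.add(1);
        } else {
            counts.set(node.wordIndex, counts.get(node.wordIndex) + 1);
        }
    }

    // Returns how many times word was added, or 0 if it was never seen.
    int countOf(String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.children.get(word.charAt(i));
        }
        return (node == null || node.wordIndex < 0) ? 0 : counts.get(node.wordIndex);
    }

    // Prints every word with its count without traversing the trie.
    void printAll() {
        for (int i = 0; i < words.size(); i++) {
            System.out.println(words.get(i) + ": " + counts.get(i));
        }
    }

    public static void main(String[] args) {
        TrieWordCounter counter = new TrieWordCounter();
        // Sample input; splitting on single spaces is an assumption about tokenization.
        for (String w : "the cat and the hat".split(" ")) {
            counter.add(w);
        }
        counter.printAll();
    }
}
```

Keeping the counters in a list means the final report is a simple loop over that list, at the cost of one extra indirection per increment.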

2011-09-25 15:09:51 dahunter

A trie, considering it's only words. – Jack

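I would just use a hash map (or Dictionary, since this is Microsoft ;)) from strings to integers. For each word of the input, add it to the dictionary if it is new; otherwise increment its count. This is O(n) in the length of the input, assuming the hash map implementation is decent.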
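A minimal Java sketch of that hash map approach is shown here. The class name `HashWordCount` and the method `count` are made up for illustration, and splitting the sample text on spaces is only an assumption about how words are tokenized.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of counting words with a hash map from strings to integers.
class HashWordCount {
    // Counts occurrences of each word; words can be any iterable source of tokens.
    static Map<String, Integer> count(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // merge stores 1 for a new word, otherwise adds 1 to the stored count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("to be or not to be".split(" ")));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

For a file with millions of words, the same loop could be fed from a streaming tokenizer so the whole document never has to sit in memory; only the distinct words do.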

2011-09-25 07:50:13 sethobrien

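using System.Collections.Generic;

// Mutable wrapper, so a count can be incremented in place after a single lookup.
class IntValue
{
    public IntValue(int value)
    {
        Value = value;
    }
    public int Value;
}

class Program
{
    static void Main(string[] args)
    {
        // Assuming document is an enumerator over the words in the document;
        // the array below is only a placeholder for the real input.
        IEnumerable<string> document = new[] { "a", "b", "a" };

        Dictionary<string, IntValue> dict = new Dictionary<string, IntValue>();
        foreach (string word in document)
        {
            IntValue intValue;
            if (!dict.TryGetValue(word, out intValue))
            {
                // First occurrence: start at zero, incremented below.
                intValue = new IntValue(0);
                dict.Add(word, intValue);
            }

            ++intValue.Value;
        }

        // dict now contains the counts
    }
}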

2011-09-25 11:10:34

A tree will not work here.

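import java.util.Hashtable;

public class WordCount
{
    public static void main(String[] args)
    {
        Hashtable<String, Integer> ht = new Hashtable<String, Integer>();
        // Read each word in the text in order (args stands in for the text here); for each of them:
        for (String oneWord : args)
        {
            // containsKey looks up keys; Hashtable.contains would search the values instead.
            if (ht.containsKey(oneWord))
            {
                ht.put(oneWord, ht.get(oneWord) + 1);
            }
            else
            {
                ht.put(oneWord, 1);
            }
        }
    }
}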

2011-09-25 11:34:36

Why wouldn't a tree work? – svick
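For comparison, here is a rough sketch of the tree-based option raised in the question, using `java.util.TreeMap` (a red-black tree). It yields the same counts, at O(log k) per word for k distinct words instead of expected constant time, and it iterates in sorted key order. The class name `TreeWordCount` and the method `count` are made up for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: word counting backed by a balanced search tree.
class TreeWordCount {
    // Counts occurrences of each word; the returned map iterates in sorted key order.
    static TreeMap<String, Integer> count(Iterable<String> words) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            // Each merge costs O(log k) comparisons, k being the number of distinct words so far.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = count(Arrays.asList("to be or not to be".split(" ")));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```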
