C++: How can I make this code more efficient?

I have an array of words and a text file. I want to search the text file for the words in the array and count how many times each word in the array appears in the file.

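I thought about using a for loop, but that only gave me the total word count, not a separate count for each word. I can't load the text file into an array, because it contains about 40,000 words.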

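After counting, I want to divide each count by an integer value called `scale`, and then build a string containing that many asterisks for each word.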

This is how I am doing it right now. Is there any way I can make it more efficient?

Any help is greatly appreciated.

Word array = testwords.

File name = testF.

inWord = each word read from the file.

while (testF >> inWord)
{
    if (inWord == testwords[0]) {
        count1++;
    }
    if (inWord == testwords[1]) {
        count2++;
    }
    if (inWord == testwords[2]) {
        count3++;
    }
    if (inWord == testwords[3]) {
        count4++;
    }
    if (inWord == testwords[4]) {
        count5++;
    }
    if (inWord == testwords[5]) {
        count6++;
    }
    if (inWord == testwords[6]) {
        count7++;
    }
    if (inWord == testwords[7]) {
        count8++;
    }
}
cout << testwords[0] << " " << count1 << " " << s1.append(count1/scale, '*') << endl; 
cout << testwords[1] << " " << count2 << " " << s2.append(count2/scale, '*') << endl; 
cout << testwords[2] << " " << count3 << " " << s3.append(count3/scale, '*') << endl; 
cout << testwords[3] << " " << count4 << " " << s4.append(count4/scale, '*') << endl; 
cout << testwords[4] << " " << count5 << " " << s5.append(count5/scale, '*') << endl; 
cout << testwords[5] << " " << count6 << " " << s6.append(count6/scale, '*') << endl; 
cout << testwords[6] << " " << count7 << " " << s7.append(count7/scale, '*') << endl; 
cout << testwords[7] << " " << count8 << " " << s8.append(count8/scale, '*') << endl;
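The eight output lines above all do the same thing: append `count / scale` asterisks to a string. A minimal sketch of that step as one reusable helper follows. The name `scaled_bar` is hypothetical, and returning an empty bar when `scale` is 0 is an assumption, since the question does not say what should happen in that case.

```cpp
#include <cstddef>
#include <string>

// Build the "bar" for one word: one '*' for every `scale` occurrences.
// Integer division truncates, so a count below `scale` gives an empty bar.
std::string scaled_bar(std::size_t count, std::size_t scale) {
    if (scale == 0) {
        // Assumption: treat a zero scale as "no bar" instead of dividing by zero.
        return std::string();
    }
    return std::string(count / scale, '*');
}
```

With such a helper, each output line reduces to something like `std::cout << testwords[i] << " " << count << " " << scaled_bar(count, scale) << std::endl;` inside a loop over the words.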

2012-11-17 James Warner

Obligatory "use a profiler" comment :) – EvilTeach

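Before worrying about efficiency, you should worry about your approach. You are not using a logical data structure. Instead of having 8 separate counters, keep an array of counts. Or, better yet, keep a map from word to count.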
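A minimal sketch of the "array of counts" idea, assuming the target words are held in a `std::vector<std::string>` instead of a fixed array and that they are distinct; `count_targets` is a hypothetical name. It reads from any `std::istream`, so an `std::ifstream` such as the question's `testF` can be passed in directly.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, for reading from an in-memory string
#include <string>
#include <vector>

// counts[i] holds the number of occurrences of testwords[i],
// replacing the eight separate variables count1..count8.
std::vector<std::size_t> count_targets(std::istream& in,
                                       const std::vector<std::string>& testwords) {
    std::vector<std::size_t> counts(testwords.size(), 0);
    std::string inWord;
    while (in >> inWord) {
        for (std::size_t i = 0; i < testwords.size(); ++i) {
            if (inWord == testwords[i]) {
                ++counts[i];
                break;  // assumes the target words are distinct
            }
        }
    }
    return counts;
}
```

This still compares every input word against every target, which is cheap for 8 targets; the map-based approaches avoid that linear scan.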

Fortunately, in this case cleaner code will also mean faster execution.

In particular, use std::map<std::string, size_t>.

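Alternatively, if you are using C++11, you can use std::unordered_map, which may perform better (average constant-time lookups instead of the logarithmic lookups of std::map).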

Assuming you read your words from cin:

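#include <cstddef>
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, std::size_t> counts;
    std::string word;

    // operator[] inserts a zero count the first time a word is seen.
    while (std::cin >> word) {
        ++counts[word];
    }

    for (std::map<std::string, std::size_t>::const_iterator it = counts.begin(),
         end = counts.end(); it != end; ++it) {
        std::cout << "The word '" << it->first << "' appeared "
                  << it->second << " times" << std::endl;
    }
}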

Documentation for std::map.

Documentation for std::unordered_map.

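For what it's worth, std::unordered_map is (in practice, quite likely always) implemented as a hash table, and std::map is (quite likely always) implemented with a balanced binary search tree, typically a red-black tree, as the underlying structure.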
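Because both containers offer `operator[]`, switching to the hash-based container is essentially a one-line change. A sketch of the counting loop with `std::unordered_map` (C++11 or later) follows; the function name `count_all_words` is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, for reading from an in-memory string
#include <string>
#include <unordered_map>

// One pass over the input; ++counts[word] value-initializes a missing
// counter to 0 and then increments it.
std::unordered_map<std::string, std::size_t> count_all_words(std::istream& in) {
    std::unordered_map<std::string, std::size_t> counts;
    std::string word;
    while (in >> word) {
        ++counts[word];
    }
    return counts;
}
```

One visible difference: iterating over a std::map yields the words in sorted order, whereas the iteration order of a std::unordered_map is unspecified.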

2012-11-17 12:21:10 Corbin

Set up a std::map<std::string, unsigned long long>, scan through the file word by word, and count each word:

std::map<std::string, unsigned long long> wordMap; 

std::string word; // read words into this string 
... 
wordMap[word]++; // increase counter each time a word is found. First call will insert 0.

Then you can iterate over your array of words and look up each one in the map:

for (unsigned int i = 0; i < nWords; ++i) 
{ 
    std::cout << "Word " << testWords[i] << " was found " << wordMap[testWords[i]] << " times\n"; 
}

Each time a new word is encountered, wordMap[word] inserts a key-value pair word : 0, which the ++ then increments.
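The lookup loop above also uses `operator[]`, so a target word that never occurs in the file gets a 0 entry inserted into the map. If you would rather not modify the map (or need to look words up in a `const` map, where `operator[]` is unavailable), `find()` can be used instead. A sketch, assuming the target words are in a `std::vector<std::string>`; `lookup_counts` is a hypothetical name.

```cpp
#include <map>
#include <string>
#include <vector>

// Look up each target word without inserting anything into wordMap.
// A word that never appeared in the file is reported as 0.
std::vector<unsigned long long> lookup_counts(
        const std::map<std::string, unsigned long long>& wordMap,
        const std::vector<std::string>& testWords) {
    std::vector<unsigned long long> result;
    result.reserve(testWords.size());
    for (const std::string& w : testWords) {
        auto it = wordMap.find(w);
        result.push_back(it == wordMap.end() ? 0ULL : it->second);
    }
    return result;
}
```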

If you have C++11, you can also try std::unordered_map and pick whichever performs better.
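One way to compare the two is to write the counting code once as a template and time it with each container. This is a rough sketch, not a proper benchmark: it assumes the words have already been read into a `std::vector<std::string>` (so only the counting is timed, not file I/O), and `count_into` and `time_count_us` are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Count words into any map-like container that supports operator[],
// so std::map and std::unordered_map can be swapped by changing one type.
template <typename Map>
Map count_into(const std::vector<std::string>& words) {
    Map counts;
    for (const std::string& w : words) {
        ++counts[w];
    }
    return counts;
}

// Rough wall-clock time of one counting pass, in microseconds.
template <typename Map>
long long time_count_us(const std::vector<std::string>& words) {
    auto start = std::chrono::steady_clock::now();
    Map counts = count_into<Map>(words);
    auto stop = std::chrono::steady_clock::now();
    (void)counts;  // the result is only computed to be timed
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

For example, `time_count_us<std::map<std::string, std::size_t>>(words)` and `time_count_us<std::unordered_map<std::string, std::size_t>>(words)` can be compared directly. Repeat each measurement several times on the real input, with optimizations enabled, before drawing conclusions.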

2012-11-17 12:21:04 juanchopanza

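With only 8 values to compare against, you could quite likely find a better hash function for this specific case than the standard library's. It might use only the first two characters, the last character, or the string length: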

while (std::cin >> word) { 
    int i=my_hash(word); 
    if (word==my_sparse_hash_table[i].word) my_sparse_hash_table[i].count++; 
}
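The answer leaves `my_hash` and `my_sparse_hash_table` undefined. One possible way to fill them in is sketched below, under stated assumptions: a hypothetical `cheap_hash` that looks only at the length and the first and last characters, and a hypothetical `SmallWordCounter` table of 32 slots that resolves collisions with linear probing (a technique chosen here, not one the answer specifies). Because every hit is confirmed with a full string comparison, a weak hash costs speed, not correctness. Whether this actually beats `std::unordered_map` would have to be measured.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cheap hash: uses only the length and the first and last characters.
std::size_t cheap_hash(const std::string& w) {
    if (w.empty()) return 0;
    return w.size() * 31u
         + static_cast<unsigned char>(w.front()) * 7u
         + static_cast<unsigned char>(w.back());
}

// Small open-addressing table that stores only the target words.
class SmallWordCounter {
public:
    // Assumes targets.size() < kSlots; otherwise probing could loop forever.
    explicit SmallWordCounter(const std::vector<std::string>& targets)
        : slots_(kSlots) {
        for (const std::string& t : targets) {
            std::size_t i = cheap_hash(t) % kSlots;
            while (slots_[i].used && slots_[i].word != t) {
                i = (i + 1) % kSlots;  // linear probing on collision
            }
            slots_[i].word = t;
            slots_[i].used = true;
        }
    }

    // Called once per word read from the file; non-target words are ignored.
    void add(const std::string& w) {
        if (Slot* s = find(w)) ++s->count;
    }

    std::size_t count(const std::string& w) const {
        const Slot* s = find(w);
        return s ? s->count : 0;
    }

private:
    struct Slot {
        std::string word;
        std::size_t count = 0;
        bool used = false;
    };
    static constexpr std::size_t kSlots = 32;

    // Probe from the hashed slot until the word or an empty slot is reached.
    const Slot* find(const std::string& w) const {
        std::size_t i = cheap_hash(w) % kSlots;
        while (slots_[i].used) {
            if (slots_[i].word == w) return &slots_[i];
            i = (i + 1) % kSlots;
        }
        return nullptr;
    }
    Slot* find(const std::string& w) {
        return const_cast<Slot*>(static_cast<const SmallWordCounter&>(*this).find(w));
    }

    std::vector<Slot> slots_;
};
```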

Or just keep your approach, but stop searching once a match is found:

while (std::cin >> word) {
    for (int i = 0; i < N; i++) {
        if (word == myTable[i].word) {
            myTable[i].count++;
            break; // exit the loop early once the word is found
        }
    }
}

A further micro-optimization is to move an entry toward the start of the myTable array whenever it is found.
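The answer does not say exactly how to move the entry. One common variant is the "transpose" heuristic, sketched below: on each hit, swap the found entry one position toward the front, so frequent words drift forward and later searches for them stop sooner. `Entry` and `count_word` are hypothetical names, and `myTable` is assumed to be a `std::vector<Entry>`. Moving the entry all the way to the front ("move-to-front") is another option.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Entry {
    std::string word;
    std::size_t count = 0;
};

// Linear search with the transpose heuristic: after a hit, swap the entry
// one slot toward the front of the table.
void count_word(std::vector<Entry>& myTable, const std::string& word) {
    for (std::size_t i = 0; i < myTable.size(); ++i) {
        if (myTable[i].word == word) {
            ++myTable[i].count;
            if (i > 0) {
                std::swap(myTable[i], myTable[i - 1]);
            }
            return;  // stop at the first match
        }
    }
}
```

Because entries move, read the results back by word afterwards, not by their original index.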

2012-11-17 12:34:58

All the other answers here are very good advice. One small optimization you can make to your existing code is to use else if.

if (inWord == testwords[0]) 
{ 
    count1++; 
} 
if (inWord == testwords[1]) 
{ 
    count2++; 
}

can be replaced by

if (inWord == testwords[0]) 
{ 
    count1++; 
} 
else if (inWord == testwords[1]) 
{ 
    count2++; 
}

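The idea is that if inWord does match element 0, it cannot match any other element (assuming the entries of testwords are distinct), so the remaining comparisons can be skipped.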

In any case, profilers are your friend.

2012-11-17 13:09:05 EvilTeach