Counting the occurrences of each word in a text file

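Given a large text file containing many strings, what is the most efficient way in C++ to read the file and count the occurrences of each word? The size of the text file is unknown, so I can't just use a simple array. There is also a catch: each line of the text file starts with a category keyword, and the words that follow are features of that category. I need to be able to count the occurrences of each word within its category.

For example: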

colors red blue green yellow orange purple 
sky blue high clouds air empty vast big 
ocean wet water aquatic blue 
colors brown black blue white blue blue

With this example, I need to count 4 occurrences of "blue" in the "colors" category, even though "blue" occurs 6 times in total.

Source

2013-06-01 user2374842

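Please show an example of what you have done so far. – chux

As @chux said, give us some code to work with. –

I disagree with the decision to close this question. Its intent seems clear, starting with the last two sentences of the first paragraph. The OP's example that follows, along with the final sentence, is entirely sufficient. Finally, I believe that the sample output I provide in [my answer](http://stackoverflow.com/a/16868853/1497596) matches what the OP expects. The answers provided by others also indicate to me an understanding of the core of the question. – DavidRR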

Tokenize the words and store them as key-value pairs.
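As a rough illustration of that idea, here is a minimal sketch that ignores categories and only counts tokens. The helper name count_words is hypothetical, and it assumes that words are separated by whitespace:

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): splits `text` on
// whitespace and stores each token as a key with its count as the value.
std::map<std::string, int> count_words(const std::string& text)
{
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    // operator>> skips whitespace and fails at end of input, ending the loop.
    while (in >> word)
        ++counts[word]; // operator[] starts a missing key at 0
    return counts;
}
```

Because std::map grows as new keys are inserted, nothing about the input size has to be known in advance.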

Update: I realized that I misunderstood the question. The following code should separate the words by category and count them:

#include <iostream> 
#include <string> 
#include <map> 
#include <fstream> 
using namespace std; 
int main() 
{ 
    ifstream file("path\\to\\text\\file"); 
    if(!file.is_open()) return 1; 
    // Maps category -> (word -> count). 
    map<string, map<string, int> > categories; 
    string s; 
    // Loop on getline itself so that a failed read is never processed. 
    while(getline(file, s)) 
    { 
     // find() returns string::npos (an unsigned value) when nothing is found. 
     string::size_type pos = s.find(' '); 
     if(pos == string::npos) continue; // No words follow the category. 
     string category = s.substr(0, pos); 
     s.erase(0, pos+1); 
     while(!s.empty()) 
     { 
      pos = s.find(' '); 
      if(pos == string::npos) 
       pos = s.size(); 
      string word = s.substr(0, pos); 
      if(!word.empty()) // Skip empty tokens produced by repeated spaces. 
       categories[category][word]++; 
      s.erase(0, pos+1); 
     } 
    } 
    for(map<string, map<string, int> >::iterator cit = categories.begin(); cit != categories.end(); ++cit) 
    { 
     cout << "Category - " << cit->first << endl; 
     for(map<string, int>::iterator wit = cit->second.begin(); wit != cit->second.end(); ++wit) 
      cout << "\tword: " << wit->first << ",\t" << wit->second << endl; 
    } 
    return 0; 
}

Source

2013-06-01 01:30:04 Filip

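I would use a stream to read and separate the words (it separates them by looking for whitespace) and store them in a dictionary (the standard C++ way is to use std::map).

Here is the commented C++ code: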

#include <cstdlib> // EXIT_SUCCESS and EXIT_FAILURE. 
#include <iostream> 
#include <map> // A map will be used to count the words. 
#include <fstream> // Will be used to read from a file. 
#include <string> // The map's key value. 
using namespace std; 


//Will be used to print the map later. 
template <class KTy, class Ty> 
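void PrintMap(const map<KTy, Ty>& words) // Pass by const reference to avoid copying the map. 
{ 
    // 'typename' is required because the iterator type depends on the template parameters. 
    typedef typename std::map<KTy, Ty>::const_iterator iterator; 
    for (iterator p = words.begin(); p != words.end(); ++p) 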
     cout << p->first << ": " << p->second << endl; 
} 

int main(void) 
{ 
    static const char* fileName = "C:\\MyFile.txt"; 

    // Will store the word and count. 
    map<string, unsigned int> wordsCount; 


    { 
     // Begin reading from file: 
     ifstream fileStream(fileName); 

     // Check if we've opened the file (as we should have). 
     if (fileStream.is_open()) 
      { 
       // Extract inside the loop condition so that the failed read at 
       // the end of the file is not counted as an empty word. 
       string word; 
       while (fileStream >> word) 
        ++wordsCount[word]; // operator[] starts a new word at 0, so one increment covers both cases. 
      } 
     else // We couldn't open the file. Report the error in the error stream. 
     { 
      cerr << "Couldn't open the file." << endl; 
      return EXIT_FAILURE; 
     } 

     // Print the words map. 
     PrintMap(wordsCount); 
    } 

    return EXIT_SUCCESS; 
}

Output:

air: 1
aquatic: 1
big: 1
black: 1
blue: 6
brown: 1
clouds: 1
colors: 2
empty: 1
green: 1
high: 1
ocean: 1
orange: 1
purple: 1
red: 1
sky: 1
vast: 1
water: 1
wet: 1
white: 1
yellow: 1

Source

2013-06-01 02:58:45 MasterMastic

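Here is a solution that accomplishes your stated goal.

It uses std::map to keep a count of the number of times each (category, word) pair occurs.

std::istringstream is used to break the data first into rows and then into words.

OUTPUT: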

(colors, black) => 1 
(colors, blue) => 4 
(colors, brown) => 1 
(colors, green) => 1 
(colors, orange) => 1 
(colors, purple) => 1 
(colors, red) => 1 
(colors, white) => 1 
(colors, yellow) => 1 
(ocean, aquatic) => 1 
(ocean, blue) => 1 
(ocean, water) => 1 
(ocean, wet) => 1 
(sky, air) => 1 
(sky, big) => 1 
(sky, blue) => 1 
(sky, clouds) => 1 
(sky, empty) => 1 
(sky, high) => 1 
(sky, vast) => 1

SOLUTION:

#include <iostream> // std::cout, std::endl 
#include <map>  // std::map 
#include <string> // std::string, std::getline 
#include <sstream> // std::istringstream 
#include <utility> // std::pair 

int main() 
{ 
    // The data. 
    std::string content = 
     "colors red blue green yellow orange purple\n" 
     "sky blue high clouds air empty vast big\n" 
     "ocean wet water aquatic blue\n" 
     "colors brown black blue white blue blue\n"; 

    // Load the data into an in-memory table. 
    std::istringstream table(content); 

    std::string row; 
    std::string category; 
    std::string word; 
    const char delim = ' '; 
    std::map<std::pair<std::string, std::string>, long> category_map; 
    std::pair<std::string, std::string> cw_pair; 
    long count; 

    // Read each row from the in-memory table. 
    // Read each row from the in-memory table; the loop ends when getline fails. 
    while (std::getline(table, row)) 
    { 

     // Allow the row to be read word-by-word. 
     std::istringstream words(row); 

     // Get the first word in the row; it is the category. 
     getline(words, category, delim); 

     // Get the remaining words in the row. 
     while (std::getline(words, word, delim)) { 
      cw_pair = std::make_pair(category, word); 

      // Maintain a count of each time a (category, word) pair occurs. 
      if (category_map.count(cw_pair) > 0) { 
       category_map[cw_pair] += 1; 
      } else { 
       category_map[cw_pair] = 1; 
      } 
     } 
    } 

    // Print out each unique (category, word) pair and 
    // the number of times that it occurs. 
    std::map<std::pair<std::string, std::string>, long>::iterator it; 

    for (it = category_map.begin(); it != category_map.end(); ++it) { 
     cw_pair = it->first; 
     category = cw_pair.first; 
     word = cw_pair.second; 
     count = it->second; 

     std::cout << "(" << category << ", " << word << ") => " 
      << count << std::endl; 
    } 
}
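
To answer the question's specific example directly, a pair-keyed map like the one above can be queried for a single (category, word) pair. The following is only a sketch: the names CategoryMap, count_pairs, and lookup_count are hypothetical and not part of the program above, and it assumes that words are separated by whitespace.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>

typedef std::map<std::pair<std::string, std::string>, long> CategoryMap;

// Hypothetical helper: counts every (category, word) pair in `content`,
// where the first word of each row is taken to be the category.
CategoryMap count_pairs(const std::string& content)
{
    CategoryMap counts;
    std::istringstream table(content);
    std::string row;
    while (std::getline(table, row)) {
        std::istringstream words(row);
        std::string category, word;
        if (!(words >> category)) continue; // skip blank rows
        while (words >> word)
            ++counts[std::make_pair(category, word)];
    }
    return counts;
}

// Hypothetical helper: returns 0 for a pair that was never seen.
// find() is used instead of operator[] so the lookup inserts nothing.
long lookup_count(const CategoryMap& counts,
                  const std::string& category, const std::string& word)
{
    CategoryMap::const_iterator it = counts.find(std::make_pair(category, word));
    return it == counts.end() ? 0 : it->second;
}
```

With the four sample rows from the question, lookup_count(counts, "colors", "blue") returns 4, while a pair that never occurs, such as ("sky", "red"), returns 0.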

Source

2013-06-01 03:48:13 DavidRR
