2013-06-01 116 views
1

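Counting the occurrences of each word in a text file

Given a large text file containing many strings, what is the most efficient way to read the file and count the occurrences of each word in C++? The size of the text file is unknown, so I can't just use a simple array. There is also a catch: each line of the text file begins with a category keyword, and the words that follow are features of that category. I need to be able to count the occurrences of each word within its category.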

For example:

colors red blue green yellow orange purple 
sky blue high clouds air empty vast big 
ocean wet water aquatic blue 
colors brown black blue white blue blue 

With this example, I need the count for the "colors" category to show 4 occurrences of "blue", even though "blue" occurs 6 times in total.

+3

Please show an example of what you have done so far. – chux

+0

Like @chux said, give us some code to work with. –

+0

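I disagree with the decision to close this question. The intent of the question seems quite clear, starting with the last two sentences of the first paragraph. The OP's example that follows, together with the final sentence, is entirely sufficient. Finally, I believe the sample output provided in [my answer](http://stackoverflow.com/a/16868853/1497596) matches what the OP expects. The answers provided by others also indicate to me an understanding of the core of the question. – DavidRR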

Answers

1

Tokenize the words and store them as key-value pairs.
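A minimal sketch of that idea, assuming words are separated by whitespace. The helper name `count_words` is hypothetical and is not part of this answer's code. It tokenizes a string with `std::istringstream` and stores each word as a key whose value is its count.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated words in `text`.
std::map<std::string, int> count_words(const std::string& text)
{
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    // operator>> skips any run of whitespace, so empty tokens never appear.
    while (in >> word)
        ++counts[word]; // operator[] creates a missing entry with value 0
    return counts;
}
```

For instance, `count_words("blue red blue")` gives a map in which `blue` maps to 2 and `red` maps to 1.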

Update: I realized that I misunderstood the question. The following code should separate the words by category and count them:

#include <iostream> 
#include <string> 
#include <map> 
#include <fstream> 
using namespace std; 
int main() 
{ 
    ifstream file; 
    file.open("path\\to\\text\\file"); 
    if(!file.is_open()) return 1; 
    map<string, map<string, int> > categories; 
    string s; 
    while(getline(file, s)) // stops cleanly at end of file or on a read error 
    { 
     string::size_type pos = s.find_first_of(' '); 
     if(pos == string::npos) continue; // no words follow the category 
     string category = s.substr(0, pos); 
     s.erase(0, pos+1); 
     while(s.size() > 0) 
     { 
      pos = s.find_first_of(' '); 
      if(pos == string::npos) 
       pos = s.size(); // last word on the line 
      string word = s.substr(0, pos); 
      if(word != "") 
       categories[category][word]++; 
      s.erase(0, pos+1); // erase clamps the count to the remaining length 
     } 
    } 
    for(map<string, map<string, int> >::iterator cit = categories.begin(); cit != categories.end(); ++cit) 
    { 
     cout << "Category - " << cit->first << endl; 
     for(map<string, int>::iterator wit = cit->second.begin(); wit != cit->second.end(); ++wit) 
      cout << "\tword: " << wit->first << ",\t" << wit->second << endl; 
    } 
    return 0; 
} 
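
To read a single count back out of the nested map, such as the number of times `blue` appears under `colors`, a lookup helper along these lines could be used. This is only a sketch: `count_in_category` and the `CategoryMap` typedef are hypothetical names. It uses `find` rather than `operator[]` so that querying a missing category or word does not insert an empty entry.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::map<std::string, int> > CategoryMap;

// Hypothetical helper: returns how often `word` was counted under `category`,
// or 0 if either the category or the word was never seen.
int count_in_category(const CategoryMap& categories,
                      const std::string& category,
                      const std::string& word)
{
    CategoryMap::const_iterator cit = categories.find(category);
    if (cit == categories.end())
        return 0;
    std::map<std::string, int>::const_iterator wit = cit->second.find(word);
    return wit == cit->second.end() ? 0 : wit->second;
}
```

With the sample data from the question, `count_in_category(categories, "colors", "blue")` should yield 4.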
3

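I would use a stream to read and separate the words (it separates words by looking for whitespace) and save them to a dictionary (the standard C++ way is to use std::map).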

Here is commented C++ code:

#include <cstdlib> // EXIT_SUCCESS and EXIT_FAILURE. 
#include <iostream> 
#include <map> // A map will be used to count the words. 
#include <fstream> // Will be used to read from a file. 
#include <string> // The map's key value. 
using namespace std; 


// Will be used to print the map later. 
template <class KTy, class Ty> 
void PrintMap(const map<KTy, Ty>& m) 
{ 
    // "typename" is required because the iterator is a dependent type. 
    typedef typename map<KTy, Ty>::const_iterator iterator; 
    for (iterator p = m.begin(); p != m.end(); ++p) 
     cout << p->first << ": " << p->second << endl; 
} 

int main(void) 
{ 
    static const char* fileName = "C:\\MyFile.txt"; 

    // Will store the word and count. 
    map<string, unsigned int> wordsCount; 


    { 
     // Begin reading from file: 
     ifstream fileStream(fileName); 

     // Check if we've opened the file (as we should have). 
     if (fileStream.is_open()) 
      { 
       // Extract word by word. The loop ends when an extraction fails, 
       // so no empty word is counted at end of file. 
       string word; 
       while (fileStream >> word) 
        ++wordsCount[word]; // operator[] starts a new entry at 0. 
      } 
     else // We couldn't open the file. Report the error in the error stream. 
     { 
      cerr << "Couldn't open the file." << endl; 
      return EXIT_FAILURE; 
     } 

     // Print the words map. 
     PrintMap(wordsCount); 
    } 

    return EXIT_SUCCESS; 
} 

Output:

air: 1
aquatic: 1
big: 1
black: 1
blue: 6
brown: 1
clouds: 1
colors: 2
empty: 1
green: 1
high: 1
ocean: 1
orange: 1
purple: 1
red: 1
sky: 1
vast: 1
water: 1
wet: 1
white: 1
yellow: 1

1

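Here is a solution that achieves your stated goal. See it live here

It uses std::map to maintain a count of the number of times each (category, word) pair occurs.

std::istringstream is used to break the data first into rows and then into words.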


OUTPUT:

(colors, black) => 1 
(colors, blue) => 4 
(colors, brown) => 1 
(colors, green) => 1 
(colors, orange) => 1 
(colors, purple) => 1 
(colors, red) => 1 
(colors, white) => 1 
(colors, yellow) => 1 
(ocean, aquatic) => 1 
(ocean, blue) => 1 
(ocean, water) => 1 
(ocean, wet) => 1 
(sky, air) => 1 
(sky, big) => 1 
(sky, blue) => 1 
(sky, clouds) => 1 
(sky, empty) => 1 
(sky, high) => 1 
(sky, vast) => 1 

SOLUTION:

#include <iostream> // std::cout, std::endl 
#include <map>  // std::map 
#include <sstream> // std::istringstream 
#include <string> // std::string, std::getline 
#include <utility> // std::pair 

int main() 
{ 
    // The data. 
    std::string content = 
     "colors red blue green yellow orange purple\n" 
     "sky blue high clouds air empty vast big\n" 
     "ocean wet water aquatic blue\n" 
     "colors brown black blue white blue blue\n"; 

    // Load the data into an in-memory table. 
    std::istringstream table(content); 

    std::string row; 
    std::string category; 
    std::string word; 
    const char delim = ' '; 
    std::map<std::pair<std::string, std::string>, long> category_map; 
    std::pair<std::string, std::string> cw_pair; 
    long count; 

    // Read each row from the in-memory table; the loop ends when no row is left. 
    while (std::getline(table, row)) 
    { 

     // Allow the row to be read word-by-word. 
     std::istringstream words(row); 

     // Get the first word in the row; it is the category. 
     getline(words, category, delim); 

     // Get the remaining words in the row. 
     while (std::getline(words, word, delim)) { 
      cw_pair = std::make_pair(category, word); 

      // Maintain a count of each time a (category, word) pair occurs. 
      if (category_map.count(cw_pair) > 0) { 
       category_map[cw_pair] += 1; 
      } else { 
       category_map[cw_pair] = 1; 
      } 
     } 
    } 

    // Print out each unique (category, word) pair and 
    // the number of times that it occurs. 
    std::map<std::pair<std::string, std::string>, long>::iterator it; 

    for (it = category_map.begin(); it != category_map.end(); ++it) { 
     cw_pair = it->first; 
     category = cw_pair.first; 
     word = cw_pair.second; 
     count = it->second; 

     std::cout << "(" << category << ", " << word << ") => " 
      << count << std::endl; 
    } 
} 
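
The program above reads from an in-memory string, while the question asks about a file. As a sketch under that assumption, the same row-then-word splitting can be packaged as a function that accepts any `std::istream`. The names `count_by_category` and `PairCounts` are hypothetical. Unlike the program above, this version splits words with `operator>>` instead of `std::getline` with a space delimiter, so repeated or trailing spaces do not produce empty words.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

typedef std::map<std::pair<std::string, std::string>, long> PairCounts;

// Hypothetical helper: counts (category, word) pairs from any input stream.
PairCounts count_by_category(std::istream& in)
{
    PairCounts counts;
    std::string row;
    while (std::getline(in, row)) {          // one row per line
        std::istringstream words(row);
        std::string category;
        if (!(words >> category))
            continue;                        // skip blank lines
        std::string word;
        while (words >> word)                // remaining words in the row
            ++counts[std::make_pair(category, word)];
    }
    return counts;
}
```

Either an `std::ifstream` opened on the text file or an `std::istringstream` holding test data can be passed in, since both derive from `std::istream`.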