Read a paragraph of text into a vector of strings

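I am trying to read a paragraph of text into a vector of strings and then build a dictionary that records how many times each word occurs. So far it only loads the first word of the text, and I don't know how to proceed. I know I'm not clear on how to use these member functions correctly.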

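#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Declared before main so the call inside main compiles.
void make_dictionary(istream& file, vector<string>& words, vector<int>& count);

int main()
{
    ifstream input1;
    input1.open("Base_text.txt");

    vector<string> base_file;
    vector<int> base_count;

    if (input1.fail())
    {
        cout << "Input file 1 opening failed." << endl;
        exit(1);
    }

    make_dictionary(input1, base_file, base_count);
}

void make_dictionary(istream& file, vector<string>& words, vector<int>& count)
{
    string line;

    while (file >> line)
    {
        words.push_back(line);
    }

    cout << words[0];
}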

Expected output:

This is some simple base text to use for comparison with other files. 
You may use your own if you so choose; your program shouldn't actually care. 
For getting interesting results, longer passages of text may be useful. 
In theory, a full novel might work, although it will likely be somewhat slow.

Actual output:

This

Source

2013-04-26 iamthewalrus
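One possible way to finish `make_dictionary` while keeping the question's parallel-vector design (`words[i]` occurred `count[i]` times) is sketched below. It assumes whitespace-separated words and leaves case and punctuation untouched. Because it searches the vector linearly with `std::find` for every word, it gets slow on long texts, which is why a `std::map` is usually the better container for this job.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream: handy for trying the function without a file
#include <string>
#include <vector>

// Sketch: fills two parallel vectors so that words[i] occurred count[i] times.
// Assumes whitespace-separated words; case and punctuation are kept as-is.
void make_dictionary(std::istream& file, std::vector<std::string>& words,
                     std::vector<int>& count)
{
    std::string word;
    while (file >> word) {
        // Linear search for a word that has already been seen.
        auto it = std::find(words.begin(), words.end(), word);
        if (it == words.end()) {
            words.push_back(word);  // first occurrence
            count.push_back(1);
        } else {
            // The matching count lives at the same index.
            ++count[static_cast<std::size_t>(it - words.begin())];
        }
    }
}
```

For example, reading "the cat saw the dog the end" from a `std::istringstream` leaves "the" at index 0 with a count of 3.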

Well, you only print the first word (the point here is to show you why you have to love the STL):

cout<<words[0];

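To print all of them, you can use

for (const string& word : words) cout << word << ' ';

or

for (size_t i = 0; i < words.size(); ++i) cout << words[i] << ' ';

A very simple solution for counting the words is to use a map in place of the vector: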

map<string,size_t> words; 
... 
string word; 
while (file>>word)   ++words[word]; 
... 
for(const auto& w : words) cout<<endl<<w.first<<":"<<w.second;
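Wrapped into a function, the same map-based counting can be tried without a file by passing a `std::istringstream`. `count_words` is a hypothetical name used only for this sketch, which assumes whitespace-separated words and does no case folding.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>  // std::istringstream: lets you test with an in-memory string
#include <string>

// Counts whitespace-separated words read from any input stream.
// operator[] inserts a missing word with a count of 0 before ++ runs.
std::map<std::string, std::size_t> count_words(std::istream& in)
{
    std::map<std::string, std::size_t> counts;
    std::string word;
    while (in >> word)
        ++counts[word];
    return counts;
}
```

Since `std::map` keeps its keys sorted, iterating over the result prints the words in alphabetical order.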

WhozCraig raised a challenge: ordering the words by frequency:

multimap<size_t,string,greater<size_t>> byFreq;
for(const auto& w : words) byFreq.insert(make_pair(w.second, w.first)); 
for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first;

Putting it all together (ideone):

#include <cctype>
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>
using namespace std;

int main()
{
    map<string,size_t> words;
    string word;

    while (cin>>word)
    {
        // lower-case each character; the unsigned char cast keeps tolower well-defined
        for(char&c:word) c=static_cast<char>(tolower(static_cast<unsigned char>(c)));
        ++words[word];
    }
    cout<<" ----- By word: ------" ;
    for(const auto& w : words) cout<<endl<<w.first<<":"<<w.second;
    cout<<endl<<endl<<" ----- By frequency: ------";
    // greater<size_t> orders the keys (the counts) from highest to lowest
    multimap<size_t,string,greater<size_t>> byFreq;
    for(const auto& w : words) byFreq.insert(make_pair(w.second, w.first));
    for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first;
    return 0;
}

Source

2013-04-26 19:42:36 qPCR4vir

Any ideas on how I would go about tracking the number of occurrences of each word? – iamthewalrus 2013-04-26 19:46:14

@AndyMiller, a map, maybe? – chris 2013-04-26 19:46:51

@WhozCraig raised a challenge: sorting by frequency. – qPCR4vir 2013-04-27 21:01:22

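I think you have to move cout << words[0] inside the loop; otherwise it is called only once, after the loop has finished. But then every iteration would still print only the first word. So print the last word each time instead: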

while (file>>line) 
{ 
    words.push_back(line); 
    cout<<words.back(); // or cout << line, same thing really 
}

One last thing: while (file >> line) reads word by word, not line by line as the variable name suggests. If line-by-line reading is what you want, use while (getline(file, line)).
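If you read line by line but still need the individual words, each line can be split again with a `std::istringstream`. A minimal sketch, assuming whitespace-separated words; the name `read_lines_as_words` is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads the input line by line, then splits each line into words.
// Returns one vector of words per line, so line boundaries are preserved.
std::vector<std::vector<std::string>> read_lines_as_words(std::istream& in)
{
    std::vector<std::vector<std::string>> lines;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words_in(line);  // treat the single line as a stream
        std::vector<std::string> words;
        std::string word;
        while (words_in >> word)
            words.push_back(word);
        lines.push_back(words);
    }
    return lines;
}
```

Note that `>>` keeps punctuation attached, so "text." stays a single word including the period.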

Source

2013-04-26 19:43:41 jrok

Any ideas on how to keep track of the number of times each word occurs? – iamthewalrus 2013-04-26 19:50:36

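Reading the words of a text file into a vector of strings is fairly straightforward. The code below assumes that the name of the file to parse is given as the first command-line argument.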

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#include <string>
#include <map>
using namespace std;

int main(int argc, char *argv[]) 
{ 
    if (argc < 2) 
     return EXIT_FAILURE; 

    // open file and read all words into the vector. 
    ifstream inf(argv[1]); 
    istream_iterator<string> inf_it(inf), inf_eof; 
    vector<string> words(inf_it, inf_eof); 

    // for populating a word-count dictionary: 
    map<string, unsigned int> dict; 
    for (auto &it : words) 
     ++dict[it]; 

    // print the dictionary 
    for (auto &it : dict) 
     cout << it.first << ':' << it.second << endl; 

    return EXIT_SUCCESS; 
}

However, you should (probably) combine the two operations into a single loop and avoid the intermediate vector entirely:

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <map>
using namespace std;

int main(int argc, char *argv[]) 
{ 
    if (argc < 2) 
     return EXIT_FAILURE; 

    // open the file and count each word as it is read
    ifstream inf(argv[1]); 
    map<string, unsigned int> dict; 
    string str; 
    while (inf >> str) 
     ++dict[str]; 

    // print the dictionary 
    for (auto &it : dict) 
     cout << it.first << ':' << it.second << endl; 

    return EXIT_SUCCESS; 
}

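Sorting from the highest to the lowest number of occurrences is not quite as trivial, but it is doable with a vector and std::sort(). Stripping leading and trailing non-alphabetic characters (punctuation) would also be an improvement, as would converting the words to all lowercase before inserting them into the map. That lets "Ball" and "ball" share a single dictionary slot with a count of 2.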
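The three enhancements above can be sketched as two small helpers; `normalize` and `by_frequency` are hypothetical names. `normalize` strips only leading and trailing non-letters, so an inner apostrophe survives, and it assumes the default "C" locale, where only ASCII letters count. `by_frequency` copies the map into a vector of pairs and sorts it with `std::sort`; breaking ties alphabetically is an assumption made here so the order is deterministic.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Lower-cases a word and strips leading/trailing non-letters,
// so "Ball," and "ball" map to the same key. Returns "" if no letters remain.
std::string normalize(const std::string& raw)
{
    auto is_letter = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    };
    auto first = std::find_if(raw.begin(), raw.end(), is_letter);
    auto last = std::find_if(raw.rbegin(), raw.rend(), is_letter).base();
    if (first >= last)
        return "";  // the token contained no letters at all
    std::string word(first, last);
    for (char& c : word)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return word;
}

using Entry = std::pair<std::string, std::size_t>;

// Copies the counts into a vector and sorts it by descending count,
// breaking ties alphabetically.
std::vector<Entry> by_frequency(const std::map<std::string, std::size_t>& counts)
{
    std::vector<Entry> v(counts.begin(), counts.end());
    std::sort(v.begin(), v.end(), [](const Entry& a, const Entry& b) {
        if (a.second != b.second)
            return a.second > b.second;  // higher count first
        return a.first < b.first;        // then alphabetical
    });
    return v;
}
```

In the counting loop you would write `++dict[normalize(str)]`, ideally skipping empty results, and then print the vector returned by `by_frequency(dict)`.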

Source

2013-04-26 21:03:37 WhozCraig

I have the following implementation, which converts the words to lowercase and removes punctuation.

#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<std::string> words;
    {
        // read every whitespace-separated word of file.txt into the vector
        std::ifstream fp("file.txt", std::ios::in);
        std::copy(std::istream_iterator<std::string>(fp),
                  std::istream_iterator<std::string>(),
                  std::back_insert_iterator<std::vector<std::string>>(words));
    }

    std::unordered_map<std::string, int> frequency;
    for (auto it = words.begin(); it != words.end(); ++it) {
        // keep only the letters, then lower-case them
        std::string word;
        std::copy_if(it->begin(), it->end(),
                     std::back_insert_iterator<std::string>(word),
                     [](unsigned char c) { return std::isalpha(c) != 0; });
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (!word.empty())  // skip tokens that were pure punctuation
            frequency[word]++;
    }

    for (const auto& p : frequency) {
        std::cout << p.first << " => " << p.second << std::endl;
    }
    return 0;
}

If file.txt has the following contents:

hello hello hello bye BYE dog DOG' dog. 

word Word worD w'ord

the program produces:

word => 4 
dog => 3 
bye => 2 
hello => 3
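`std::unordered_map` iterates in an unspecified order, so these lines may come out in a different order with another compiler or standard library. If repeatable output matters, one option is to copy the counts into a `std::map`, which keeps its keys sorted. `sorted_by_word` is a hypothetical helper name used only for this sketch.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Copies the unordered counts into a std::map, whose iteration order is
// sorted by key, so printing the result is repeatable.
std::map<std::string, int> sorted_by_word(const std::unordered_map<std::string, int>& freq)
{
    return std::map<std::string, int>(freq.begin(), freq.end());
}
```

With the counts above, iterating over the result would visit bye, dog, hello and word in that order.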

Source

2013-04-26 22:16:23 Escualo
