Counting the occurrences of letters in a file

I am counting the number of times each letter appears in a file. When I run the code below, it counts "Z" twice. Can anyone explain why?

The test data is:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

#include <iostream>     //Required if your program does any I/O 
#include <iomanip>     //Required for output formatting 
#include <fstream>     //Required for file I/O 
#include <string>     //Required if your program uses C++ strings 
#include <cmath>     //Required for complex math functions 
#include <cctype>     //Required for letter case conversion 

using namespace std;    //Required for ANSI C++ 1998 standard. 

int main() 
{ 
string reply; 
string inputFileName; 
ifstream inputFile; 
char character; 
int letterCount[127] = {}; 

cout << "Input file name: "; 
getline(cin, inputFileName); 

// Open the input file. 
inputFile.open(inputFileName.c_str());  // Need .c_str() to convert a C++ string to a C-style string 
// Check the file opened successfully. 
if (! inputFile.is_open()) 
{ 
    cout << "Unable to open input file." << endl; 
    cout << "Press enter to continue..."; 
    getline(cin, reply); 
    exit(1); 
} 

while (inputFile.peek() != EOF) 
{ 
     inputFile >> character; 
     //toupper(character); 

     letterCount[static_cast<int>(character)]++; 
} 

for (int iteration = 0; iteration <= 127; iteration++) 
{ 
    if (letterCount[iteration] > 0) 
    { 
     cout << static_cast<char>(iteration) << " " << letterCount[iteration] << endl; 
    } 
} 

system("pause"); 
exit(0); 
}

Source

2011-03-05 cyotee

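Smells like homework... I don't know C, but in Java I use the ASCII equivalents to count the occurrences of a letter... – 2011-03-05 20:08:47

ABCDEFGHIJKLMNOPQRSTUVQXYZ contains two "Q"s, which explains why "Q" is counted twice. – 2011-03-05 20:09:16

Did you print each character as you read it (inside your 'while (inputFile.peek() != EOF)' loop)? What debugging have you tried? – Crisfole 2011-03-05 20:09:53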

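As others have pointed out, you have two Qs in your input. The reason you get two Zs is that the last

inputFile >> character;

(probably when only a newline is left in the stream, so peek() does not yet return EOF) fails to extract anything, which leaves the "Z" from the previous iteration in the variable 'character'. Try checking inputFile.fail() afterwards to see this: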

while (inputFile.peek() != EOF) 
{ 
    inputFile >> character; 

    if (!inputFile.fail()) 
    { 
     letterCount[static_cast<int>(character)]++; 
    } 
}

The idiomatic way to write the loop, which also fixes your "Z" problem, is:

while (inputFile >> character) 
{ 
     letterCount[static_cast<int>(character)]++; 
}

Source

2011-03-05 20:16:01 fizzer

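Thank you very much, that solved my problem and helped me understand what was going on. – cyotee 2011-03-06 01:24:04

There are two "Q"s in your uppercase string. I believe the reason you get a count of two for Z is that you should check for EOF after reading a character, not before, but I'm not sure about that.

Source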

2011-03-05 20:09:52

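Thanks for catching my oversight. I have corrected my test data, and now only "Z" is counted twice. I tested your EOF recommendation by changing to a do loop, and I get the same error. – cyotee 2011-03-05 20:12:24

First, there really are two Qs in the input.

As for Z, @Jeremiah is probably right: because it is the last character and your code does not detect EOF correctly, it gets counted twice. You can verify this by, for example, changing the order of the input characters.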

As a side note, here

for (int iteration = 0; iteration <= 127; iteration++)

the index goes out of bounds; the loop condition should be iteration < 127, or the array should be declared as int letterCount[128].

Source

2011-03-05 20:10:05

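After the correction, only "Z" is now counted twice. – cyotee 2011-03-05 20:13:51

Thanks for pointing out my array error; that explains why I was also getting a count for EOF. – cyotee 2011-03-06 01:26:43

Well, others have already pointed out the errors in your code.

But here is an elegant way to read the file and count its letters: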

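#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <map>
#include <vector>

// ctype facet that classifies every non-letter as whitespace,
// so formatted extraction (operator>>) skips it.
struct letter_only: std::ctype<char> 
{ 
    letter_only(): std::ctype<char>(get_table()) {} 

    static std::ctype_base::mask const* get_table() 
    { 
     static std::vector<std::ctype_base::mask> 
      rc(std::ctype<char>::table_size,std::ctype_base::space); 

     // Mark only A-Z and a-z as letters; the punctuation that sits
     // between 'Z' and 'a' in ASCII stays classified as whitespace.
     std::fill(&rc['A'], &rc['Z'+1], std::ctype_base::alpha); 
     std::fill(&rc['a'], &rc['z'+1], std::ctype_base::alpha); 
     return &rc[0]; 
    } 
}; 

struct Counter 
{ 
    std::map<char, int> letterCount; 
    void operator()(char item) 
    { 
     // The facet has already filtered out non-letters, so every item counts.
     // Remove tolower if you want a case-sensitive solution!
     ++letterCount[std::tolower(static_cast<unsigned char>(item))]; 
    } 
    operator std::map<char, int>() { return letterCount ; } 
}; 

int main() 
{ 
    std::ifstream input; 
    input.imbue(std::locale(std::locale(), new letter_only())); // enable reading letters only! 
    input.open("filename.txt"); 
    std::istream_iterator<char> start(input); 
    std::istream_iterator<char> end; 
    std::map<char, int> letterCount = std::for_each(start, end, Counter()); 
    for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it) 
    { 
      std::cout << it->first <<" : "<< it->second << std::endl; 
    } 
}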

This is a modified (untested) version of the solution to:

Elegant ways to count the frequency of words in a file

Source

2011-03-05 20:20:36 Nawaz

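Wow, a very educational result. It's a bit beyond me right now, but once I finish the assignment I will work on understanding it. – cyotee 2011-03-06 01:24:52

Given that you apparently only want to count English letters, it seems you should be able to simplify your code considerably: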

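#include <cctype>
#include <fstream>
#include <iostream>

int main(int argc, char **argv) { 
    if (argc < 2) {
     std::cerr << "Usage: letter_count <file>\n";
     return 1;
    }
    std::ifstream infile(argv[1]); 

    char ch; 
    static int counts[26];   // static storage, so zero-initialized

    while (infile >> ch) {
     // <cctype> functions need a value representable as unsigned char
     unsigned char uch = static_cast<unsigned char>(ch);
     if (std::isalpha(uch)) 
      ++counts[std::tolower(uch)-'a']; 
    }

    for (int i=0; i<26; i++) 
     // cast back to char; 'A' + i alone is an int and would print a number
     std::cout << static_cast<char>('A' + i) << ": " << counts[i] <<"\n"; 
    return 0; 
}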

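Of course, there are more possibilities. Compared to @Nawaz's code (for example), this is obviously shorter and simpler, but it is also more limited (as it stands, it only works with unaccented English characters). It is pretty much restricted to the basic ASCII letters; EBCDIC encoding, ISO 8859-x, or Unicode will break it completely.

His also makes it easy to apply the "letters only" filtering to any file. Choosing between the two depends on whether you want, need, or can use that flexibility. If you only care about the letters mentioned in the question, and only about typical machines that use some superset of ASCII, this code will handle the job more easily; if you need more than that, it is not suitable at all.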

Source

2011-03-05 20:48:00
