2015-12-21 158 views
0

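C++: reading words from a text file with specified delimiters

I have the following code, which prints the count of each unique word in a text file (containing >= 30K words). It splits words on whitespace only, and I would like the result to look like this:

[image: expected output, not preserved]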

How can I modify the code so that it splits on a specified set of delimiters?

#include <fstream>
#include <iostream>
#include <map>
#include <string>
using namespace std;

template <class KTy, class Ty>
void PrintMap(const map<KTy, Ty>& m)
{
    // typename is required because const_iterator is a dependent type.
    typedef typename map<KTy, Ty>::const_iterator iterator;
    for (iterator p = m.begin(); p != m.end(); ++p)
        cout << p->first << ": " << p->second << endl;
}

void UniqueWords(string fileName) {
    // Will store the word and count.
    map<string, unsigned int> wordsCount;

    // Begin reading from file:
    ifstream fileStream(fileName);

    // Check if we've opened the file (as we should have).
    if (fileStream.is_open())
    {
        // Read one whitespace-separated word at a time; the loop ends at the first failed read.
        string word;
        while (fileStream >> word)
            wordsCount[word]++; // operator[] inserts a zero count the first time a word is seen.
    }
    else // We couldn't open the file. Report the error in the error stream.
    {
        cerr << "Couldn't open the file." << endl;
    }

    // Print the words map.
    PrintMap(wordsCount);
}
+2

It's not clear what you are asking. – OldProgrammer

+0

Just to be clear: you want "you" counted as "you", "you!" counted as "you", and so on? Is that right? – Paulo

+0

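@OldProgrammer This code prints the frequency of each unique word, but it splits on whitespace only, so it treats 'you' and 'you' with punctuation attached as different words. –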

Answers

0

After fileStream >> word; you can call this function. Take a look and see whether it is clear:

// Returns a copy of word with every forbidden character removed.
string adapt(const string& word) {
    const string forbidden = "!?,.[];";
    string ret;
    for (string::size_type i = 0; i < word.size(); i++) {
        // Keep the character only if it is not in the forbidden set.
        if (forbidden.find(word[i]) == string::npos)
            ret.push_back(word[i]);
    }
    return ret;
}

Something like this:

fileStream >> word; 
word = adapt(word); 
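
Deleting a forbidden character joins whatever stood on either side of it, so a token such as 'abc!def' comes out as 'abcdef'. A minimal sketch of an alternative, assuming such tokens should count as two words, is to turn each delimiter into a space and split again. The helper name split_words and its default delimiter set are hypothetical and are not part of the answer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): splits `line` into words,
// treating whitespace and every character in `delims` as a separator.
std::vector<std::string> split_words(std::string line,
                                     const std::string& delims = "!?,.[];") {
    // Turn each delimiter into a space so that "abc!def" becomes "abc def".
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        if (delims.find(line[i]) != std::string::npos)
            line[i] = ' ';
    }

    // Let operator>> do the whitespace splitting.
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string word; in >> word;)
        words.push_back(word);
    return words;
}
```

In UniqueWords, each token read with fileStream >> word could be passed through split_words, incrementing wordsCount once for every piece returned.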
+1

Works like a charm. –

+0

Wouldn't this produce the wrong result if there is a word like 'abc!def'? It looks like it would create 'abcdef' instead of the two words 'abc' and 'def'. –

+0

Is it necessary to close the stream after using it in the 'UniqueWords' function? –

2

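You can use a stream that has been imbue()d with a std::ctype<char> facet which treats whatever characters you fancy as whitespace. Doing so looks like this: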

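#include <iostream>
#include <locale>
#include <string>

// Classification table that marks only the given characters as whitespace.
struct myctype_table {
    std::ctype_base::mask table[std::ctype<char>::table_size];
    myctype_table(char const* spaces) : table() { // zero every entry first
        for (; *spaces; ++spaces) {
            table[static_cast<unsigned char>(*spaces)] = std::ctype_base::space;
        }
    }
};
// The table is a base class so that it is built before std::ctype<char> uses it.
class myctype
    : private myctype_table
    , public std::ctype<char> {
public:
    myctype(char const* spaces)
        : myctype_table(spaces)
        // Qualified because std::ctype<char> also has a member named table().
        , std::ctype<char>(myctype_table::table) {
    }
};

int main() {
    std::locale myloc(std::locale(), new myctype(" \t\n\r?:.,!"));
    std::cin.imbue(myloc);
    for (std::string word; std::cin >> word;) {
        // words are separated by the extended list of spaces
    }
}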

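This code is untested for now, as I am typing on a mobile device. I may have misused some of the std::ctype<char> interface, but after fixing names and so on it should work along these lines.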
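
As a rough illustration of how the facet idea could feed the word count from the question, here is a minimal sketch that works on any std::istream. The names delimiter_table, delimiter_ctype and count_words are hypothetical, and the delimiter list passed in is up to the caller.

```cpp
#include <istream>
#include <locale>
#include <map>
#include <sstream>
#include <string>

// Classification table in which only the chosen characters count as whitespace.
struct delimiter_table {
    std::ctype_base::mask masks[std::ctype<char>::table_size];
    explicit delimiter_table(const char* spaces) : masks() {  // all entries start at zero
        for (; *spaces; ++spaces)
            masks[static_cast<unsigned char>(*spaces)] = std::ctype_base::space;
    }
};

// The table is a private base so that it is constructed before std::ctype<char>.
class delimiter_ctype : private delimiter_table, public std::ctype<char> {
public:
    explicit delimiter_ctype(const char* spaces)
        : delimiter_table(spaces), std::ctype<char>(masks) {}
};

// Hypothetical helper: counts the words in `in`, splitting on the characters in `spaces`.
std::map<std::string, unsigned int> count_words(std::istream& in, const char* spaces) {
    // The locale takes ownership of the facet and deletes it when no longer used.
    in.imbue(std::locale(in.getloc(), new delimiter_ctype(spaces)));
    std::map<std::string, unsigned int> counts;
    for (std::string word; in >> word;)
        ++counts[word];
    return counts;
}
```

With this approach a token such as 'abc!def' is read as two words, because '!' is skipped exactly like a space. The same function should accept the std::ifstream from UniqueWords, since it only relies on the std::istream interface.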

1

Since you expect the forbidden characters to appear at the end of a word, you can remove them before inserting the word into wordsCount:

// Guard against an empty word before looking at its last character.
if(!word.empty() && (word[word.length()-1] == ';' || word[word.length()-1] == ',' || ....)){
    word.erase(word.length()-1);
}
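
The check above removes at most one trailing character. A minimal sketch of a variant that strips a whole run of trailing punctuation, such as 'you?!', can rely on std::string::find_last_not_of. The helper name strip_trailing and its default character set are assumptions and do not come from the answer.

```cpp
#include <string>

// Hypothetical helper: removes every trailing character that appears in
// `forbidden`; the default set is an assumption, not taken from the answer.
std::string strip_trailing(std::string word,
                           const std::string& forbidden = ";,.!?") {
    // Index of the last character that should be kept, or npos if there is none.
    const std::string::size_type last = word.find_last_not_of(forbidden);
    if (last == std::string::npos)
        word.clear();          // the word is empty or consists only of forbidden characters
    else
        word.erase(last + 1);  // drop everything after the last kept character
    return word;
}
```

Punctuation inside a word is left untouched, so this only helps when the delimiters really do sit at the end of each token.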