Why do I get extra index values when mapping words to line numbers from a multi-line text file in C++?

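I'm working on a multi-file program that reads a text file, strips the punctuation, and builds an index of every word together with the lines it appears on. The code compiles and runs, but the output isn't what I want. I'm fairly sure the problem is in how the punctuation is handled. Whenever a word is followed by a period, it is recorded twice, even though I excluded punctuation. The final word is then output several more times, as if it appeared on lines that don't exist in the file. Any help would be appreciated!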

Input file:

dogs run fast. 
dogs bark loud. 
cats sleep hard. 
cats are not dogs. 
Thank you. 
#

C++ code:

#include <iostream> 
#include <string> 
#include <fstream> 
#include <sstream> 
#include <map> 

using namespace std; 

int main(){ 

    ifstream input; 
    input.open("NewFile.txt"); 
    if (!input) 
    { 
     cout << "Error opening file." << endl; 
     return 0; 
    } 

    multimap< string, int, less<string> > words; 
    int line; //int variable line 
    string word;//string variable word 

    // For each line of text, the length of input, increment line 
    for (line = 1; input; line++) 
    { 
     char buf[ 255 ];//create a character with space of 255 
     input.getline(buf, 128);//buf is pointer to array of chars where 
     //extracted, 128 is maximum num of chars to write to s. 

     // Discard all punctuation characters, leaving only words 
     for (char *p = buf; 
       *p != '\0'; 
       p++) 

     { 
      if (ispunct(*p)) 
       *p = ' '; 
     } 
     // 

     istringstream i(buf); 

     while (i) 
     { 
      i >> word; 
      if (word != "") 
      { 
       words.insert(pair<const string,int>(word, line)); 
      } 
     } 
    } 

    input.close(); 

    // Output results 
    multimap< string, int, less<string> >::iterator it1; 
    multimap< string, int, less<string> >::iterator it2; 



    for (it1 = words.begin(); it1 != words.end();) 
    { 

     it2 = words.upper_bound((*it1).first); 
     cout << (*it1).first << " : "; 

     for (; it1 != it2; it1++) 
     { 
      cout << (*it1).second << " "; 
     } 
     cout << endl; 
    } 

    return 0; 
}

Output:

Thank : 5 
are : 4 
bark : 2 
cats : 3 4 
dogs : 1 2 4 4 
fast : 1 1 
hard : 3 3 
loud : 2 2 
not : 4 
run : 1 
sleep : 3 
you : 5 5 6 7

Desired output:

Thank : 5 
are : 4 
bark : 2 
cats : 3 4 
dogs : 1 2 4 
fast : 1 
hard : 3 
loud : 2 
not : 4 
run : 1 
sleep : 3 
you : 5

Thanks in advance for your help!

Source

2017-03-18 cparks10

And when you step through this in a debugger, what do you see? –

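@RichardCritten Ah! For some reason it adds an extra count at the end of each sentence. It runs line 44, 'words.insert(pair<const string,int>(word, line));', one extra time. Why does it do that? Shouldn't it stop, since the punctuation has been removed? – cparks10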

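You aren't removing the punctuation, you're replacing it with spaces. The istringstream then tries to parse those spaces and fails, which leaves `word` holding its previous value, so the old word gets inserted again. You should check whether extracting a word actually succeeded, like this: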

i >> word; 
if (!i.fail()) { 
    words.insert(pair<const string, int>(word, line)); 
}
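
To make the failure mode concrete, here is a minimal sketch that compares the two loop shapes on a single line of text. The function names `split_check_before` and `split_check_after` are made up for this illustration and are not part of the original program. The point it shows: when `>>` fails, the string is left unchanged, so a loop that tests the stream before extracting will process the last word a second time.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring the loop in the question:
// the stream is tested first and the extraction happens afterwards.
std::vector<std::string> split_check_before(const std::string& text) {
    std::vector<std::string> out;
    std::istringstream in(text);
    std::string word;
    while (in) {
        in >> word;            // on failure, word keeps its previous value
        if (word != "") {
            out.push_back(word);   // so the stale word is stored again
        }
    }
    return out;
}

// Hypothetical helper using the extraction itself as the loop condition,
// so the body only runs after a successful read.
std::vector<std::string> split_check_after(const std::string& text) {
    std::vector<std::string> out;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        out.push_back(word);
    }
    return out;
}
```

Calling `split_check_before("dogs run fast ")` yields four entries with "fast" repeated, while `split_check_after` yields three. In the question, `word` is declared outside the line loop, so it also survives from one line to the next, which would explain why "you" is additionally recorded for lines 6 and 7.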

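Since you're using C++, it would be more convenient to avoid raw pointers and rely on the standard library instead. I would rewrite part of the code like this: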

// Needs <algorithm>, <cctype> and <iterator> in addition to the original includes.
// The lambda requires C++11 (std::ptr_fun was removed in C++17).
// For each line of text: the loop ends as soon as getline fails.
std::string buf;
for (line = 1; std::getline(input, buf); line++)
{
    istringstream i(buf);

    while (i)
    {
     i >> word;
     if (!i.fail()) {
      // Copy the word without its punctuation characters
      std::string cleanWord;
      std::remove_copy_if(word.begin(), word.end(),
           std::back_inserter(cleanWord),
           [](unsigned char c) { return std::ispunct(c) != 0; }
      );
      if (!cleanWord.empty()) {
       words.insert(pair<const string, int>(cleanWord, line));
      }
     }
    }
}

input.close();

// Output results
multimap< string, int, less<string> >::iterator it1;
multimap< string, int, less<string> >::iterator it2;
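
For reference, the same idea can be packaged as a self-contained sketch that compiles on its own. This is only one possible arrangement, assuming C++11 or later. The helpers `build_index` and `format_index` are hypothetical names introduced here, and taking a `std::istream&` instead of opening "NewFile.txt" directly is a choice made so the code can be fed from a string as well as from a file.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: maps each punctuation-free word to the lines it occurs on.
std::multimap<std::string, int> build_index(std::istream& input) {
    std::multimap<std::string, int> words;
    std::string buf;
    // getline is the loop condition, so no extra pass runs after the last line.
    for (int line = 1; std::getline(input, buf); ++line) {
        std::istringstream tokens(buf);
        std::string word;
        // The body only runs when a word was actually extracted.
        while (tokens >> word) {
            std::string clean;
            std::remove_copy_if(word.begin(), word.end(),
                                std::back_inserter(clean),
                                [](unsigned char c) { return std::ispunct(c) != 0; });
            if (!clean.empty()) {
                words.insert(std::make_pair(clean, line));
            }
        }
    }
    return words;
}

// Hypothetical helper: renders the index as "word : n1 n2 ..." lines.
std::string format_index(const std::multimap<std::string, int>& words) {
    std::ostringstream out;
    for (auto it = words.begin(); it != words.end();) {
        auto range_end = words.upper_bound(it->first);
        out << it->first << " :";
        for (; it != range_end; ++it) {
            out << ' ' << it->second;
        }
        out << '\n';
    }
    return out.str();
}
```

With a file, usage would look like `std::ifstream input("NewFile.txt"); std::cout << format_index(build_index(input));`. On the sample input from the question this gives one entry per occurrence, for example "dogs : 1 2 4" and "you : 5", and the "#" line contributes nothing because it consists only of punctuation.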

Source

2017-03-18 17:03:45

Thanks. So I need to read up on 'istringstream'. – cparks10
