2014-02-10 32 views

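I want to extract sentences from a text. The text is full of paragraphs and of '!', '.', or other sentence delimiters. I can do this with a regular expression, but I would rather not depend on a regex library. Is there a C++ class for separating sentences? The goal is to split the text so that all sentences end up stored separately in some data structure.

Otherwise, the alternative is to compare each character against the delimiter characters, but I don't know how to do that with a vector. Any help is appreciated.

This is how it goes with a regular expression: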
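For the character-comparison route, a minimal sketch using only the standard library might look like the following. The helper name `split_sentences` and its default terminator set `".!?"` are assumptions made for illustration, not something from the original post. It treats every terminator as a sentence end, so quotations, abbreviations and ellipses are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): cuts `text` after each
// character found in `terminators` and returns the pieces in a vector.
std::vector<std::string> split_sentences(const std::string& text,
                                         const std::string& terminators = ".!?")
{
    std::vector<std::string> sentences;
    std::size_t start = 0;
    while (start < text.size())
    {
        // Skip the whitespace that separates one sentence from the next.
        start = text.find_first_not_of(" \t\r\n", start);
        if (start == std::string::npos)
            break;

        // Look for the next terminator; npos means the rest is one piece.
        std::size_t end = text.find_first_of(terminators, start);
        if (end == std::string::npos)
        {
            sentences.push_back(text.substr(start));
            break;
        }

        // Keep the terminator as part of the sentence.
        sentences.push_back(text.substr(start, end - start + 1));
        start = end + 1;
    }
    return sentences;
}
```

Calling `split_sentences("Hi there. How are you? Fine!")` should give a vector holding `Hi there.`, `How are you?` and `Fine!`.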

#include <string> 
#include <vector> 
#include <iostream> 
#include <algorithm> 
#include <iterator> 
#include <boost/regex.hpp> 

int main() 
{ 
    /* Input. */ 
    std::string input = "Here is a short sentence. Here is another one. And we say \"this is the final one.\", which is another example."; 

    /* Define sentence boundaries. */ 
    boost::regex re("(?: [\\.\\!\\?]\\s+" // case 1: punctuation followed by whitespace 
        "| \\.\\\",?\\s+" // case 2: end of quotation (period, closing quote, optional comma) 
        "| \\s+\\\")",  // case 3: start of quotation (whitespace, then opening quote) 
      boost::regex::perl | boost::regex::mod_x); 

    /* Iterate through sentences. */ 
    boost::sregex_token_iterator it(begin(input),end(input),re,-1); 
    boost::sregex_token_iterator endit; 

    /* Copy them onto a vector. */ 
    std::vector<std::string> vec; 
    std::copy(it,endit,std::back_inserter(vec)); 

    /* Output the vector, so we can check. */ 
    std::copy(begin(vec),end(vec), 
      std::ostream_iterator<std::string>(std::cout,"\n")); 

    return 0; 
} 
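If the concern is the Boost dependency rather than regular expressions as such, roughly the same split can be sketched with C++11's `<regex>`. This is an assumption about what would be acceptable, not something stated in the question. The function name `split_with_std_regex` is hypothetical, and the pattern is rewritten without embedded whitespace because `std::regex` has no equivalent of `mod_x`.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: same three boundary cases as the Boost version,
// expressed in the default ECMAScript grammar of std::regex.
std::vector<std::string> split_with_std_regex(const std::string& text)
{
    // case 1: punctuation followed by whitespace
    // case 2: period, closing quote, optional comma, whitespace
    // case 3: whitespace followed by an opening quote
    static const std::regex boundary(R"re([.!?]\s+|\.",?\s+|\s+")re");

    // Submatch index -1 selects the text between the boundary matches.
    std::sregex_token_iterator first(text.begin(), text.end(), boundary, -1);
    std::sregex_token_iterator last;

    std::vector<std::string> sentences;
    std::copy(first, last, std::back_inserter(sentences));
    return sentences;
}
```

As with the Boost version, the matched boundaries are dropped from the results, so only a terminator that is not followed by whitespace (such as the one at the very end of the input) stays attached to its sentence.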

Answers

A brute-force approach... I hope I have understood your request correctly...

#include <vector> 
#include <string> 
#include <iostream> 

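int main() 
{ 
    std::string input = "Here is a short sentence. Here is another one. And we say \"this is the final one.\", which is another example."; 
    std::size_t i = 0; 
    std::vector<std::string> sentences; 
    std::string current; 
    while(i < input.length()) 
    { 
     current += input[i]; 

     // Copy a quoted span verbatim so punctuation inside quotes does not end the sentence. 
     if(input[i] == '"') 
     { 
      std::size_t j = i + 1; 
      while(j < input.length() && input[j] != '"') 
      { 
       current += input[j]; 
       j ++; 
      } 
      if(j < input.length()) 
       current += input[j]; // closing quote, if there is one 
      i = j; // continue after the closing quote without skipping a character 
     } 

     if(i < input.length() && (input[i] == '.' || input[i] == '!' || input[i] == '?')) 
     { 
      sentences.push_back(current); 
      current.clear(); 
     } 
     i ++; 
    } 

    // Keep trailing text that has no terminator. 
    if(!current.empty()) 
     sentences.push_back(current); 

    for(std::size_t k = 0; k < sentences.size(); k++) 
    { 
     std::cout << k << " -> " << sentences[k] << std::endl; 
    } 
} 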

Obviously, this needs more refinement, such as removing repeated spaces, etc.
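One possible refinement for the whitespace issue is sketched below: a small helper that trims leading and trailing whitespace and collapses each internal run to a single space. The name `normalize_spaces` is hypothetical and was not part of the answer; it would be applied to each sentence before storing it.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: trims both ends and collapses whitespace runs to one space.
std::string normalize_spaces(const std::string& s)
{
    std::string out;
    bool pending_space = false;
    for (unsigned char c : s)
    {
        if (std::isspace(c))
        {
            // Remember the gap, but only if some text has already been emitted.
            pending_space = !out.empty();
        }
        else
        {
            // Emit a single space for the whole preceding run, then the character.
            if (pending_space)
                out += ' ';
            out += static_cast<char>(c);
            pending_space = false;
        }
    }
    return out;
}
```

In the loop above, `sentences.push_back(normalize_spaces(current));` would then store `Here is another one.` without the leading space that the brute-force scan leaves in front of it.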
