How do I convert the value of a boost::spirit::lex token from iterator_range to a string?

When I try to convert a token's value from iterator_range to a string, the lexer fails when it tries to read the next token.

Here is the Tokens struct, which holds the token definitions (I don't think it is relevant, but I'm including it just in case):

template <typename Lexer> 
struct Tokens : boost::spirit::lex::lexer<Lexer> 
{ 
    Tokens(); 

    boost::spirit::lex::token_def<std::string> identifier; 
    boost::spirit::lex::token_def<std::string> string; 
    boost::spirit::lex::token_def<bool> boolean; 
    boost::spirit::lex::token_def<double> real; 
    boost::spirit::lex::token_def<> comment; 
    boost::spirit::lex::token_def<> whitespace; 
}; 

template <typename Lexer> 
Tokens<Lexer>::Tokens() 
{ 
    // Define regex macros 
    this->self.add_pattern 
     ("LETTER", "[a-zA-Z_]") 
     ("DIGIT", "[0-9]") 
     ("INTEGER", "-?{DIGIT}+") 
     ("FLOAT", "-?{DIGIT}*\\.{DIGIT}+"); 

    // Define the tokens' regular expressions 
    identifier = "{LETTER}({LETTER}|{DIGIT})*"; 
    string = "\"[a-zA-Z_0-9]*\""; 
    boolean = "true|false"; 
    real = "{INTEGER}|{FLOAT}"; 
    comment = "#[^\n\r\f\v]*$"; 
    whitespace = "\x20\n\r\f\v\t+"; 

    // Define tokens 
    this->self 
     = identifier 
     | string 
     | boolean 
     | real 
     | '{' 
     | '}' 
     | '<' 
     | '>'; 

    // Define tokens to be ignored 
    this->self("WS") 
     = whitespace 
     | comment; 
}

Here are the definitions of my token and lexer types:

typedef lex::lexertl::token<char const*> TokenType; 
typedef lex::lexertl::actor_lexer<TokenType> LexerType;

Below is the code I use to read a token and convert its value to a string:

Tokens<LexerType> tokens; 

std::string string = "9index"; 
char const* first = string.c_str(); 
char const* last = &first[string.size()]; 
LexerType::iterator_type token = tokens.begin(first, last); 
LexerType::iterator_type end = tokens.end(); 

//typedef boost::iterator_range<char const*> iterator_range; 
//const iterator_range& range = boost::get<iterator_range>(token->value()); 
//std::cout << std::string(range.begin(), range.end()) << std::endl; 

++token; 

token_is_valid(*token); // Returns false ONLY if I uncomment the above code

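The output of this code is "9" (it reads the leading number, leaving "index" in the stream). If I print the value of std::string(first, last) at this point, it shows "ndex". For some reason, the lexer is failing on the 'i' character?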

I have even tried doing the conversion with a std::stringstream, but that also leaves the next token invalid:

std::stringstream out; 
out << token->value(); 
std::cout << out.str() << std::endl; 

++token; 

token_is_valid(*token); // still fails

Finally, the next token is valid if I merely send the token's value to cout:

std::cout << token->value() << std::endl; 

++token; 

token_is_valid(*token); // success, what?

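What am I missing about how the iterator_range returned by token->value() works? None of the methods I used to convert it to a string appear to modify the iterator_range or the lexer's input character stream.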
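For reference, the conversion step on its own can be sketched as below. This is only an illustration under stated assumptions, not Spirit's API: `CharRange` is a hypothetical stand-in for `boost::iterator_range<char const*>` (assumed only to expose `begin()` and `end()`), and `range_to_string` is a name made up for this sketch. It copies the characters and leaves both the range and the underlying buffer untouched, which is why the conversion by itself should not be able to upset the lexer.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical stand-in for boost::iterator_range<char const*>:
// a non-owning [begin, end) view into someone else's buffer.
struct CharRange {
    char const* b;
    char const* e;

    CharRange(char const* first, char const* last) : b(first), e(last) {
        assert(first <= last); // both pointers must refer to the same buffer
    }

    char const* begin() const { return b; }
    char const* end() const { return e; }
};

// Copies the characters of any begin()/end() range into a new std::string.
// The range is taken by const reference and is never modified.
template <typename Range>
std::string range_to_string(Range const& r) {
    return std::string(std::begin(r), std::end(r));
}
```

With a buffer holding "9index", `range_to_string(CharRange(buf, buf + 1))` yields a string containing the first character only. The copy is safe to keep, but the range itself is only valid while the buffer it points into is alive.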

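Edit: I am adding this here because a comment reply is too short to fully explain what happened.

I figured it out. As sehe and drhirsch pointed out, the code in my original question was a sanitized version of what I was actually doing. I am testing the lexer with gtest unit tests, using a test fixture class. As a member of that class I have void scan(const std::string& str), which assigns the first and last iterators (data members of the fixture) from the given string. The problem is that as soon as we exit this function, the const std::string& str argument is popped off the stack and no longer exists, which invalidates those iterators even though they are data members of the fixture.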

Moral of the story: the object referenced by the iterators passed to lexer::begin() must exist for as long as you expect to read tokens.
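One way to keep that object alive is to let the fixture own a copy of the input. The sketch below is an assumption about what such a fixture could look like, not the actual test code: `ScanFixture`, its `input` member, and `remaining()` are hypothetical names, and the lexer itself is left out so the example depends only on the standard library.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the gtest fixture described above.
struct ScanFixture {
    std::string input;             // owned copy, outlives every call to scan()
    char const* first = nullptr;   // would be passed to lexer::begin()
    char const* last = nullptr;

    void scan(const std::string& str) {
        input = str;                   // copy first, then take pointers into the copy
        first = input.c_str();
        last = first + input.size();
        assert(first <= last);
    }

    // Text between first and last, for inspection in a test.
    std::string remaining() const { return std::string(first, last); }
};
```

Calling `scan(std::string("9index"))` with a temporary is then harmless, because `first` and `last` point into `input` rather than into the argument. The pointers still become invalid if `input` is modified or `scan()` is called again, so tokens should be read before either happens.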

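I would rather delete this question than leave my silly mistake documented on the internet, but in the interest of helping the community I suppose I should leave it up.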

2012-05-02 Brynn Mahsman

Which compiler/library versions? Optional compiler flags? I fixed the syntax errors in the code – sehe

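Next time, consider posting a minimal _selfcontained_ example. It takes an experienced Spirit developer about 10 minutes to set up a compilable example from the snippets above. That raises the barrier to answering to "infinite" for roughly 95% of the audience. – sehe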

@Sehe True. The problem may well be hidden in the part of the code you didn't show us – hirschhornsalz

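Judging from the given code, it seems you are looking at a compiler/library bug. I cannot reproduce the problem with any of the following combinations: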

Edit: now includes clang++ and boost 1_49_0. Valgrind comes up clean for a select number of test cases.

clang++ 2.9, -O3, boost 1_46_1
clang++ 2.9, -O0, boost 1_46_1
clang++ 2.9, -O3, boost 1_48_0
clang++ 2.9, -O0, boost 1_48_0
clang++ 2.9, -O3, boost 1_49_0
clang++ 2.9, -O0, boost 1_49_0
gcc 4.4.5, -O0, boost 1_42_1
gcc 4.4.5, -O0, boost 1_46_1
gcc 4.4.5, -O0, boost 1_48_0
gcc 4.4.5, -O0, boost 1_49_0
gcc 4.4.5, -O3, boost 1_42_1
gcc 4.4.5, -O3, boost 1_46_1
gcc 4.4.5, -O3, boost 1_48_0
gcc 4.4.5, -O3, boost 1_49_0
gcc 4.6.1, -O0, boost 1_46_1
gcc 4.6.1, -O0, boost 1_48_0
gcc 4.6.1, -O0, boost 1_49_0
gcc 4.6.1, -O3, boost 1_42_1
gcc 4.6.1, -O3, boost 1_46_1
gcc 4.6.1, -O3, boost 1_48_0
gcc 4.6.1, -O3, boost 1_49_0

The complete code I tested:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <iostream> // std::cout
#include <string>   // std::string

namespace qi = boost::spirit::qi; 
namespace lex = boost::spirit::lex; 

template <typename Lexer> 
struct Tokens : lex::lexer<Lexer> 
{ 
    Tokens(); 

    lex::token_def<std::string> identifier; 
    lex::token_def<std::string> string; 
    lex::token_def<bool> boolean; 
    lex::token_def<double> real; 
    lex::token_def<> comment; 
    lex::token_def<> whitespace; 
}; 

template <typename Lexer> 
Tokens<Lexer>::Tokens() 
{ 
    // Define regex macros 
    this->self.add_pattern 
     ("LETTER", "[a-zA-Z_]") 
     ("DIGIT", "[0-9]") 
     ("INTEGER", "-?{DIGIT}+") 
     ("FLOAT", "-?{DIGIT}*\\.{DIGIT}+"); 

    // Define the tokens' regular expressions 
    identifier = "{LETTER}({LETTER}|{DIGIT})*"; 
    string = "\"[a-zA-Z_0-9]*\""; 
    boolean = "true|false"; 
    real = "{INTEGER}|{FLOAT}"; 
    comment = "#[^\n\r\f\v]*$"; 
    whitespace = "\x20\n\r\f\v\t+"; 

    // Define tokens 
    this->self 
     = identifier 
     | string 
     | boolean 
     | real 
     | '{' 
     | '}' 
     | '<' 
     | '>'; 

    // Define tokens to be ignored 
    this->self("WS") 
     = whitespace 
     | comment; 
} 

//////////////////////////////////////////////// 
typedef lex::lexertl::token<char const*> TokenType; 
typedef lex::lexertl::actor_lexer<TokenType> LexerType; 

int main(int argc, const char *argv[]) 
{ 
    Tokens<LexerType> tokens; 

    std::string string = "9index"; 
    char const* first = string.c_str(); 
    char const* last = &first[string.size()]; 
    LexerType::iterator_type token = tokens.begin(first, last); 
    LexerType::iterator_type end = tokens.end(); 

    typedef boost::iterator_range<char const*> iterator_range; 
    const iterator_range& range = boost::get<iterator_range>(token->value()); 
    std::cout << std::string(range.begin(), range.end()) << std::endl; 

    ++token; 

    // Returns false ONLY if I uncomment the above code 
    std::cout << "Next valid: " << std::boolalpha << token_is_valid(*token) << '\n'; 

    return 0; 
}

2012-05-02 09:10:48 sehe

Thanks, I will test this on my system. –

Please see my edit above. Thanks for your help. –

@Phineas Thanks for the feedback. – sehe
