2012-11-13 163 views
5

我嘗試學習使用boost :: spirit。要做到這一點,我想創建一個簡單的詞法分析器,將它們結合起來,然後使用spirit開始分析。我試着修改這個例子,但它沒有像預期的那樣運行(結果r不正確)。麻煩加精:: lex&whitespace

這裏的詞法分析器:

#include <boost/spirit/include/lex_lexertl.hpp> 

namespace lex = boost::spirit::lex; 

template <typename Lexer> 
struct lexer_identifier : lex::lexer<Lexer> 
{ 
    lexer_identifier() 
     : identifier("[a-zA-Z_][a-zA-Z0-9_]*") 
     , white_space("[ \\t\\n]+") 
    { 
     using boost::spirit::lex::_start; 
     using boost::spirit::lex::_end; 

     this->self = identifier; 
     this->self("WS") = white_space; 
    } 
    lex::token_def<> identifier; 
    lex::token_def<> white_space; 
    std::string identifier_name; 
}; 

這是我嘗試運行的例子:因爲只有一個字符串中令牌

#include "stdafx.h" 

#include <boost/spirit/include/lex_lexertl.hpp> 
#include "my_Lexer.h" 

namespace lex = boost::spirit::lex; 

int _tmain(int argc, _TCHAR* argv[]) 
{ 
    typedef lex::lexertl::token<char const*,lex::omit, boost::mpl::false_> token_type; 
    typedef lex::lexertl::lexer<token_type> lexer_type; 

    typedef lexer_identifier<lexer_type>::iterator_type iterator_type; 

    lexer_identifier<lexer_type> my_lexer; 

    std::string test("adedvied das934adf dfklj_03245"); 

    char const* first = test.c_str(); 
    char const* last = &first[test.size()]; 

    lexer_type::iterator_type iter = my_lexer.begin(first, last); 
    lexer_type::iterator_type end = my_lexer.end(); 

    while (iter != end && token_is_valid(*iter)) 
    { 
     ++iter; 
    } 

    bool r = (iter == end); 

    return 0; 
} 

r是隻要真實的。爲什麼會這樣?

問候 托比亞斯

回答

10

您已經創建了第二個詞法分析器狀態,但永遠不會調用它。

簡化及利潤:


在大多數情況下,能有預期的效果最簡單的方法是使用單態樂星與pass_ignore標誌上的可跳過標記:

this->self += identifier 
       | white_space [ lex::_pass = lex::pass_flags::pass_ignore ]; 

請注意,這需要actor_lexer以允許語義操作:

typedef lex::lexertl::actor_lexer<token_type> lexer_type; 

全樣本:

#include <boost/spirit/include/lex_lexertl.hpp> 
#include <boost/spirit/include/lex_lexertl.hpp> 
namespace lex = boost::spirit::lex; 

template <typename Lexer> 
struct lexer_identifier : lex::lexer<Lexer> 
{ 
    lexer_identifier() 
     : identifier("[a-zA-Z_][a-zA-Z0-9_]*") 
     , white_space("[ \\t\\n]+") 
    { 
     using boost::spirit::lex::_start; 
     using boost::spirit::lex::_end; 

     this->self += identifier 
        | white_space [ lex::_pass = lex::pass_flags::pass_ignore ]; 
    } 
    lex::token_def<> identifier; 
    lex::token_def<> white_space; 
    std::string identifier_name; 
}; 

int main(int argc, const char *argv[]) 
{ 
    typedef lex::lexertl::token<char const*,lex::omit, boost::mpl::false_> token_type; 
    typedef lex::lexertl::actor_lexer<token_type> lexer_type; 

    typedef lexer_identifier<lexer_type>::iterator_type iterator_type; 

    lexer_identifier<lexer_type> my_lexer; 

    std::string test("adedvied das934adf dfklj_03245"); 

    char const* first = test.c_str(); 
    char const* last = &first[test.size()]; 

    lexer_type::iterator_type iter = my_lexer.begin(first, last); 
    lexer_type::iterator_type end = my_lexer.end(); 

    while (iter != end && token_is_valid(*iter)) 
    { 
     ++iter; 
    } 

    bool r = (iter == end); 
    std::cout << std::boolalpha << r << "\n"; 
} 

打印

true 

「WS」 作爲船長狀態


也可能您在使用第二解析器狀態的樣本來船長(lex::tokenize_and_phrase_parse)。讓我花一分鐘或10分鐘爲此創建一個工作示例。

更新我花了一點時間超過10分鐘(waaaah):)這裏有一個對比試驗,展示如何詞法分析器狀態互動,以及如何使用精神船長解析調用第二解析器的狀態:

#include <boost/spirit/include/qi.hpp> 
#include <boost/spirit/include/lex_lexertl.hpp> 
namespace lex = boost::spirit::lex; 
namespace qi = boost::spirit::qi; 

template <typename Lexer> 
struct lexer_identifier : lex::lexer<Lexer> 
{ 
    lexer_identifier() 
     : identifier("[a-zA-Z_][a-zA-Z0-9_]*") 
     , white_space("[ \\t\\n]+") 
    { 
     this->self  = identifier; 
     this->self("WS") = white_space; 
    } 
    lex::token_def<> identifier; 
    lex::token_def<lex::omit> white_space; 
}; 

int main() 
{ 
    typedef lex::lexertl::token<char const*, lex::omit, boost::mpl::true_> token_type; 
    typedef lex::lexertl::lexer<token_type> lexer_type; 

    typedef lexer_identifier<lexer_type>::iterator_type iterator_type; 

    lexer_identifier<lexer_type> my_lexer; 

    std::string test("adedvied das934adf dfklj_03245"); 

    { 
     char const* first = test.c_str(); 
     char const* last = &first[test.size()]; 

     // cannot lex in just default WS state: 
     bool ok = lex::tokenize(first, last, my_lexer, "WS"); 
     std::cout << "Starting state WS:\t" << std::boolalpha << ok << "\n"; 
    } 

    { 
     char const* first = test.c_str(); 
     char const* last = &first[test.size()]; 

     // cannot lex in just default state either: 
     bool ok = lex::tokenize(first, last, my_lexer, "INITIAL"); 
     std::cout << "Starting state INITIAL:\t" << std::boolalpha << ok << "\n"; 
    } 

    { 
     char const* first = test.c_str(); 
     char const* last = &first[test.size()]; 

     bool ok = lex::tokenize_and_phrase_parse(first, last, my_lexer, *my_lexer.self, qi::in_state("WS")[my_lexer.self]); 
     ok = ok && (first == last); // verify full input consumed 
     std::cout << std::boolalpha << ok << "\n"; 
    } 
} 

輸出是

Starting state WS: false 
Starting state INITIAL: false 
true 
+0

加入下**' 「WS」 作爲船長state'演示的 「WS」 狀態的方法**。歡呼 – sehe

+0

哎呀。我複製了錯誤的token_type聲明。它需要'MPL ::爲['HasState'] true_'(http://www.boost.org/doc/libs/1_49_0/libs/spirit/doc/html/spirit/lex/abstracts/lexer_primitives/lexer_token_values.html #spirit.lex.abstracts.lexer_primitives.lexer_token_values.the_anatomy_of_a_token),當處理有狀態的詞法分析器 - 顯然! ***首先修復*** – sehe

+0

- 感謝您的廣泛示例。我仍然有一些問題:lex :: omit做什麼?關於tokenize_and_parse調用:什麼是my_lexer.self&qi :: in_state(「WS」)[my_lexer.self]? –