Edited in response to the comment: see the "UPDATE" below.
I'd suggest not trying to solve this in the lexer. Let the lexer yield raw strings:
template <typename Lexer>
struct mylexer_t : lex::lexer<Lexer>
{
    mylexer_t()
    {
        string_quote_double = "\\\"([^\"]|\\\"\\\")*\\\"";
        this->self("INITIAL")
            = string_quote_double
            | lex::token_def<>("[ \t\r\n]") [ lex::_pass = lex::pass_flags::pass_ignore ]
            ;
    }
    lex::token_def<std::string> string_quote_double;
};
Note that exposing a token attribute like this requires a modified token typedef:
typedef lex::lexertl::token<char const*, boost::mpl::vector<char, std::string> > token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
And post-process in the parser:
template <typename Iterator> struct mygrammar_t
    : public qi::grammar<Iterator, std::vector<std::string>()>
{
    typedef mygrammar_t<Iterator> This;
    template <typename TokenDef>
    mygrammar_t(TokenDef const& tok) : mygrammar_t::base_type(start)
    {
        using namespace qi;
        string_quote_double %= tok.string_quote_double [ undoublequote ];
        start = *string_quote_double;
        BOOST_SPIRIT_DEBUG_NODES((start)(string_quote_double));
    }
  private:
    qi::rule<Iterator, std::vector<std::string>()> start;
    qi::rule<Iterator, std::string()> string_quote_double;
};
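As you can see, undoublequote can be any Phoenix actor that satisfies the criteria for a Spirit semantic action. A brain-dead example implementation would be: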
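static bool undoublequote(std::string& val)
{
    // val is the raw token, e.g. "bla""blo" (outer quotes included)
    if (val.empty() || val[0] != '"')
        return false; // must start with a double quote
    std::string::size_type outidx = 0;
    for (auto in = val.begin() + 1; in != val.end(); ++in) { // skip the opening quote
        if (*in == '"') {
            if (++in == val.end()) { // this was the closing quote
                val.resize(outidx);  // resize to effective chars
                return true;
            }
            if (*in != '"')          // a lone quote inside the string
                return false;
            // otherwise "" collapses to a single "
        }
        val[outidx++] = *in; // append the character
    }
    return false; // not ended with double quote as expected
}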
But I suggest you write a "proper" unescaper (I'm fairly sure MySQL will allow \t, \r, \u001e or even more archaic stuff as well).
I have some more complete samples in older answers here:
UPDATE
In fact, as you indicated, it's quite easy to integrate the attribute value normalization into the lexer itself:
template <typename Lexer>
struct mylexer_t : lex::lexer<Lexer>
{
    // Lexer semantic action: normalizes the matched text and stores it as the
    // token value, or rejects the match if undoublequote fails
    struct undoublequote_lex_type {
        template <typename, typename, typename, typename> struct result { typedef void type; };
        template <typename It, typename IdType, typename pass_flag, typename Ctx>
        void operator()(It& f, It& l, pass_flag& pass, IdType& /*id*/, Ctx& ctx) const {
            std::string raw(f,l);
            if (undoublequote(raw))
                ctx.set_value(raw);
            else
                pass = lex::pass_flags::pass_fail;
        }
    } undoublequote_lex;
    mylexer_t()
    {
        string_quote_double = "\\\"([^\"]|\\\"\\\")*\\\"";
        this->self("INITIAL")
            = string_quote_double [ undoublequote_lex ]
            | lex::token_def<>("[ \t\r\n]") [ lex::_pass = lex::pass_flags::pass_ignore ]
            ;
    }
    lex::token_def<std::string> string_quote_double;
};
This reuses the same undoublequote function shown above, but wraps it in a deferred callable object (or "polymorphic functor") undoublequote_lex_type that satisfies the criteria for a Lexer Semantic Action.
Here is a fully working proof of concept:
//#include <boost/config/warning_disable.hpp>
//#define BOOST_SPIRIT_DEBUG_PRINT_SOME 80
//#define BOOST_SPIRIT_DEBUG // before including Spirit
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fstream>
#include <iostream> // std::cout and std::cerr are used below
#ifdef MEMORY_MAPPED
#   include <boost/iostreams/device/mapped_file.hpp>
#endif
//#include <boost/spirit/include/lex_generate_static_lexertl.hpp>
namespace /*anon*/
{
    namespace phx=boost::phoenix;
    namespace qi =boost::spirit::qi;
    namespace lex=boost::spirit::lex;
    template <typename Lexer>
    struct mylexer_t : lex::lexer<Lexer>
    {
        mylexer_t()
        {
            string_quote_double = "\\\"([^\"]|\\\"\\\")*\\\"";
            this->self("INITIAL")
                = string_quote_double
                | lex::token_def<>("[ \t\r\n]") [ lex::_pass = lex::pass_flags::pass_ignore ]
                ;
        }
        lex::token_def<std::string> string_quote_double;
    };
    static bool undoublequote(std::string& val)
    {
        // val is the raw token, e.g. "bla""blo" (outer quotes included)
        if (val.empty() || val[0] != '"')
            return false; // must start with a double quote
        std::string::size_type outidx = 0;
        for (auto in = val.begin() + 1; in != val.end(); ++in) { // skip the opening quote
            if (*in == '"') {
                if (++in == val.end()) { // this was the closing quote
                    val.resize(outidx);  // resize to effective chars
                    return true;
                }
                if (*in != '"')          // a lone quote inside the string
                    return false;
                // otherwise "" collapses to a single "
            }
            val[outidx++] = *in; // append the character
        }
        return false; // not ended with double quote as expected
    }
    template <typename Iterator> struct mygrammar_t
        : public qi::grammar<Iterator, std::vector<std::string>()>
    {
        typedef mygrammar_t<Iterator> This;
        template <typename TokenDef>
        mygrammar_t(TokenDef const& tok) : mygrammar_t::base_type(start)
        {
            using namespace qi;
            string_quote_double %= tok.string_quote_double [ undoublequote ];
            start = *string_quote_double;
            BOOST_SPIRIT_DEBUG_NODES((start)(string_quote_double));
        }
      private:
        qi::rule<Iterator, std::vector<std::string>()> start;
        qi::rule<Iterator, std::string()> string_quote_double;
    };
}
std::vector<std::string> do_test_parse(const std::string& v)
{
    char const *first = &v[0];
    char const *last = first+v.size();
    typedef lex::lexertl::token<char const*, boost::mpl::vector<char, std::string> > token_type;
    typedef lex::lexertl::actor_lexer<token_type> lexer_type;
    typedef mylexer_t<lexer_type>::iterator_type iterator_type;
    const static mylexer_t<lexer_type> mylexer;
    const static mygrammar_t<iterator_type> parser(mylexer);
    auto iter = mylexer.begin(first, last);
    auto end = mylexer.end();
    std::vector<std::string> data;
    bool r = qi::parse(iter, end, parser, data);
    r = r && (iter == end);
    if (!r)
        std::cerr << "parsing (" << iter->state() << ") failed at: '" << std::string(first, last) << "'\n";
    return data;
}
int main(int argc, const char *argv[])
{
    for (auto&& s : do_test_parse("\"bla\"\"blo\""))
        std::cout << s << std::endl;
}
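Actually, I thought this was not possible. Still, I'd prefer a simpler solution, i.e. one that doesn't involve a grammar. Is the grammar really needed? Maybe only to make testing/debugging easier? – coproc 2013-05-13 18:52:37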
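@coproc Of course you can :/ That's not what I suggested. I've added a wrapper 'undouble_quote_lex' functor that shows you how to do it in pure lex (see **UPDATE**). The **[similarly adapted sample program](http://ideone.com/BGXH9W)** still prints 'bla"blo' as expected – sehe 2013-05-14 06:50:38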
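I'm still chewing on the token type. What is the purpose of 'char'? Why pass an 'mpl::vector' as 'AttributeTypes'? Wouldn't the type 'std::string' be enough? Actually, I don't understand why a single token type definition can have several attribute types, and how they could be used – coproc 2013-05-14 07:47:38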