2014-02-07

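Parsing a TeX-like language with lpeg

I am struggling to get my head around LPeg. I have managed to produce one grammar that does what I want, but I have been beating my head against this one and not getting far. The idea is to parse a document written in a simplified form of TeX. I want to split a document into: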

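  • Environments, which are \begin{cmd} and \end{cmd} pairs.
  • Commands, which can either take an argument, like \foo{bar}, or be bare, like \foo.
  • Both environments and commands can have parameters, like this: \command[color=green,background=blue]{content}.
  • Other stuff.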

I would also like to keep track of line numbers for error-handling purposes. Here is what I have so far:

lpeg = require("lpeg") 
lpeg.locale(lpeg) 
-- Assume a lot of "X = lpeg.X" here. 

-- Line number handling from http://lua-users.org/lists/lua-l/2011-05/msg00607.html 
-- with additional print statements to check they are working. 
local newline = P"\r"^-1 * "\n"/function (a) print("New"); end 
local incrementline = Cg(Cb"linenum")/ function (a) print("NL"); return a + 1 end , "linenum" 
local setup = Cg (Cc (1) , "linenum") 
nl = newline * incrementline 
space = nl + lpeg.space 

-- Taken from "Name-value lists" in http://www.inf.puc-rio.br/~roberto/lpeg/ 
local identifier = (R("AZ") + R("az") + P("_") + R("09"))^1 
local sep = lpeg.S(",;") * space^0 
local value = (1-lpeg.S(",;]"))^1 
local pair = lpeg.Cg(C(identifier) * space ^0 * "=" * space ^0 * C(value)) * sep^-1 
local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset) 
local parameters = (P("[") * list * P("]")) ^-1 

-- And the rest is mine 

anything = C((space^1 + (1-lpeg.S("\\{}")))^1) * Cb("linenum")/function (a,b) return { text = a, line = b } end 

begin_environment = P("\\begin") * Ct(parameters) * P("{") * Cg(identifier, "environment") * Cb("environment") * P("}")/function (a,b) return { params = a[1], environment = b } end 
end_environment = P("\\end{") * Cg(identifier) * P("}") 

texlike = lpeg.P{ 
    "document"; 
    document = setup * V("stuff") * -1, 
    stuff = Cg(V"environment" + anything + V"bracketed_stuff" + V"command_with" + V"command_without")^0, 
    bracketed_stuff = P"{" * V"stuff" * P"}"/function (a) return a end, 
    command_with =((P("\\") * Cg(identifier) * Ct(parameters) * Ct(V"bracketed_stuff"))-P("\\end{"))/function (i,p,n) return { command = i, parameters = p, nodes = n } end, 
    command_without = ((P("\\") * Cg(identifier) * Ct(parameters))-P("\\end{"))/function (i,p) return { command = i, parameters = p } end, 
    environment = Cg(begin_environment * Ct(V("stuff")) * end_environment)/function (b,stuff, e) return { b = b, stuff = stuff, e = e} end 
} 

It nearly works!

> texlike:match("\\foo[one=two]thing\\bar") 
{
    command = "foo",
    parameters = {
        {
            one = "two",
        },
    },
}
{
    line = 1,
    text = "thing",
}
{
    command = "bar",
    parameters = {
    },
}

But! First off, I can't get the line-number handling to work at all. The function inside incrementline never gets fired.

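I also can't quite work out how nested capture information gets passed to the handling functions (which is why I have scattered Cg, C and Ct semi-randomly over the grammar). This means that only one item gets returned from within a command_with: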

> texlike:match("\\foo{text \\command moretext}") 
{
    command = "foo",
    nodes = {
        {
            line = 1,
            text = "text ",
        },
    },
    parameters = {
    },
}

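I would also really like to check that environment starts and ends match up, but when I tried to do so, my back reference from the "begin" was no longer in scope by the time I got to the "end". I don't know where to go from here.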

Answer


Late answer, but hopefully it will offer some insight if you are still looking for a solution or wondering what the problem was.

There are a couple of issues with your grammar, some of which can be tricky to spot.

Your line increment looks incorrect:

local incrementline = Cg(Cb"linenum")/
         function (a) print("NL"); return a + 1 end, 
         "linenum" 

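It looks like you meant to create a named capture group rather than an anonymous one. The back capture linenum is essentially being used as a variable. Because the group here is anonymous, linenum is never updated: function (a) will always receive 1 when called. You need to move the closing ) to the end so that "linenum" is included in the Cg call: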

local incrementline = Cg(Cb"linenum"/
         function (a) print("NL"); return a + 1 end, 
         "linenum") 

See the LPeg documentation on the Cg capture for details.
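To see the corrected counter working in isolation, here is a minimal sketch, assuming LPeg is installed and loadable as lpeg. The word and words patterns and the sample input are hypothetical, made up for this illustration; only setup, newline and incrementline follow the code above (without the debugging print calls).

```lua
local lpeg = require("lpeg")
local P, R, C, Cc, Cg, Cb, Ct = lpeg.P, lpeg.R, lpeg.C, lpeg.Cc, lpeg.Cg, lpeg.Cb, lpeg.Ct

-- Named group "linenum" acts as the counter variable, starting at 1.
local setup = Cg(Cc(1), "linenum")
local newline = P("\r")^-1 * P("\n")
-- Corrected form: the name is an argument of Cg, so every newline
-- replaces the "linenum" group with the previous value plus one.
local incrementline = Cg(Cb("linenum") / function(n) return n + 1 end, "linenum")
local nl = newline * incrementline

-- Hypothetical helper: a lowercase word tagged with the current line number.
local word = Ct(Cg(C(R("az")^1), "text") * Cg(Cb("linenum"), "line"))
local words = Ct(setup * (nl + P(" ") + word)^0) * -1

local result = words:match("one\ntwo three\n\nfour")
for _, w in ipairs(result) do
    print(w.line, w.text)
end
```

Each word table carries the counter value in effect where the word was matched, so for this input the lines should come out as 1, 2, 2 and 4. Because every back capture re-evaluates the chain of earlier increments, this approach can get slow on long inputs.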

The second problem is with your anything non-terminal rule:

anything = C((space^1 + (1-lpeg.S("\\{}")))^1) * Cb("linenum") ... 

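There are a few things to be careful about here. First, a named Cg capture (from the incrementline rule, once it is fixed) doesn't produce any value unless it is inside a table capture or you reference it with a back capture. Second, and more important, it has an ad hoc scope, much like a variable. More precisely, its scope ends once it is closed inside an outer capture, which is what happens here: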

C((space^1 + (...))^1) 

This means that by the time you reference it with * Cb("linenum"), it is already too late: the linenum update you really want has already gone out of scope.
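The scoping rule can be checked with a small experiment. This is only a sketch, assuming LPeg is installed; chunk, enclosed and exposed are hypothetical names introduced for the comparison and are not part of the original grammar.

```lua
local lpeg = require("lpeg")
local P, C, Cc, Cg, Cb = lpeg.P, lpeg.C, lpeg.Cc, lpeg.Cg, lpeg.Cb

local setup = Cg(Cc(1), "linenum")
local incrementline = Cg(Cb("linenum") / function(n) return n + 1 end, "linenum")
local nl = P("\n") * incrementline
-- Any run of characters, bumping the counter at each newline.
local chunk = (nl + (1 - P("\n")))^1

-- The counter updates are closed inside C(...), so the trailing Cb
-- skips over them and only finds the group created by setup.
local enclosed = setup * C(chunk) * Cb("linenum")
-- No enclosing capture: the trailing Cb finds the latest update.
local exposed = setup * chunk * Cb("linenum")

local input = "a\nb\nc"
local text, stale = enclosed:match(input)
local fresh = exposed:match(input)
print(stale, fresh)
```

With two newlines in the input, the enclosed variant should still report 1 while the exposed variant reports 3, which is the behaviour described above.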

I have always found LPeg's re syntax a bit easier to grok, so I've rewritten the grammar using that instead:

-- The "re" module ships with LPeg.
local re = require("re")

-- Callbacks referenced from the grammar. They must already be defined
-- when this table is built (their definitions are shown further down).
local grammar_cb =
{
    fold = pairfold,
    resetlinenum = resetlinenum,
    incrementlinenum = incrementlinenum,
    getlinenum = getlinenum,
    error = error
}

local texlike_grammar = re.compile(
[[
    document    <- '' -> resetlinenum {| docpiece* |} !.
    docpiece    <- {| envcmd |} / {| cmd |} / multiline
    beginslash  <- cmdslash 'begin'
    endslash    <- cmdslash 'end'
    envcmd      <- beginslash paramblock? {:beginenv: envblock :} (!endslash docpiece)*
                   endslash openbrace {:endenv: =beginenv :} closebrace / &beginslash {} -> error .
    envblock    <- openbrace key closebrace
    cmd         <- cmdslash {:command: identifier :} (paramblock? cmdblock)?
    cmdblock    <- openbrace {:nodes: {| docpiece* |} :} closebrace
    paramblock  <- opensq ({:parameters: {| parampairs |} -> fold :} / whitesp) closesq
    parampairs  <- parampair (sep parampair)*
    parampair   <- key assign value
    key         <- whitesp { identifier }
    value       <- whitesp { [^],;%s]+ }
    multiline   <- (nl? text)+
    text        <- {| {:text: (!cmd !closebrace !%nl [_%w%p%s])+ :} {:line: '' -> getlinenum :} |}
    identifier  <- [_%w]+
    cmdslash    <- whitesp '\'
    assign      <- whitesp '='
    sep         <- whitesp ','
    openbrace   <- whitesp '{'
    closebrace  <- whitesp '}'
    opensq      <- whitesp '['
    closesq     <- whitesp ']'
    nl          <- {%nl+} -> incrementlinenum
    whitesp     <- (nl / %s)*
]], grammar_cb)

where the callback functions are straightforwardly defined as:

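-- Fold a flat { key1, value1, key2, value2, ... } list into a key/value table.
local function pairfold(...)
    local t, kv = {}, ...
    -- An odd number of entries cannot be paired up; return the input unchanged.
    if #kv % 2 == 1 then return ... end
    for i = #kv, 2, -2 do
        t[ kv[i - 1] ] = kv[i]
    end
    return t
end

-- Line counter shared by the grammar callbacks.
local incrementlinenum, getlinenum, resetlinenum do
    local line = 1
    function incrementlinenum(nl)
        -- nl is the run of newline characters captured by the nl rule
        assert(not nl:match "%S")
        line = line + #nl
    end

    function getlinenum() return line end
    function resetlinenum() line = 1 end
end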
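The callbacks are plain Lua, so they can be tried without LPeg. The sketch below repeats their definitions so that it runs on its own (dropping the sanity-check assert from incrementlinenum); the sample arguments are hypothetical stand-ins for the values the grammar would pass in.

```lua
local function pairfold(...)
    local t, kv = {}, ...
    if #kv % 2 == 1 then return ... end
    for i = #kv, 2, -2 do
        t[kv[i - 1]] = kv[i]
    end
    return t
end

local incrementlinenum, getlinenum, resetlinenum do
    local line = 1
    function incrementlinenum(nl) line = line + #nl end
    function getlinenum() return line end
    function resetlinenum() line = 1 end
end

-- A flat list, as collected by {| parampairs |}, becomes a key/value table.
local params = pairfold({ "color", "red", "background", "black" })
print(params.color, params.background)

-- An odd-length list cannot be paired, so it is handed back untouched.
local odd = { "color", "red", "orphan" }
print(pairfold(odd) == odd)

-- Two consecutive newlines advance the counter by two.
resetlinenum()
incrementlinenum("\n\n")
print(getlinenum())
```

Keeping the line number in a closure rather than in a named capture group sidesteps the scoping problem described earlier. The trade-off is that the counter is ordinary mutable state, so it is not rolled back if the parser backtracks over a newline.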

Testing the grammar with a non-trivial, multi-line TeX-like string:

local test1 = [[\foo{text \bar[color = red, background = black]{ 
    moretext \baz{ 
even 
more text} } 


this time skipping multiple 

lines even, such wow!}]] 

produces the following AST, shown in Lua table format:

{
    command = "foo",
    nodes = {
        {
            text = "text",
            line = 1
        },
        {
            parameters = {
                color = "red",
                background = "black"
            },
            command = "bar",
            nodes = {
                {
                    text = " moretext",
                    line = 2
                },
                {
                    command = "baz",
                    nodes = {
                        {
                            text = "even ",
                            line = 3
                        },
                        {
                            text = "more text",
                            line = 4
                        }
                    }
                }
            }
        },
        {
            text = "this time skipping multiple",
            line = 7
        },
        {
            text = "lines even, such wow!",
            line = 9
        }
    }
}

And a second test, for begin/end environments:

local test2 = [[\begin[p1 
=apple, 
p2=blue]{scope} scope foobar 
\end{scope} global foobar]] 

which seems to give roughly what you are looking for:

{
    {
        {
            text = " scope foobar",
            line = 3
        },
        parameters = {
            p1 = "apple",
            p2 = "blue"
        },
        beginenv = "scope",
        endenv = "scope"
    },
    {
        text = " global foobar",
        line = 4
    }
}
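The begin/end check relies on re's named groups ({:name: ... :}) and back references (=name). Here is a stripped-down sketch of just that mechanism, assuming LPeg's re module is available. The env grammar below is a hypothetical simplification written for this illustration (no parameters, no nesting, no whitespace handling), not the grammar above.

```lua
local re = require("re")

local env = re.compile([[
    env  <- {| '\begin{' {:beginenv: name :} '}'
               {:body: (!'\end{' .)* :}
               '\end{' {:endenv: =beginenv :} '}' |} !.
    name <- [_%w]+
]])

-- Matching names: the table records the environment name and the body.
local ok = env:match([[\begin{scope} hello \end{scope}]])
print(ok.beginenv, ok.body)

-- Mismatched names: =beginenv fails to match "b", so the whole match fails.
local bad = env:match([[\begin{a}x\end{b}]])
print(bad)
```

The back reference works here because beginenv is still in scope when =beginenv is reached: both sit directly inside the same table capture, with no completed outer capture wrapped around the group in between.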