2013-05-12 21 views
9

我試圖在Ruby中使用Parslet庫解析簡單的縮進敏感語法。在Ruby中使用Parslet的縮進敏感解析器?

以下是我試圖解析語法的例子:

level0child0 
level0child1 
    level1child0 
    level1child1 
    level2child0 
    level1child2 

結果樹看起來就像這樣:

[ 
    { 
    :identifier => "level0child0", 
    :children => [] 
    }, 
    { 
    :identifier => "level0child1", 
    :children => [ 
     { 
     :identifier => "level1child0", 
     :children => [] 
     }, 
     { 
     :identifier => "level1child1", 
     :children => [ 
      { 
      :identifier => "level2child0", 
      :children => [] 
      } 
     ] 
     }, 
     { 
     :identifier => "level1child2", 
     :children => [] 
     }, 
    ] 
    } 
] 

,我現在已經可以解析嵌套層次解析器0和1個節點,但不能解析過去那個:

require 'parslet' 

class IndentationSensitiveParser < Parslet::Parser 

    rule(:indent) { str(' ') } 
    rule(:newline) { str("\n") } 
    rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) } 

    rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) } 

    rule(:document) { node.repeat } 

    root :document 

end 

require 'ap' 
require 'pp' 

begin 
    input = DATA.read 

    puts '', '----- input ----------------------------------------------------------------------', '' 
    ap input 

    tree = IndentationSensitiveParser.new.parse(input) 

    puts '', '----- tree -----------------------------------------------------------------------', '' 
    ap tree 

rescue IndentationSensitiveParser::ParseFailed => failure 
    puts '', '----- error ----------------------------------------------------------------------', '' 
    puts failure.cause.ascii_tree 
end 

__END__ 
user 
    name 
    age 
recipe 
    name 
foo 
bar 

很明顯,我需要一個dynami c計數器,期望3個縮進節點匹配嵌套級別3上的標識符。

如何以這種方式使用Parslet實現縮進敏感語法分析器?可能嗎?

+0

不知道這是做更好的解析/建立獨立的階段。幾乎任何縮進級別的組合都是有效的和解析的,所以對我來說,這指向一個非常簡單的基於行的解析器,它只是捕獲縮進級別,然後是解析器輸出並構建嵌套結構的東西。 – 2013-05-12 07:56:14

回答

13

有幾種方法。

  1. 解析通過識別每一行縮進的集合和一個標識符的文件,然後應用一個變換之後重構基於凹痕的數目的層次結構。

  2. 使用捕捉存儲當前縮進和期待下一個節點包括縮進加上更多匹配作爲一個孩子(我沒挖成這種方式多爲下一個發生在我身上)

  3. 規則只是方法。所以你可以將'node'定義爲一個方法,這意味着你可以傳遞參數! (如下圖)

這可讓您在node(depth+1)來定義node(depth)。然而,這種方法的問題是node方法不匹配字符串,它會生成一個解析器。所以遞歸調用永遠不會結束。

這就是爲什麼dynamic存在。它返回一個解析器,直到它嘗試匹配它時才解析,允許你現在遞歸而沒有問題。

請看下面的代碼:

require 'parslet' 

class IndentationSensitiveParser < Parslet::Parser 

    def indent(depth) 
    str(' '*depth) 
    end 

    rule(:newline) { str("\n") } 

    rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) } 

    def node(depth) 
    indent(depth) >> 
    identifier >> 
    newline.maybe >> 
    (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children) 
    end 

    rule(:document) { node(0).repeat } 

    root :document 
end 

這是我的最佳解決方案。

+1

先生,你已經打動了我的思想。我一直在嘗試第二種方法。我的問題是,我沒有清楚地理解「動態」,因爲文檔沒有像我想的那樣深入研究這個主題。 非常感謝您的回答!這個解析器可能會爲我未來的其他許多人提供基礎:p – RyanScottLewis 2013-05-16 09:04:24

+0

非常感謝,我在https://github.com/alphagov/smartdown – heathd 2014-07-02 12:00:26

0

我不喜歡通過整個語法編織縮進過程的知識。我寧願讓INDENT和DEDENT標記產生,其他規則可以類似地使用來匹配「{」和「}」字符。所以以下是我的解決方案。它是一個IndentParser類,任何解析器都可以擴展以獲得nlindentdecent令牌生成。

require 'parslet' 

# Atoms returned from a dynamic that aren't meant to match anything. 
class AlwaysMatch < Parslet::Atoms::Base 
    def try(source, context, consume_all) 
    succ("") 
    end 
end 
class NeverMatch < Parslet::Atoms::Base 
    attr_accessor :msg 
    def initialize(msg = "ignore") 
    self.msg = msg 
    end 
    def try(source, context, consume_all) 
    context.err(self, source, msg) 
    end 
end 
class ErrorMatch < Parslet::Atoms::Base 
    attr_accessor :msg 
    def initialize(msg) 
    self.msg = msg 
    end 
    def try(source, context, consume_all) 
    context.err(self, source, msg) 
    end 
end 

class IndentParser < Parslet::Parser 

    ## 
    # Indentation handling: when matching a newline we check the following indentation. If 
    # that indicates an indent token or detent tokens (1+) then we stick these in a class 
    # variable and the high-priority indent/dedent rules will match as long as these 
    # remain. The nl rule consumes the indentation itself. 

    rule(:indent) { dynamic {|s,c| 
    if @indent.nil? 
     NeverMatch.new("Not an indent") 
    else 
     @indent = nil 
     AlwaysMatch.new 
    end 
    }} 
    rule(:dedent) { dynamic {|s,c| 
    if @dedents.nil? or @dedents.length == 0 
     NeverMatch.new("Not a dedent") 
    else 
     @dedents.pop 
     AlwaysMatch.new 
    end 
    }} 

    def checkIndentation(source, ctx) 
    # See if next line starts with indentation. If so, consume it and then process 
    # whether it is an indent or some number of dedents. 
    indent = "" 
    while source.matches?(Regexp.new("[ \t]")) 
     indent += source.consume(1).to_s #returns a Slice 
    end 

    if @indentStack.nil? 
     @indentStack = [""] 
    end 

    currentInd = @indentStack[-1] 
    return AlwaysMatch.new if currentInd == indent #no change, just match nl 

    if indent.start_with?(currentInd) 
     # Getting deeper 
     @indentStack << indent 
     @indent = indent #tells the indent rule to match one 
     return AlwaysMatch.new 
    else 
     # Either some number of de-dents or an error 

     # Find first match starting from back 
     count = 0 
     @indentStack.reverse.each do |level| 
     break if indent == level #found it, 

     if level.start_with?(indent) 
      # New indent is prefix, so we de-dented this level. 
      count += 1 
      next 
     end 

     # Not a match, not a valid prefix. So an error! 
     return ErrorMatch.new("Mismatched indentation level") 
     end 

     @dedents = [] if @dedents.nil? 
     count.times { @dedents << @indentStack.pop } 
     return AlwaysMatch.new 
    end 
    end 
    rule(:nl)   { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }} 

    rule(:unixnl)  { str("\n") } 
    rule(:macnl)  { str("\r") } 
    rule(:winnl)  { str("\r\n") } 
    rule(:anynl)  { unixnl | macnl | winnl } 

end 

我相信很多東西都可以改進,但這是我到目前爲止所提出的。

用法示例:

class MyParser < IndentParser 
    rule(:colon)  { str(':') >> space? } 

    rule(:space)  { match(' \t').repeat(1) } 
    rule(:space?)  { space.maybe } 

    rule(:number)  { match['0-9'].repeat(1).as(:num) >> space? } 
    rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) } 

    rule(:block)  { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent } 
    rule(:stmt)  { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock } 
    rule(:testblock) { identifier.as(:name) >> block } 

    rule(:prgm)  { testblock >> nl.repeat } 
    root :prgm 
end 
+0

中使用了這個解決方案。我發現我的答案有問題。由於PEG在不考慮非確定性選擇的情況下貪婪地消耗下一場比賽,因此在塊內添加語句時,沒有任何內容會檢查縮進/縮進規則。因此,如果一個子塊後面跟着一個縮進和更多的語句,那麼這些塊將會被貪婪地添加到忽略該縮進的子塊中。要修復,要麼通過語法(yuck)編織相同的凹痕規則,要麼像插入詞法分析器那樣需要在輸入流中插入標記。 – webjprgm 2014-05-20 15:54:06