2012-08-14

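I'm using the Lua.g grammar to generate a Lua parser with ANTLR in Java so that I can parse Lua code. Basically, I want to use ANTLR to parse specific function calls inside IF statements.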

My Lua code looks like this:

function uniqueid_some_event (e) 
    if (e:HasString("string1")) then 
        -- do something 
    end 
    if (e:HasString("string2")) then 
        -- do something 
    end 
end 

I have hundreds of these event handlers, bound to specific actors, spread across different files.

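Now I want to parse these files and collect, for each event, which conditions it checks. In the example above I want to extract "string1" and "string2" as the triggers of the event. (More precisely, I want to create a report that shows the triggers for each file.)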

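I gather that I need to modify Lua.g somehow to add my own logic, but I'm lost because I couldn't find any documentation. I looked at LuaEclipse, which basically does something similar, but it doesn't work for me.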

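So: is it possible to add some kind of W3C DOM return value to the generated LuaParser? Or something like getFunctions() that returns all functions found, with a getHasStringStatements() on each function that returns its conditions?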
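Whatever walks the parse tree, the result the question describes (per file, the events and the strings each one checks) can be kept in a plain Java model. The sketch below is hypothetical: `TriggerReport`, `LuaFunction`, `getFunctions(file)` and `render()` are illustrative names modeled on the question's `getFunctions()` / `getHasStringStatements()` wish, not anything ANTLR generates; your own tree-walking code would have to fill it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical report model; NOT generated by ANTLR. Your own code that walks
// the parse tree is expected to create LuaFunction objects and add() them.
public class TriggerReport {

    /** One Lua event handler (e.g. uniqueid_some_event) and the strings it checks. */
    public static final class LuaFunction {
        private final String name;
        private final List<String> hasStringArgs = new ArrayList<>();

        public LuaFunction(String name) { this.name = name; }

        public String getName() { return name; }

        /** Arguments of every e:HasString("...") found in the function's if conditions. */
        public List<String> getHasStringStatements() { return hasStringArgs; }
    }

    /** file name -> functions defined in that file; TreeMap keeps files sorted. */
    private final Map<String, List<LuaFunction>> functionsByFile = new TreeMap<>();

    public void add(String file, LuaFunction function) {
        functionsByFile.computeIfAbsent(file, k -> new ArrayList<>()).add(function);
    }

    public List<LuaFunction> getFunctions(String file) {
        return functionsByFile.getOrDefault(file, new ArrayList<>());
    }

    /** One line per function: "file: event -> [trigger, ...]". */
    public String render() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<LuaFunction>> entry : functionsByFile.entrySet()) {
            for (LuaFunction f : entry.getValue()) {
                sb.append(entry.getKey()).append(": ").append(f.getName())
                  .append(" -> ").append(f.getHasStringStatements()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TriggerReport report = new TriggerReport();
        LuaFunction f = new LuaFunction("uniqueid_some_event");
        f.getHasStringStatements().add("string1");
        f.getHasStringStatements().add("string2");
        report.add("actor1.lua", f);
        System.out.print(report.render());
        // prints: actor1.lua: uniqueid_some_event -> [string1, string2]
    }
}
```

The file name `actor1.lua` is made up for the example; sorting by file name is just one choice for a stable report layout.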


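Thanks. I think I understand how embedding code in ANTLR works, and I also thought this simple example could be done easily and that I could take it further on my own from there. As I said, I tried my luck modifying LuaEclipse: I can already get a list of the IF statements, but they don't contain the information about what they check. Here is the grammar file: http://lunareclipse.svn.sourceforge.net/viewvc/lunareclipse/trunk/net.sf.lunareclipse.core/src/net/sf/lunareclipse/internal/core/parsers/Lua.g?revision=183&view=markup – Steve 2012-08-15 19:29:41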


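I believe I need to modify i='if' cond=exp 'then' action=block {statement = new IfStatement(toDLTK(i), cond, action); t = (IfStatement) statement;}, but I'm not sure about the ANTLR syntax. How would I at least collect the strings inside the IF's parentheses? – Steve 2012-08-15 19:31:37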


I posted a possible solution (not with the grammar you are using now, but with a different grammar that has code embedded in it). – 2012-08-16 18:39:56

Answer


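I don't recommend using the Lua grammar from the ANTLR wiki. AFAIK, it has quite a few problems (no proper long strings and long comments, invalid number/hex tokens, global backtracking, etc.).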
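Long brackets are a typical trouble spot: a string or comment opened with `[==[` only ends at `]==]`, so `]=]` or `]===]` inside it is ordinary content. The grammar's `LongBracket` fragment handles this by building the expected closing bracket while it matches the opening one. Below is a stand-alone Java sketch of that same idea (the class and method names are made up for illustration; this is not code from the grammar).

```java
// Sketch: find where a Lua long bracket ([[...]], [==[...]==], ...) ends.
public class LongBracketDemo {

    /**
     * Returns the index just past the closing bracket of the long bracket that
     * starts at position start, or -1 if there is no opening long bracket at
     * start or it is never closed.
     */
    static int endOfLongBracket(String src, int start) {
        if (start >= src.length() || src.charAt(start) != '[') return -1;
        int i = start + 1;
        int level = 0;
        // Count the '=' signs between the two opening '['.
        while (i < src.length() && src.charAt(i) == '=') { level++; i++; }
        if (i >= src.length() || src.charAt(i) != '[') return -1;

        // Build the matching closing bracket, e.g. "]==]" for level 2.
        StringBuilder close = new StringBuilder("]");
        for (int k = 0; k < level; k++) close.append('=');
        close.append(']');

        int end = src.indexOf(close.toString(), i + 1);
        return end < 0 ? -1 : end + close.length();
    }

    public static void main(String[] args) {
        String src = "[===[ fake closers: ]==] ]====] ]===] rest";
        int end = endOfLongBracket(src, 0);
        System.out.println(src.substring(0, end));
        // prints: [===[ fake closers: ]==] ]====] ]===]
    }
}
```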

Here is a (IMO) better grammar for Lua 5.2:

/* 
Copyright (c) 2011-2012 by Bart Kiers 

Permission is hereby granted, free of charge, to any person 
obtaining a copy of this software and associated documentation 
files (the "Software"), to deal in the Software without 
restriction, including without limitation the rights to use, 
copy, modify, merge, publish, distribute, sublicense, and/or sell 
copies of the Software, and to permit persons to whom the 
Software is furnished to do so, subject to the following 
conditions: 

The above copyright notice and this permission notice shall be 
included in all copies or substantial portions of the Software. 

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 
OTHER DEALINGS IN THE SOFTWARE. 
*/ 
grammar Lua52; 

options { 
output=AST; 
ASTLabelType=CommonTree; 
} 

tokens { 
    // literals 
    And  = 'and'; 
    Break  = 'break'; 
    Do  = 'do'; 
    Else  = 'else'; 
    Elseif = 'elseif'; 
    End  = 'end'; 
    False  = 'false'; 
    For  = 'for'; 
    Function = 'function'; 
    Goto  = 'goto'; 
    If  = 'if'; 
    In  = 'in'; 
    Local  = 'local'; 
    Nil  = 'nil'; 
    Not  = 'not'; 
    Or  = 'or'; 
    Repeat = 'repeat'; 
    Return = 'return'; 
    Then  = 'then'; 
    True  = 'true'; 
    Until  = 'until'; 
    While  = 'while'; 
    Add  = '+'; 
    Minus  = '-'; 
    Mult  = '*'; 
    Div  = '/'; 
    Mod  = '%'; 
    Pow  = '^'; 
    Length = '#'; 
    Eq  = '=='; 
    NEq  = '~='; 
    LTEq  = '<='; 
    GTEq  = '>='; 
    LT  = '<'; 
    GT  = '>'; 
    Assign = '='; 
    OPar  = '('; 
    CPar  = ')'; 
    OBrace = '{'; 
    CBrace = '}'; 
    OBrack = '['; 
    CBrack = ']'; 
    ColCol = '::'; 
    SCol  = ';'; 
    Col  = ':'; 
    Comma  = ','; 
    DotDotDot = '...'; 
    DotDot = '..'; 
    Dot  = '.'; 

    // imaginary tokens 
    ASSIGNMENT; 
    LOCAL_ASSIGNMENT; 
    CONDITION; 
    UNARY_MINUS; 
    CALL; 
    COL_CALL; 
    INDEX; 
    EXPR_LIST; 
    VAR_LIST; 
    CHUNK; 
    NAME_LIST; 
    LABEL; 
    TABLE; 
    FIELD_LIST; 
    FIELD; 
    FOR_IN; 
    PARAM_LIST; 
    FUNCTION; 
    ASSIGNMENT_VAR; 
    VAR; 
} 

@parser::header { 
    package luja.parser; 
} 

@lexer::header { 
    package luja.parser; 
    import java.math.*; 
} 

@parser::members { 

    private boolean addSelf = false; 

    private CommonTree createPowAST(List tokens) { 
        int n = tokens.size(); 

        CommonTree ast = new CommonTree(new CommonToken(Pow, "^")); 
        ast.addChild((CommonTree)tokens.get(n - 2)); 
        ast.addChild((CommonTree)tokens.get(n - 1)); 

        for(int i = n - 3; i >= 0; i--) { 
            CommonTree temp = new CommonTree(new CommonToken(Pow, "^")); 
            temp.addChild((CommonTree)tokens.get(i)); 
            temp.addChild(ast); 
            ast = temp; 
        } 

        return ast; 
    } 

    private CommonTree namesToVar(List<String> names, String name) { 
        names.add(name); 
        return namesToVar(names); 
    } 

    private CommonTree namesToVar(List<String> names) { 

        if(names.size() == 1) { 
            return new CommonTree(new CommonToken(Name, names.get(0))); 
        } 

        CommonTree ast = new CommonTree(new CommonToken(VAR, "VAR")); 

        ast.addChild(new CommonTree(new CommonToken(Name, names.get(0)))); 

        for(int i = 1; i < names.size(); i++) { 
            CommonTree indexNode = new CommonTree(new CommonToken(INDEX, "INDEX")); 
            indexNode.addChild(new CommonTree(new CommonToken(Name, names.get(i)))); 
            ast.addChild(indexNode); 
        } 

        return ast; 
    } 

    @Override 
    public void reportError(RecognitionException e) { 
        throw new RuntimeException(e); 
    } 
} 

@lexer::members { 

    // true if the next characters in the input are exactly `chars` 
    private boolean ahead(CharSequence chars) { 
        for(int i = 0; i < chars.length(); i++) { 
            if(input.LA(i + 1) != chars.charAt(i)) { 
                return false; 
            } 
        } 
        return true; 
    } 

    @Override 
    public void reportError(RecognitionException e) { 
        throw new RuntimeException(e); 
    } 

    // replaces the escape sequences in a (quote-stripped) string literal by the chars they denote 
    private String unescape(String text) { 
        StringBuilder b = new StringBuilder(); 
        String regex = "\\\\([\\\\abfnrtv\"']|\r?\n|\r|\\d{1,3}|x[0-9a-fA-F]{2}|z\\s*)|(?s)."; 
        java.util.regex.Matcher m = java.util.regex.Pattern.compile(regex).matcher(text); 
        while(m.find()) { 
            if(m.group(1) != null) { 
                // an escaped char 
                String matched = m.group(1); 
                if(matched.equals("\\")) b.append("\\"); 
                else if(matched.equals("a")) b.append("\u0007"); 
                else if(matched.equals("b")) b.append("\u0008"); 
                else if(matched.equals("f")) b.append("\u000C"); 
                else if(matched.equals("n")) b.append("\n"); 
                else if(matched.equals("r")) b.append("\r"); 
                else if(matched.equals("t")) b.append("\t"); 
                else if(matched.equals("v")) b.append("\u000B"); 
                else if(matched.equals("\"")) b.append("\""); 
                else if(matched.equals("'")) b.append("'"); 
                else if(matched.matches("\r?\n|\r")) b.append(matched); 
                else if(matched.matches("\\d{1,3}")) b.append((char)Integer.parseInt(matched)); 
                else if(matched.matches("x[0-9a-fA-F]{2}")) b.append((char)Integer.parseInt(matched.substring(1), 16)); 
                else if(matched.matches("z\\s*")) { /* 'z' escape: drop it and the whitespace after it */ } 
            } 
            else { 
                // a normal char, append "as is" 
                b.append(m.group()); 
            } 
        } 
        return b.toString(); 
    } 
} 

//////////////////////////////// parser rules //////////////////////////////// 
parse 
: chunk EOF -> chunk 
; 

chunk 
: stat* ret_stat? -> ^(CHUNK stat* ret_stat?) 
; 

stat 
: (assignment)=> assignment 
| var[false]       // must be a function call, not an index: check and throw exception 
| do_block 
| while_stat 
| repeat_stat 
| local 
| goto_stat 
| if_stat 
| for_stat 
| function 
| label 
| Break 
| ';' -> /* remove from AST (empty rewrite rule) */ 
; 

do_block 
: Do chunk End -> ^(Do chunk) 
; 

while_stat 
: While expr do_block -> ^(While expr do_block) 
; 

repeat_stat 
: Repeat chunk Until expr -> ^(Repeat chunk expr) 
; 

assignment 
: var_list '=' expr_list // in every 'var' in 'var_list', the last must be an 'index', not a 'call' 
    -> ^(ASSIGNMENT ^(VAR_LIST var_list) ^(EXPR_LIST expr_list)) 
; 

local 
: Local (name_list '=' expr_list -> ^(LOCAL_ASSIGNMENT ^(NAME_LIST name_list) ^(EXPR_LIST expr_list)) 
     | Function Name func_body -> ^(LOCAL_ASSIGNMENT ^(NAME_LIST Name) ^(EXPR_LIST func_body)) 
     ) 
; 

goto_stat 
: Goto Name -> ^(Goto Name) 
; 

if_stat 
: If expr Then chunk elseif_stat* else_stat? End -> ^(If ^(CONDITION expr chunk) elseif_stat* else_stat?) 
; 

elseif_stat 
: Elseif expr Then chunk -> ^(CONDITION expr chunk) 
; 

else_stat 
: Else chunk -> ^(CONDITION True chunk) 
; 

for_stat 
: For (Name '=' a=expr ',' b=expr (',' c=expr)? do_block -> ^(For Name $a $b $c? do_block) 
     | name_list In expr_list do_block     -> ^(FOR_IN ^(NAME_LIST name_list) ^(EXPR_LIST expr_list) do_block) 
     ) 
; 

function 
: Function names (Col Name {addSelf=true;} func_body {addSelf=false;} 
        -> ^(ASSIGNMENT ^(VAR_LIST {namesToVar($names.list, $Name.text)}) ^(EXPR_LIST func_body)) 
        | func_body 
        -> ^(ASSIGNMENT ^(VAR_LIST {namesToVar($names.list)}) ^(EXPR_LIST func_body)) 
       ) 
; 

names returns [List<String> list] 
@init{$list = new ArrayList<String>();} 
: a=Name {$list.add($a.text);} ('.' b=Name {$list.add($b.text);})* 
; 

function_literal 
: Function func_body -> func_body 
; 

func_body 
: '(' param_list ')' chunk End -> ^(FUNCTION param_list chunk) 
; 

param_list 
: name_list (',' DotDotDot)? -> ^(PARAM_LIST name_list DotDotDot?) 
| DotDotDot?     -> ^(PARAM_LIST DotDotDot?) 
; 

ret_stat 
: Return expr_list? ';'? -> ^(Return expr_list?) 
; 

expr 
: or_expr 
; 

or_expr 
: and_expr (Or^ and_expr)* 
; 

and_expr 
: rel_expr (And^ rel_expr)* 
; 

rel_expr 
: concat_expr ((LT | GT | LTEq | GTEq | NEq | Eq)^ concat_expr)? 
; 

concat_expr 
: add_expr (DotDot^ add_expr)* 
; 

add_expr 
: mult_expr ((Add | Minus)^ mult_expr)* 
; 

mult_expr 
: unary_expr ((Mult | Div | Mod)^ unary_expr)* 
; 

unary_expr 
: Minus unary_expr -> ^(UNARY_MINUS unary_expr) 
| Length pow_expr -> ^(Length pow_expr) 
| Not unary_expr -> ^(Not unary_expr) 
| pow_expr 
; 

// right associative 
pow_expr 
// : (a=atom -> $a) ((Pow atom)+ -> ^(Pow atom+))? 
: (a+=atom -> $a) ((Pow a+=atom)+ -> {createPowAST($a)})? 
; 

atom 
: var[false] 
| function_literal 
| table_constructor 
| DotDotDot 
| Number 
| String 
| Nil 
| True 
| False 
; 

var[boolean assign] 
: (callee[assign] -> callee) ((tail)=> (((tail)=> t=tail)+ -> {assign}? ^(ASSIGNMENT_VAR callee tail+) 
                  ->   ^(VAR callee tail+)) 
          )? 
; 

callee[boolean assign] 
: '(' expr ')' -> expr 
| Name 
; 

tail 
: '.' Name     -> ^(INDEX String[$Name.text]) 
| '[' expr ']'    -> ^(INDEX expr) 
| ':' Name '(' expr_list? ')' -> ^(INDEX {new CommonTree(new CommonToken(String, $Name.text))}) ^(COL_CALL expr_list?) 
| ':' Name table_constructor -> ^(INDEX {new CommonTree(new CommonToken(String, $Name.text))}) ^(COL_CALL table_constructor) 
| ':' Name String    -> ^(INDEX {new CommonTree(new CommonToken(String, $Name.text))}) ^(COL_CALL String) 
| '(' expr_list? ')'   -> ^(CALL expr_list?) 
| table_constructor   -> ^(CALL table_constructor) 
| String      -> ^(CALL String) 
; 

table_constructor 
: '{' field_list? '}' -> ^(TABLE field_list?) 
; 

field_list 
: field (field_sep field)* field_sep? -> field+ 
; 

field 
: '[' expr ']' '=' expr -> ^(FIELD expr expr) 
| Name '=' expr   -> ^(FIELD {new CommonTree(new CommonToken(String, $Name.text))} expr) 
| expr     -> ^(FIELD expr) 
; 

field_sep 
: ',' 
| ';' 
; 

label 
: '::' Name '::' -> ^(LABEL Name) 
; 

var_list 
: var[true] (',' var[true])* -> var+ 
; 

expr_list 
: expr (',' expr)* -> expr+ 
; 

name_list 
: Name (',' Name)* -> {addSelf}? {new CommonTree(new CommonToken(Name, "self"))} Name+ 
        ->   Name+ 
; 

//////////////////////////////// lexer rules //////////////////////////////// 
Name 
: (Letter | '_') (Letter | '_' | Digit)* 
; 

Number 
: (Digit+ ('.' Digit*)? Exponent? | '.' Digit+ Exponent?) {setText(new java.math.BigDecimal($text).toPlainString().replaceAll("\\.0*$", ""));} 
| '0' ('x' | 'X') a=HexDigits ('.' b=HexDigits?)? c=BinaryExponent? 
    { 
        double num = Long.parseLong($a.text, 16); 

        if($b != null) { 
            double fraction = Long.parseLong($b.text, 16)/Math.pow(16, $b.text.length()); 
            num += fraction; 
        } 

        if($c != null) { 
            int binExp = Integer.valueOf($c.text.contains("+") ? $c.text.substring(2) : $c.text.substring(1)); 
            for(int i = 0; i < Math.abs(binExp); i++) { 
                num = binExp < 0 ? num/2 : num*2; 
            } 
        } 

        setText(new BigDecimal(Double.toString(num)).toPlainString().replaceAll("\\.0*$", "")); 
    } 
; 

String 
: '"' (EscapeSequence | ~('\\' | '"' | '\r' | '\n'))* '"' {setText(unescape($text.substring(1, $text.length()-1)));} 
| '\'' (EscapeSequence | ~('\\' | '\'' | '\r' | '\n'))* '\'' {setText(unescape($text.substring(1, $text.length()-1)));} 
| LongBracket            {setText($text.replaceAll("^\\[=*\\[|]=*]$", ""));} 
; 

//////////////////////////////// lexer rules to skip //////////////////////////////// 
Comment 
: '--' (LongBracket 
     | '[' '='* ~('=' | '[') ~('\r' | '\n')* // matches '--[=====...' as a single line comment 
     | (~'[' ~('\r' | '\n')*)? 
     ) 
     {skip();} 
; 

Space 
: (' ' | '\t' | '\r' | '\n' | '\u000C')+ {skip();} 
; 

//////////////////////////////// fragment lexer rules //////////////////////////////// 
fragment Letter 
: 'a'..'z' 
| 'A'..'Z' 
; 

fragment Digit 
: '0'..'9' 
; 

fragment HexDigit 
: Digit 
| 'a'..'f' 
| 'A'..'F' 
; 

fragment HexDigits 
: HexDigit+ 
; 

fragment Exponent 
: ('e' | 'E') ('-' | '+')? Digit+ 
; 

fragment BinaryExponent 
: ('p' | 'P') ('-' | '+')? Digit+ 
; 

fragment EscapeSequence 
: '\\' (('a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' | '\\' | '"' | '\'' | 'z' | LineBreak) 
     | Digit (Digit Digit?)? 
     | 'x' HexDigit HexDigit 
     ) 
; 

fragment LineBreak 
: '\r'? '\n' 
| '\r' 
; 

fragment LongBracket 
@init{StringBuilder b = new StringBuilder("]");} 
: 
    // match opening bracket and build equal sized closing bracket 
    '[' ('=' {b.append("=");})* '[' {b.append("]");} 

    // keep matching chars until the closing bracket is ahead 
    ({!ahead(b)}?=> (~'\\' | EscapeSequence))* 

    { 
        if(input.LA(1) == EOF) { 
            throw new RuntimeException("unfinished long comment or string near '<eof>'"); 
        } 

        // let the lexer match the closing bracket 
        match(b.toString()); 
    } 

; 

//////////////////////////////// a fall through rule throwing an exception //////////////////////////////// 
Any 
@after {throw new RuntimeException("unexpected symbol near: '" + $text + "'");} 
: . 
; 
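One detail of the grammar above that is easy to miss: `pow_expr` collects all operands of a chain such as `2^3^2` and hands them to `createPowAST`, which folds them from the right, so the tree becomes `(^ 2 (^ 3 2))`, matching Lua's right-associative `^`. The following ANTLR-free sketch performs the same right-to-left folding on strings, printed in `toStringTree` notation; `PowFold` is an illustrative name only.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the right-to-left folding done by createPowAST, using strings in
// toStringTree-style notation instead of CommonTree nodes.
public class PowFold {

    static String foldPow(List<String> operands) {
        int n = operands.size();
        if (n == 1) return operands.get(0); // createPowAST is only called with 2+ operands
        // Start with the two rightmost operands ...
        String ast = "(^ " + operands.get(n - 2) + " " + operands.get(n - 1) + ")";
        // ... then wrap each remaining operand, right to left, around the tree so far.
        for (int i = n - 3; i >= 0; i--) {
            ast = "(^ " + operands.get(i) + " " + ast + ")";
        }
        return ast;
    }

    public static void main(String[] args) {
        System.out.println(foldPow(Arrays.asList("2", "3", "2")));
        // prints: (^ 2 (^ 3 2))
    }
}
```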

If you now parse the following input:

--[===[ 
    function uniqueid_some_event (e) 
    if (e:HasString("ignore string1")) then 
     -- do something 
    end 
    if (e:HasString("ignore string2")) then 
     -- do something 
    end 
    end 
    some invalid closing comment tags: ]==] ]====] 
]===] 

function uniqueid_some_event (e) 
    if (e:HasString("string1")) then 
        -- do something 
    end 
    if (e:HasString("string2")) then 
        -- do something 
    end 
end 

if (e:HasString("outside function...")) then end 

you will get the following AST back from the generated parser:

[Image: the AST returned by the parser for the input above]

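Now you only need to walk the AST, and whenever you stumble upon an ASSIGNMENT node, check whether its right child is an expression list that contains a FUNCTION. If it does, walk that node to find the if statements that contain string expressions.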

Here's a start:

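import org.antlr.runtime.ANTLRFileStream; 
import org.antlr.runtime.CommonTokenStream; 
import org.antlr.runtime.tree.CommonTree; 

import luja.parser.Lua52Lexer; 
import luja.parser.Lua52Parser; 

public class LUjATest { 

    // Reports every assignment whose value is a function literal; the grammar 
    // rewrites `function name(...) ... end` into such an ASSIGNMENT node. 
    private static void findFunctions(CommonTree tree) { 
        if (tree == null) return; 

        if (tree.getType() == Lua52Parser.ASSIGNMENT) { 

            // child 0 is the VAR_LIST, child 1 the EXPR_LIST 
            String name = tree.getChild(0).getChild(0).getText(); 
            CommonTree expressions = (CommonTree) tree.getChild(1); 

            if (expressions.getChildCount() > 0 && 
                    expressions.getChild(0).getType() == Lua52Parser.FUNCTION) { 

                System.out.println("walk the tree:\n " + expressions.toStringTree() + 
                        "\nto find all strings for event: '" + name + "'"); 
            } 
        } 
        else { 
            // not an assignment: keep searching in the children 
            for (int i = 0; i < tree.getChildCount(); i++) { 
                findFunctions((CommonTree) tree.getChild(i)); 
            } 
        } 
    } 

    public static void main(String[] args) throws Exception { 
        Lua52Lexer lexer = new Lua52Lexer(new ANTLRFileStream("src/lua/test.lua")); 
        Lua52Parser parser = new Lua52Parser(new CommonTokenStream(lexer)); 
        CommonTree tree = (CommonTree) parser.parse().getTree(); 
        findFunctions(tree); 
    } 
} 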

When you run the class above, you will see the following output:

walk the tree: 
    (EXPR_LIST (FUNCTION (PARAM_LIST e) (CHUNK (if (CONDITION (VAR e (INDEX HasString) (COL_CALL string1)) CHUNK)) (if (CONDITION (VAR e (INDEX HasString) (COL_CALL string2)) CHUNK))))) 
to find all strings for event: 'uniqueid_some_event'
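To finish what the output above asks for, recurse into the `EXPR_LIST` subtree and pick out every `HasString` call. In this grammar's AST, a call like `e:HasString("string1")` becomes a `VAR` node whose children are the callee `e`, then `(INDEX HasString)` immediately followed by `(COL_CALL string1)`. The sketch below uses a tiny hypothetical `Node` class (text plus children) as a stand-in for ANTLR's `CommonTree` so it runs without the ANTLR runtime; with the real tree you would compare `getType()` against `Lua52Parser.VAR`, `Lua52Parser.INDEX` and so on, and check that the argument's type is `Lua52Parser.String`, instead of comparing node texts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: collect the argument of every x:HasString("...") call in a subtree.
// Node is a hypothetical stand-in for ANTLR's CommonTree (text + children).
public class HasStringCollector {

    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    static void collect(Node node, List<String> out) {
        if ("VAR".equals(node.text)) {
            // Children: the callee, then tails. A method call appears as
            // (INDEX name) immediately followed by (COL_CALL args...).
            for (int i = 1; i + 1 < node.children.size(); i++) {
                Node index = node.children.get(i);
                Node call = node.children.get(i + 1);
                if ("INDEX".equals(index.text) && index.children.size() == 1
                        && "HasString".equals(index.children.get(0).text)
                        && "COL_CALL".equals(call.text) && call.children.size() == 1) {
                    // Text-only check: a variable argument would be collected too.
                    out.add(call.children.get(0).text);
                }
            }
        }
        // Keep descending: calls can be nested (and/or, elseif, inner functions).
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // (VAR e (INDEX HasString) (COL_CALL string1)), as in the output above
        Node cond1 = new Node("VAR", new Node("e"),
                new Node("INDEX", new Node("HasString")),
                new Node("COL_CALL", new Node("string1")));
        Node cond2 = new Node("VAR", new Node("e"),
                new Node("INDEX", new Node("HasString")),
                new Node("COL_CALL", new Node("string2")));
        Node body = new Node("CHUNK",
                new Node("if", new Node("CONDITION", cond1, new Node("CHUNK"))),
                new Node("if", new Node("CONDITION", cond2, new Node("CHUNK"))));
        List<String> found = new ArrayList<>();
        collect(body, found);
        System.out.println(found); // prints: [string1, string2]
    }
}
```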

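This is excellent, and already a far more complete solution than I expected. Thank you very much. I have to say this grammar looks quite complex, and I really appreciate this starting point; I didn't realize you'd have to dig this deep into ANTLR for such a basic parsing task. – Steve 2012-08-17 14:11:55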


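Finally, for my own understanding: when I change the second if to an elseif, the tree is not what I expected, e.g. (EXPR_LIST (FUNCTION (PARAM_LIST e) (CHUNK (if (CONDITION (VAR e (INDEX HasString) (COL_CALL string1)) (CHUNK (VAR print (CALL hello)))) (CONDITION (VAR e (INDEX HasString) (COL_CALL string2)) (CHUNK (VAR print (CALL hello2)))))))). Here there is only one IF tree, containing both the if and the elseif, and if I add more elseifs they also end up inside that one if. I basically assumed it would work the same way: iterate over all children and check whether each one is an IF/ELSE statement. – Steve 2012-08-17 14:13:08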


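@Steve, no, the structure is correct; at least, the author thinks so (that's me :)). For 'if ... end if ... end', two 'if' nodes are created, because they are two separate statements. For the input 'if ... elseif ... end', however, only one 'if' node is created, and that node contains two 'CONDITION' nodes (one for the 'if' and one for the 'elseif'). After all, the latter is a single statement (only the 'if' branch or the 'elseif' branch can be executed). – 2012-08-17 17:34:58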
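So to reach every branch, iterate over the CONDITION children of each 'if' node instead of expecting one 'if' node per branch. Going by the grammar's if_stat, elseif_stat and else_stat rules, each CONDITION has the condition expression as child 0 and the branch's CHUNK as child 1, and an 'else' branch gets a True token as its condition. A minimal sketch, again with a hypothetical text-only `Node` stand-in for `CommonTree` (its `toStringTree()` mimics the notation ANTLR prints):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: list the condition of every branch of one `if` node.
// Node is a hypothetical stand-in for ANTLR's CommonTree.
public class IfBranches {

    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }

        /** Same "(text child child ...)" notation as CommonTree.toStringTree(). */
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    /** One entry per if / elseif / else branch, in source order. */
    static List<String> branchConditions(Node ifNode) {
        List<String> result = new ArrayList<>();
        for (Node branch : ifNode.children) {
            if ("CONDITION".equals(branch.text)) {
                // child 0: the condition expression, child 1: the branch body (CHUNK)
                result.add(branch.children.get(0).toStringTree());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // if a then ... elseif b then ... else ... end
        Node ifNode = new Node("if",
                new Node("CONDITION", new Node("a"), new Node("CHUNK")),
                new Node("CONDITION", new Node("b"), new Node("CHUNK")),
                // else branch: else_stat puts a True token in the condition slot
                new Node("CONDITION", new Node("True"), new Node("CHUNK")));
        System.out.println(branchConditions(ifNode)); // prints: [a, b, True]
    }
}
```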