2012-05-10 45 views
1

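Using ANTLR: unary minus and message chaining for a simplified Smalltalk grammar

I am writing a simple Smalltalk-like grammar using ANTLR. It is a simplified version of Smalltalk, but the basic ideas are the same (e.g. message passing).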

Here is my grammar so far:

grammar GAL; 

options { 
    //k=2; 
    backtrack=true; 
} 

ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* 
    ; 

INT : '0'..'9'+ 
    ; 

FLOAT 
    : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? 
    | '.' ('0'..'9')+ EXPONENT? 
    | ('0'..'9')+ EXPONENT 
    ; 

COMMENT 
    : '"' (options {greedy=false;} : .)* '"' {$channel=HIDDEN;} 
    ; 

WS : (' ' 
     | '\t' 
     ) {$channel=HIDDEN;} 
    ; 

NEW_LINE 
    : ('\r'?'\n') 
    ; 

STRING 
    : '\'' (ESC_SEQ | ~('\\'|'\''))* '\'' 
    ; 

fragment 
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; 

fragment 
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ; 

fragment 
ESC_SEQ 
    : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\') 
    | UNICODE_ESC 
    | OCTAL_ESC 
    ; 

fragment 
OCTAL_ESC 
    : '\\' ('0'..'3') ('0'..'7') ('0'..'7') 
    | '\\' ('0'..'7') ('0'..'7') 
    | '\\' ('0'..'7') 
    ; 

fragment 
UNICODE_ESC 
    : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT 
    ; 

BINARY_MESSAGE_CHAR 
    : ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/') 
     ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')? 
    ; 

// parser 

program 
    : NEW_LINE* (statement (NEW_LINE+ | EOF))* 
    ; 

statement 

    : message_sending 
    | return_statement 
    | assignment 
    | temp_variables 
    ; 

return_statement 
    : '^' statement 
    ; 

assignment 
    : identifier ':=' statement 
    ; 

temp_variables 
    : '|' identifier+ '|' 
    ; 

object 
    : raw_object 
    ; 

raw_object 
    : number 
    | string 
    | identifier 
    | literal 
    | block 
    | '(' message_sending ')' 
    ; 

message_sending 
    : keyword_message_sending 
    ; 

keyword_message_sending 
    : binary_message_sending keyword_message? 
    ; 

binary_message_sending 
    : unary_message_sending binary_message* 
    ; 

unary_message_sending 
    : object (unary_message)* 
    ; 

unary_message 
    : unary_message_selector 
    ; 

binary_message 
    : binary_message_selector unary_message_sending 
    ; 

keyword_message 
    : (NEW_LINE? single_keyword_message_selector NEW_LINE? binary_message_sending)+ 
    ; 

block 
    : 
     '[' (block_signiture 

    )? NEW_LINE* 
     block_body 

     NEW_LINE* ']' 
    ; 

block_body 
    : (statement 

    )? 
     (NEW_LINE+ statement 

    )* 
    ; 


block_signiture 
    : 
     (':' identifier 

    )+ '|' 
    ; 

unary_message_selector 
    : identifier 
    ; 

binary_message_selector 
    : BINARY_MESSAGE_CHAR 
    ; 

single_keyword_message_selector 
    : identifier ':' 
    ; 

keyword_message_selector 
    : single_keyword_message_selector+ 
    ; 

symbol 
    : '#' (string | identifier | binary_message_selector | keyword_message_selector) 
    ; 

literal 
    : symbol block? // if there is block then this is method 
    ; 

number 
    : /*'-'?*/ 
    (INT | FLOAT) 
    ; 

string 
    : STRING 
    ; 

identifier 
    : ID 
    ; 

1. Unary minus

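I have a problem with unary minus (the commented-out part of the number rule). The problem is that minus is a valid binary message. To make things worse, two minus signs are also a valid binary message. What I need is for minus to be unary when there is no object to send a binary message to (e.g. in -3 + 4 the minus should be unary, because there is nothing in front of -3). The minus in (-3) should be unary as well. It would be nice if 1 - -2 were parsed as the binary message '-' with the argument -2, but I can live without that. How can I do this?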

If I uncomment the unary minus, I get a MismatchedSetException(0!=null) when parsing something like 1-2.
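One plausible cause of that exception is that a literal '-' in a parser rule makes ANTLR 3 create an implicit lexer token for it, so a lone '-' is no longer lexed as BINARY_MESSAGE_CHAR and binary_message_selector stops matching. A possible workaround, sketched below and not tested against the full grammar, is to give the lone minus its own token and accept it in both places. MINUS and BINARY_CHAR_NO_MINUS are hypothetical names introduced only for this sketch.

```antlr
// Sketch only: MINUS and BINARY_CHAR_NO_MINUS are assumed names,
// not part of the original grammar.

// A lone '-' gets its own token so the parser can tell it apart.
MINUS
    : '-'
    ;

// Every other one- or two-character binary selector, including '--' and '-+'.
BINARY_MESSAGE_CHAR
    : BINARY_CHAR_NO_MINUS (BINARY_CHAR_NO_MINUS | '-')?
    | '-' (BINARY_CHAR_NO_MINUS | '-')
    ;

fragment
BINARY_CHAR_NO_MINUS
    : '~' | '!' | '@' | '%' | '&' | '*' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/'
    ;

// A binary selector is either the lone minus or any other operator token.
binary_message_selector
    : BINARY_MESSAGE_CHAR
    | MINUS
    ;

// The sign is consumed only where an object is expected,
// that is, when no receiver precedes it.
number
    : MINUS? (INT | FLOAT)
    ;
```

With this split, 1-2 should lex as INT MINUS INT and 1 - -2 as INT MINUS MINUS INT, while 1--2 still yields the two-character selector '--'. In the AST-producing version of the grammar, the action in binary_message_selector would have to take its token from whichever alternative matched.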

2. Message chaining

What is the best way to implement message chaining as in Smalltalk? I mean something like this:

obj message1 + 3; 
    message2; 
    + 3; 
    keyword: 2+3 

where every message is sent to the same object, in this case obj. Message precedence should be preserved (unary > binary > keyword).
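A rough sketch of how the chaining syntax could be added on top of the existing rules follows. It only covers the syntax; how the chained messages are attached to the receiver in the AST is left open. cascaded_message is a hypothetical rule name, and the sketch assumes backtrack=true stays enabled as in the grammar above.

```antlr
// Sketch only: cascaded_message is an assumed rule name.

message_sending
    : keyword_message_sending (';' NEW_LINE* cascaded_message)*
    ;

// One link of the chain: a message without its own receiver.
// Unary parts come first, then binary parts, then at most one keyword
// message, which mirrors the unary > binary > keyword precedence.
cascaded_message
    : unary_message+ binary_message* keyword_message?
    | binary_message+ keyword_message?
    | keyword_message
    ;
```

Because the existing unary_message, binary_message and keyword_message rules are reused unchanged, each link is parsed exactly like the tail of an ordinary message send.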

3. Backtracking

Most of this grammar can be parsed with k=2, but when the input is something like this:

1 + 2 
Obj message: 
    1 + 2 
    message2: 'string' 

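the parser tries to match Obj as a single_keyword_message_selector and raises an UnwantedTokenException on the token message. If I remove k=2 and set backtrack=true (as I did), everything works as it should. How can I remove backtracking and still get the desired behavior?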
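The decision that seems to need more than two tokens is whether a NEW_LINE continues a keyword message or ends the statement: NEW_LINE followed by an identifier fits both until the ':' is seen. One option that might replace global backtracking is a syntactic predicate on just that decision. The sketch below assumes both k and backtrack are removed from the global options; it has not been run against the full grammar, and other decisions may need the same treatment.

```antlr
// Sketch only: the predicate is evaluated before the parser commits
// to another keyword part, so "NEW_LINE ID" without a following ':'
// is left for the statement separator in program.
keyword_message
    : ( (NEW_LINE? single_keyword_message_selector)=>
        NEW_LINE? single_keyword_message_selector NEW_LINE? binary_message_sending
      )+
    ;
```

Unlike backtrack=true, which lets the parser speculate at every non-LL(*) decision, the predicate limits speculation to this single subrule.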

Also, most of the grammar can be parsed with k=1, so I tried to set k=2 only for the rules that need it, but the option is ignored. I did something like this:

rule 
    options { k = 2; } 
    : // rule definition 
    ; 

but it does not work until I set k in the global options. What am I missing here?


Update

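Writing a new grammar from scratch is not an ideal solution, because I have a lot of code that depends on this one. Also, some of the Smalltalk features that are missing are missing by design. This is not meant to be another Smalltalk implementation; Smalltalk was just an inspiration.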

I would be more than happy to have unary minus working in cases like these: -1+2 or 2+(-1). Cases like 2 -- -1 are not that important.

Also, message chaining should be implemented as simply as possible. That means I do not like the idea of changing the AST I generate.

About backtracking: I can live with it; I am asking only out of personal curiosity.

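This is a slightly modified grammar that generates the AST; maybe it will help to better understand what I do not want to change. (temp_variables will probably be removed; I have not made that decision yet.)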

grammar GAL; 

options { 
    //k=2; 
    backtrack=true; 
    language=CSharp3; 
    output=AST; 
} 

tokens { 
    HASH  = '#'; 
    COLON = ':'; 
    DOT  = '.'; 
    CARET = '^'; 
    PIPE  = '|'; 
    LBRACKET = '['; 
    RBRACKET = ']'; 
    LPAREN = '('; 
    RPAREN = ')'; 
    ASSIGN = ':='; 
} 

// generated files options 
@namespace { GAL.Compiler } 
@lexer::namespace { GAL.Compiler} 

// this will disable the CLSCompliant warning in ANTLR-generated code 
@parser::header { 
// Do not bug me about [System.CLSCompliant(false)] 
#pragma warning disable 3021 
} 

@lexer::header { 
// Do not bug me about [System.CLSCompliant(false)] 
#pragma warning disable 3021 
} 

ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* 
    ; 

INT : '0'..'9'+ 
    ; 

FLOAT 
    : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? 
    | '.' ('0'..'9')+ EXPONENT? 
    | ('0'..'9')+ EXPONENT 
    ; 

COMMENT 
    : '"' (options {greedy=false;} : .)* '"' {$channel=Hidden;} 
    ; 

WS : (' ' 
     | '\t' 
     ) {$channel=Hidden;} 
    ; 

NEW_LINE 
    : ('\r'?'\n') 
    ; 

STRING 
    : '\'' (ESC_SEQ | ~('\\'|'\''))* '\'' 
    ; 

fragment 
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; 

fragment 
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ; 

fragment 
ESC_SEQ 
    : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\') 
    | UNICODE_ESC 
    | OCTAL_ESC 
    ; 

fragment 
OCTAL_ESC 
    : '\\' ('0'..'3') ('0'..'7') ('0'..'7') 
    | '\\' ('0'..'7') ('0'..'7') 
    | '\\' ('0'..'7') 
    ; 

fragment 
UNICODE_ESC 
    : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT 
    ; 

BINARY_MESSAGE_CHAR 
    : ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/') 
     ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')? 
    ; 

// parser 

public program returns [ AstProgram program ] 
    : { $program = new AstProgram(); } 
    NEW_LINE* 
    (statement (NEW_LINE+ | EOF) 
     { $program.AddStatement($statement.stmt); } 
    )* 
    ; 

statement returns [ AstNode stmt ] 
    : message_sending 
     { $stmt = $message_sending.messageSending; } 
    | return_statement 
     { $stmt = $return_statement.ret; } 
    | assignment 
     { $stmt = $assignment.assignment; } 
    | temp_variables 
     { $stmt = $temp_variables.tempVars; } 
    ; 

return_statement returns [ AstReturn ret ] 
    : CARET statement 
     { $ret = new AstReturn($CARET, $statement.stmt); } 
    ; 

assignment returns [ AstAssignment assignment ] 
    : dotted_expression ASSIGN statement 
     { $assignment = new AstAssignment($dotted_expression.dottedExpression, $ASSIGN, $statement.stmt); } 
    ; 

temp_variables returns [ AstTempVariables tempVars ] 
    : p1=PIPE 
     { $tempVars = new AstTempVariables($p1); } 
    (identifier 
     { $tempVars.AddVar($identifier.identifier); } 
    )+ 
    p2=PIPE 
     { $tempVars.EndToken = $p2; } 
    ; 

object returns [ AstNode obj ] 
    : number 
     { $obj = $number.number; } 
    | string 
     { $obj = $string.str; } 
    | dotted_expression 
     { $obj = $dotted_expression.dottedExpression; } 
    | literal 
     { $obj = $literal.literal; } 
    | block 
     { $obj = $block.block; } 
    | LPAREN message_sending RPAREN 
     { $obj = $message_sending.messageSending; } 
    ; 

message_sending returns [ AstKeywordMessageSending messageSending ] 
    : keyword_message_sending 
     { $messageSending = $keyword_message_sending.keywordMessageSending; } 
    ; 

keyword_message_sending returns [ AstKeywordMessageSending keywordMessageSending ] 
    : binary_message_sending 
     { $keywordMessageSending = new AstKeywordMessageSending($binary_message_sending.binaryMessageSending); } 
    (keyword_message 
     { $keywordMessageSending = $keywordMessageSending.NewMessage($keyword_message.keywordMessage); } 
    )? 
    ; 

binary_message_sending returns [ AstBinaryMessageSending binaryMessageSending ] 
    : unary_message_sending 
     { $binaryMessageSending = new AstBinaryMessageSending($unary_message_sending.unaryMessageSending); } 
    (binary_message 
     { $binaryMessageSending = $binaryMessageSending.NewMessage($binary_message.binaryMessage); } 
    )* 
    ; 

unary_message_sending returns [ AstUnaryMessageSending unaryMessageSending ] 
    : object 
     { $unaryMessageSending = new AstUnaryMessageSending($object.obj); } 
    (
     unary_message 
     { $unaryMessageSending = $unaryMessageSending.NewMessage($unary_message.unaryMessage); } 
    )* 
    ; 

unary_message returns [ AstUnaryMessage unaryMessage ] 
    : unary_message_selector 
     { $unaryMessage = new AstUnaryMessage($unary_message_selector.unarySelector); } 
    ; 

binary_message returns [ AstBinaryMessage binaryMessage ] 
    : binary_message_selector unary_message_sending 
     { $binaryMessage = new AstBinaryMessage($binary_message_selector.binarySelector, $unary_message_sending.unaryMessageSending); } 
    ; 

keyword_message returns [ AstKeywordMessage keywordMessage ] 
    : 
    { $keywordMessage = new AstKeywordMessage(); } 
    (
     NEW_LINE? 
     single_keyword_message_selector 
     NEW_LINE? 
     binary_message_sending 
     { $keywordMessage.AddMessagePart($single_keyword_message_selector.singleKwSelector, $binary_message_sending.binaryMessageSending); } 
    )+ 
    ; 

block returns [ AstBlock block ] 
    : LBRACKET 
     { $block = new AstBlock($LBRACKET); } 
    (
     block_signiture 
     { $block.Signiture = $block_signiture.blkSigniture; } 
    )? NEW_LINE* 
     block_body 
     { $block.Body = $block_body.blkBody; } 
     NEW_LINE* 
     RBRACKET 
     { $block.SetEndToken($RBRACKET); } 
    ; 

block_body returns [ IList<AstNode> blkBody ] 
    @init { $blkBody = new List<AstNode>(); } 
    : 
    (s1=statement 
     { $blkBody.Add($s1.stmt); } 
    )? 
    (NEW_LINE+ s2=statement 
     { $blkBody.Add($s2.stmt); } 
    )* 
    ; 


block_signiture returns [ AstBlockSigniture blkSigniture ] 
    @init { $blkSigniture = new AstBlockSigniture(); } 
    : 
    (COLON identifier 
     { $blkSigniture.AddIdentifier($COLON, $identifier.identifier); } 
    )+ PIPE 
     { $blkSigniture.SetEndToken($PIPE); } 
    ; 

unary_message_selector returns [ AstUnaryMessageSelector unarySelector ] 
    : identifier 
     { $unarySelector = new AstUnaryMessageSelector($identifier.identifier); } 
    ; 

binary_message_selector returns [ AstBinaryMessageSelector binarySelector ] 
    : BINARY_MESSAGE_CHAR 
     { $binarySelector = new AstBinaryMessageSelector($BINARY_MESSAGE_CHAR); } 
    ; 

single_keyword_message_selector returns [ AstIdentifier singleKwSelector ] 
    : identifier COLON 
     { $singleKwSelector = $identifier.identifier; } 
    ; 

keyword_message_selector returns [ AstKeywordMessageSelector keywordSelector ] 
    @init { $keywordSelector = new AstKeywordMessageSelector(); } 
    : 
    (single_keyword_message_selector 
     { $keywordSelector.AddIdentifier($single_keyword_message_selector.singleKwSelector); } 
    )+ 
    ; 

symbol returns [ AstSymbol symbol ] 
    : HASH 
    (string 
     { $symbol = new AstSymbol($HASH, $string.str); } 
    | identifier 
     { $symbol = new AstSymbol($HASH, $identifier.identifier); } 
    | binary_message_selector 
     { $symbol = new AstSymbol($HASH, $binary_message_selector.binarySelector); } 
    | keyword_message_selector 
     { $symbol = new AstSymbol($HASH, $keyword_message_selector.keywordSelector); } 
    ) 
    ; 

literal returns [ AstNode literal ] 
    : symbol 
     { $literal = $symbol.symbol; } 
    (block 
     { $literal = new AstMethod($symbol.symbol, $block.block); } 
    )? // if there is block then this is method 
    ; 

number returns [ AstNode number ] 
    : /*'-'?*/ 
    (INT 
     { $number = new AstInt($INT); } 
    | FLOAT 
     { $number = new AstInt($FLOAT); } 
    ) 
    ; 

string returns [ AstString str ] 
    : STRING 
     { $str = new AstString($STRING); } 
    ; 

dotted_expression returns [ AstDottedExpression dottedExpression ] 
    : i1=identifier 
     { $dottedExpression = new AstDottedExpression($i1.identifier); } 
    (DOT i2=identifier 
     { $dottedExpression.AddIdentifier($i2.identifier); } 
    )* 
    ; 

identifier returns [ AstIdentifier identifier ] 
    : ID 
     { $identifier = new AstIdentifier($ID); } 
    ; 

Answers

1

Hi Smalltalk grammar writer,

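First, to get a Smalltalk grammar to parse (1 - -2) correctly, and to support the optional '.' on the last statement and so on, you should treat whitespace as significant. Do not put it on the hidden channel.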

The grammar so far does not break the rules down into small enough fragments. That will be a problem, as you have seen with k=2 and backtracking.

I suggest you check out a working Smalltalk grammar in ANTLR, as defined by the Redline Smalltalk project: http://redline.st & https://github.com/redline-smalltalk/redline-smalltalk

Rgs, James.

+0

Do Smalltalk runtimes/interpreters differ a lot? I ask because GNU Smalltalk version 3.2.4 does not accept '(1 - -2) printNl'. It *does* accept '(1 - 2) printNl' (with or without the '.' at the end).

+0

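One more thing about [your grammar](https://github.com/redline-smalltalk/redline-smalltalk/blob/master/src/main/antlr3/st/redline/compiler/Smalltalk.g): because you use the character 'r' as a literal in the 'number' parser rule, input like '| r | r := 42. r printNl.' will cause 'r' ***not*** to be tokenized as an 'ID' token, but as a literal 'r' token. Not what you want, right? Or am I missing something?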

+0

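Thanks for your answer. Fundamentally changing the grammar is not an ideal solution, because I would have to change the AST, and that works right now. I am looking for a simple way to add unary minus (1 - -2 was just an idea; I would be happy to have -1 + 2 or 2 - (-3) working).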