我正在使用antlr編寫簡單的小寫字母式語法。它是Smalltalk的簡化版本,但基本思想是相同的(例如消息傳遞)。使用antlr - 一元減號和消息鏈接簡化的smalltalk語法
這是到目前爲止我的語法:
grammar GAL;
options {
//k=2;
backtrack=true;
}
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
INT : '0'..'9'+
;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
COMMENT
: '"' (options {greedy=false;} : .)* '"' {$channel=HIDDEN;}
;
WS : (' '
| '\t'
) {$channel=HIDDEN;}
;
NEW_LINE
: ('\r'?'\n')
;
STRING
: '\'' (ESC_SEQ | ~('\\'|'\''))* '\''
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
BINARY_MESSAGE_CHAR
: ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')
('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')?
;
// parser
program
: NEW_LINE* (statement (NEW_LINE+ | EOF))*
;
statement
: message_sending
| return_statement
| assignment
| temp_variables
;
return_statement
: '^' statement
;
assignment
: identifier ':=' statement
;
temp_variables
: '|' identifier+ '|'
;
object
: raw_object
;
raw_object
: number
| string
| identifier
| literal
| block
| '(' message_sending ')'
;
message_sending
: keyword_message_sending
;
keyword_message_sending
: binary_message_sending keyword_message?
;
binary_message_sending
: unary_message_sending binary_message*
;
unary_message_sending
: object (unary_message)*
;
unary_message
: unary_message_selector
;
binary_message
: binary_message_selector unary_message_sending
;
keyword_message
: (NEW_LINE? single_keyword_message_selector NEW_LINE? binary_message_sending)+
;
block
:
'[' (block_signiture
)? NEW_LINE*
block_body
NEW_LINE* ']'
;
block_body
: (statement
)?
(NEW_LINE+ statement
)*
;
block_signiture
:
(':' identifier
)+ '|'
;
unary_message_selector
: identifier
;
binary_message_selector
: BINARY_MESSAGE_CHAR
;
single_keyword_message_selector
: identifier ':'
;
keyword_message_selector
: single_keyword_message_selector+
;
symbol
: '#' (string | identifier | binary_message_selector | keyword_message_selector)
;
literal
: symbol block? // if there is block then this is method
;
number
: /*'-'?*/
(INT | FLOAT)
;
string
: STRING
;
identifier
: ID
;
1元負
我有換號一元減(對於規則number
註釋部分)中的問題。問題是minus是有效的二進制消息。讓事情變得更糟兩個減號也是有效的二進制消息。我需要的是一元減去在沒有對象發送二進制消息的情況下(例如,-3 + 4應該是一元減號,因爲-3中沒有任何內容)。另外,(-3)也應該是二進制的。如果1 - -2將是帶參數-2的二進制消息' - ',那將是非常好的,但是我可以沒有它。我怎樣才能做到這一點?
如果我取消註釋一元減法,則在解析類似1-2的東西時會出現錯誤MismatchedSetException(0!= null)。
2.消息鏈接
什麼是實現消息在Smalltalk chainging最喜歡的方式是什麼?我的意思是這樣的:
obj message1 + 3;
message2;
+ 3;
keyword: 2+3
,每一個消息將被髮送到同一個對象,在這種情況下obj
。消息優先級應保持不變(一元>二元>關鍵字)。
3回溯
大多數這種語法可以用k=2
解析,但是當輸入是這樣的:
1 + 2
Obj message:
1 + 2
message2: 'string'
解析器嘗試匹配OBJ時single_keyword_message_selector
和令牌提高UnwantedTokenExcaption
message
。如果刪除k=2
並設置backtrack=true
(正如我所做的那樣),一切正常。我如何刪除回溯並獲得所需的行爲?
此外,大多數語法都可以使用k=1
進行解析,所以我試圖設置k=2
僅適用於需要它的規則,但忽略了這一點。我做了這樣的事情:
rule
options { k = 2; }
: // rule definition
;
,但它不能正常工作,直到我集合K在全局選項。我在這裏錯過了什麼?
更新:
它不是從頭開始編寫語法理想的解決方案,因爲我有很多的依賴於它的代碼。此外,缺少的一些小問題的特徵 - 由設計缺失。這不是爲了實現另一個小竅門,小叮噹只是一個靈感。
我會更樂意在這種情況下有一元減工作:-1+2
或2+(-1)
。像2 -- -1
這樣的案例並不重要。
此外,消息鏈接應該儘可能地簡單。這意味着我不喜歡改變我產生的AST的想法。
關於回溯 - 我可以忍受它,只是出於個人好奇而問。
這是生成AST的小修改語法 - 也許這將有助於更好地理解我不想更改的內容。 (temp_variables可能會被刪除,我沒有做出這個決定)。
grammar GAL;
options {
//k=2;
backtrack=true;
language=CSharp3;
output=AST;
}
tokens {
HASH = '#';
COLON = ':';
DOT = '.';
CARET = '^';
PIPE = '|';
LBRACKET = '[';
RBRACKET = ']';
LPAREN = '(';
RPAREN = ')';
ASSIGN = ':=';
}
// generated files options
@namespace { GAL.Compiler }
@lexer::namespace { GAL.Compiler}
// this will disable CLSComplaint warning in ANTLR generated code
@parser::header {
// Do not bug me about [System.CLSCompliant(false)]
#pragma warning disable 3021
}
@lexer::header {
// Do not bug me about [System.CLSCompliant(false)]
#pragma warning disable 3021
}
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
INT : '0'..'9'+
;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
COMMENT
: '"' (options {greedy=false;} : .)* '"' {$channel=Hidden;}
;
WS : (' '
| '\t'
) {$channel=Hidden;}
;
NEW_LINE
: ('\r'?'\n')
;
STRING
: '\'' (ESC_SEQ | ~('\\'|'\''))* '\''
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
BINARY_MESSAGE_CHAR
: ('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')
('~' | '!' | '@' | '%' | '&' | '*' | '-' | '+' | '=' | '|' | '\\' | '<' | '>' | ',' | '?' | '/')?
;
// parser
public program returns [ AstProgram program ]
: { $program = new AstProgram(); }
NEW_LINE*
(statement (NEW_LINE+ | EOF)
{ $program.AddStatement($statement.stmt); }
)*
;
statement returns [ AstNode stmt ]
: message_sending
{ $stmt = $message_sending.messageSending; }
| return_statement
{ $stmt = $return_statement.ret; }
| assignment
{ $stmt = $assignment.assignment; }
| temp_variables
{ $stmt = $temp_variables.tempVars; }
;
return_statement returns [ AstReturn ret ]
: CARET statement
{ $ret = new AstReturn($CARET, $statement.stmt); }
;
assignment returns [ AstAssignment assignment ]
: dotted_expression ASSIGN statement
{ $assignment = new AstAssignment($dotted_expression.dottedExpression, $ASSIGN, $statement.stmt); }
;
temp_variables returns [ AstTempVariables tempVars ]
: p1=PIPE
{ $tempVars = new AstTempVariables($p1); }
(identifier
{ $tempVars.AddVar($identifier.identifier); }
)+
p2=PIPE
{ $tempVars.EndToken = $p2; }
;
object returns [ AstNode obj ]
: number
{ $obj = $number.number; }
| string
{ $obj = $string.str; }
| dotted_expression
{ $obj = $dotted_expression.dottedExpression; }
| literal
{ $obj = $literal.literal; }
| block
{ $obj = $block.block; }
| LPAREN message_sending RPAREN
{ $obj = $message_sending.messageSending; }
;
message_sending returns [ AstKeywordMessageSending messageSending ]
: keyword_message_sending
{ $messageSending = $keyword_message_sending.keywordMessageSending; }
;
keyword_message_sending returns [ AstKeywordMessageSending keywordMessageSending ]
: binary_message_sending
{ $keywordMessageSending = new AstKeywordMessageSending($binary_message_sending.binaryMessageSending); }
(keyword_message
{ $keywordMessageSending = $keywordMessageSending.NewMessage($keyword_message.keywordMessage); }
)?
;
binary_message_sending returns [ AstBinaryMessageSending binaryMessageSending ]
: unary_message_sending
{ $binaryMessageSending = new AstBinaryMessageSending($unary_message_sending.unaryMessageSending); }
(binary_message
{ $binaryMessageSending = $binaryMessageSending.NewMessage($binary_message.binaryMessage); }
)*
;
unary_message_sending returns [ AstUnaryMessageSending unaryMessageSending ]
: object
{ $unaryMessageSending = new AstUnaryMessageSending($object.obj); }
(
unary_message
{ $unaryMessageSending = $unaryMessageSending.NewMessage($unary_message.unaryMessage); }
)*
;
unary_message returns [ AstUnaryMessage unaryMessage ]
: unary_message_selector
{ $unaryMessage = new AstUnaryMessage($unary_message_selector.unarySelector); }
;
binary_message returns [ AstBinaryMessage binaryMessage ]
: binary_message_selector unary_message_sending
{ $binaryMessage = new AstBinaryMessage($binary_message_selector.binarySelector, $unary_message_sending.unaryMessageSending); }
;
keyword_message returns [ AstKeywordMessage keywordMessage ]
:
{ $keywordMessage = new AstKeywordMessage(); }
(
NEW_LINE?
single_keyword_message_selector
NEW_LINE?
binary_message_sending
{ $keywordMessage.AddMessagePart($single_keyword_message_selector.singleKwSelector, $binary_message_sending.binaryMessageSending); }
)+
;
block returns [ AstBlock block ]
: LBRACKET
{ $block = new AstBlock($LBRACKET); }
(
block_signiture
{ $block.Signiture = $block_signiture.blkSigniture; }
)? NEW_LINE*
block_body
{ $block.Body = $block_body.blkBody; }
NEW_LINE*
RBRACKET
{ $block.SetEndToken($RBRACKET); }
;
block_body returns [ IList<AstNode> blkBody ]
@init { $blkBody = new List<AstNode>(); }
:
(s1=statement
{ $blkBody.Add($s1.stmt); }
)?
(NEW_LINE+ s2=statement
{ $blkBody.Add($s2.stmt); }
)*
;
block_signiture returns [ AstBlockSigniture blkSigniture ]
@init { $blkSigniture = new AstBlockSigniture(); }
:
(COLON identifier
{ $blkSigniture.AddIdentifier($COLON, $identifier.identifier); }
)+ PIPE
{ $blkSigniture.SetEndToken($PIPE); }
;
unary_message_selector returns [ AstUnaryMessageSelector unarySelector ]
: identifier
{ $unarySelector = new AstUnaryMessageSelector($identifier.identifier); }
;
binary_message_selector returns [ AstBinaryMessageSelector binarySelector ]
: BINARY_MESSAGE_CHAR
{ $binarySelector = new AstBinaryMessageSelector($BINARY_MESSAGE_CHAR); }
;
single_keyword_message_selector returns [ AstIdentifier singleKwSelector ]
: identifier COLON
{ $singleKwSelector = $identifier.identifier; }
;
keyword_message_selector returns [ AstKeywordMessageSelector keywordSelector ]
@init { $keywordSelector = new AstKeywordMessageSelector(); }
:
(single_keyword_message_selector
{ $keywordSelector.AddIdentifier($single_keyword_message_selector.singleKwSelector); }
)+
;
symbol returns [ AstSymbol symbol ]
: HASH
(string
{ $symbol = new AstSymbol($HASH, $string.str); }
| identifier
{ $symbol = new AstSymbol($HASH, $identifier.identifier); }
| binary_message_selector
{ $symbol = new AstSymbol($HASH, $binary_message_selector.binarySelector); }
| keyword_message_selector
{ $symbol = new AstSymbol($HASH, $keyword_message_selector.keywordSelector); }
)
;
literal returns [ AstNode literal ]
: symbol
{ $literal = $symbol.symbol; }
(block
{ $literal = new AstMethod($symbol.symbol, $block.block); }
)? // if there is block then this is method
;
number returns [ AstNode number ]
: /*'-'?*/
(INT
{ $number = new AstInt($INT); }
| FLOAT
{ $number = new AstInt($FLOAT); }
)
;
string returns [ AstString str ]
: STRING
{ $str = new AstString($STRING); }
;
dotted_expression returns [ AstDottedExpression dottedExpression ]
: i1=identifier
{ $dottedExpression = new AstDottedExpression($i1.identifier); }
(DOT i2=identifier
{ $dottedExpression.AddIdentifier($i2.identifier); }
)*
;
identifier returns [ AstIdentifier identifier ]
: ID
{ $identifier = new AstIdentifier($ID); }
;
Smalltalk運行時/口譯員有很多不同嗎?我問,因爲GNU Smalltalk版本3.2.4不接受'(1 - -2)printNl'。它*確實*接受'(1 - 2)printNl'(最後有或沒有'.')。 –
關於[你的語法]還有一件事(https://github.com/redline-smalltalk/redline-smalltalk/blob/master/src/main/antlr3/st/redline/compiler/Smalltalk.g):因爲你'我們在'number'分析器規則中輸入了字符'r',輸入如'| r | R:= 42。 r printNl.'將導致''r'' *** not ***被標記爲'ID'標記,但是作爲一個字面的''r''標記。不是你想要的東西,對吧?或者我錯過了什麼? –
感謝您的回答。從根本上改變語法並不是理想的解決方案,因爲我有需要改變的AST,而這只是現在的工作。我正在尋找簡單的方法來添加一元減號(1 - -2只是一個想法,我會很高興有-1 + 2或2 - ( - 3)工作。 –