Antlr4 - Matching an identifier as a single token

BLOCK_COMMENT : '/*' .*? '*/' -> skip; 
EOL_COMMENT : '//' ~[\r\n]* -> skip; 
WS: [ \n\t\r]+ -> skip; 

program: usingDirectives? EOF; 

usingDirectives: usingDirective+; 

usingDirective: USING 
     fullyQualifiedType 
     (usingAlias | USING_ALL)? 
     END; 

USING: 'using'; 

fullyQualifiedType: identifier (DOT identifier)*; 

identifier: (LETTER | UNDERSCORE) 
     (LETTER | DIGIT | UNDERSCORE)*; 

DOT: '.'; 

usingAlias: AS identifier; 

USING_ALL: '.*'; 

AS: 'as'; 

END: ';'; 

LETTER: [a-zA-Z]; 

DIGIT: [0-9]; 

UNDERSCORE: '_';

This is my grammar.

using IO.Console.Print as Print; 
using IO.Console; // same as using IO.Console as Console; 
using IO.Console.*;

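This is my test data.

The grammar works as expected, but every letter of an identifier becomes a separate token, which is rather useless.

If I try to make identifier a lexer rule (IDENTIFIER), I get the following error when running the test: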

line 1:23 extraneous input 'as' expecting {'.', '.*', 'as', ';'}

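Even if I reduce IDENTIFIER to just [a-zA-Z], with no other rules involved, the same thing happens.

If it matters, I'm using Python3 as the target language. Please point out any other rookie mistakes, as this is my first project using Antlr. Thanks!

Source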

2017-04-02 MackThax

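Right now you are telling your lexer to produce the individual characters that may appear in an identifier, rather than whole identifiers. The following simplified grammar (lexer and parser) should work for you: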

grammar test; 

root 
    : identifier*; 

identifier 
    : IdentifierChars; 

IdentifierChars 
    : [a-zA-Z0-9_]+; 

WhiteSpace 
    : [ \r\n\t]+ -> skip;

Here is the Java code sample I used to check it:

// Note: ANTLRInputStream is deprecated as of ANTLR 4.7; CharStreams.fromStream(input) replaces it.
InputStream input = IntegrationMain.class.getResourceAsStream("test.txt");
ANTLRInputStream inputStream = new ANTLRInputStream(input);
TokenSource tokenSource = new testLexer(inputStream);
CommonTokenStream tokenStream = new CommonTokenStream(tokenSource);
testParser parser = new testParser(tokenStream);
testParser.RootContext root = parser.root();

// Print the text of every identifier the parser matched.
root.identifier().forEach(identifier -> System.out.println(identifier.getText()));

And here is the result on standard output:

abc 
a0bc 
a_bc
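
To see why the original grammar yields one token per character, here is a rough Python sketch of how a lexer picks tokens. It is a toy model, not ANTLR itself: the `tokenize` helper and the two rule tables are made up for illustration, and the only behaviour it imitates is "longest match wins, and on a tie the rule listed first wins".

```python
import re

def tokenize(text, rules):
    """Toy model of a lexer: at each position take the longest match;
    on a tie, the rule listed first wins. Tokens named "WS" are skipped."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict ">" keeps the earlier rule when two matches are equally long.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_len == 0:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rule table mirroring the question: one character per token.
PER_CHAR_RULES = [
    ("LETTER", r"[a-zA-Z]"),
    ("DIGIT", r"[0-9]"),
    ("UNDERSCORE", r"_"),
    ("WS", r"[ \t\r\n]+"),
]

# Hypothetical rule table mirroring the simplified grammar: whole identifiers.
WHOLE_RULES = [
    ("IdentifierChars", r"[a-zA-Z0-9_]+"),
    ("WS", r"[ \t\r\n]+"),
]

print(tokenize("a_bc", PER_CHAR_RULES))
# [('LETTER', 'a'), ('UNDERSCORE', '_'), ('LETTER', 'b'), ('LETTER', 'c')]
print(tokenize("abc a0bc a_bc", WHOLE_RULES))
# [('IdentifierChars', 'abc'), ('IdentifierChars', 'a0bc'), ('IdentifierChars', 'a_bc')]
```

With the per-character rules the parser has to reassemble identifiers from single-letter tokens; with the `+` in the lexer rule the identifier arrives as one token.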

Source

2017-04-02 16:16:06 Yevgeniy

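Thanks. I took your advice and simplified the grammar. It also turns out that I lacked a basic understanding of how Antlr works. Once I figured out that rule order matters, and that lexer rules are evaluated before any parser rules, it all started to make a lot of sense – MackThax

Glad it helped solve your problem, you're welcome. – Yevgeniy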
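
The rule-order point in the comment above can be illustrated with a small Python sketch. It is a toy model rather than ANTLR: the `token_types` helper and both rule tables are hypothetical. `IDENTIFIER_BEFORE_AS` assumes IDENTIFIER was placed where the `identifier` parser rule stood in the question's grammar, that is, after USING but before AS. Under that assumption `as` is lexed as an IDENTIFIER, which would explain the "extraneous input 'as'" error at line 1:23.

```python
import re

def token_types(text, rules):
    """Toy lexer: longest match wins; on a tie the earlier rule wins.
    Returns only the token type names and skips whitespace."""
    pos, types = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() - pos > best_len:  # strict ">" favours earlier rules
                best_name, best_len = name, m.end() - pos
        if best_len == 0:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":
            types.append(best_name)
        pos += best_len
    return types

IDENT = r"[a-zA-Z_][a-zA-Z0-9_]*"
WS = ("WS", r"[ \t\r\n]+")

# Assumed order from the question: IDENTIFIER sits above AS.
IDENTIFIER_BEFORE_AS = [
    ("USING", r"using"), ("IDENTIFIER", IDENT), ("DOT", r"\."),
    ("USING_ALL", r"\.\*"), ("AS", r"as"), ("END", r";"), WS,
]

# Keywords listed before the general identifier rule.
KEYWORDS_FIRST = [
    ("USING", r"using"), ("AS", r"as"), ("DOT", r"\."),
    ("USING_ALL", r"\.\*"), ("END", r";"), ("IDENTIFIER", IDENT), WS,
]

line = "using IO.Console.Print as Print;"
print(token_types(line, IDENTIFIER_BEFORE_AS))
# ['USING', 'IDENTIFIER', 'DOT', 'IDENTIFIER', 'DOT', 'IDENTIFIER', 'IDENTIFIER', 'IDENTIFIER', 'END']
print(token_types(line, KEYWORDS_FIRST))
# ['USING', 'IDENTIFIER', 'DOT', 'IDENTIFIER', 'DOT', 'IDENTIFIER', 'AS', 'IDENTIFIER', 'END']
```

In the first table `as` matches both IDENTIFIER and AS with the same length, so the earlier rule takes it. Listing keywords before the identifier rule avoids the clash, while longer words such as `asset` still come out as identifiers because the longest match wins.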
