2013-03-04 53 views

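Handling comments and line/column numbers in a COBOL grammar with JavaCC

I am working on a COBOL parser using JavaCC. COBOL files usually use columns 1 to 6 for line/column numbers. If the line/column numbers are not present, those columns contain spaces.

I need to know how to handle comments and the sequence area in a COBOL file so that only the main area is parsed.

I have tried many expressions, but none of them worked. I created a special token that checks for a new line, then for six occurrences of either a space or any character other than a space and a carriage return. After that, the seventh character is "*" for a comment line and " " for a normal line.

The grammar I am using can be found here: http://java.net/downloads/javacc/contrib/grammars/cobol.jj

Can anyone suggest what grammar I should use in the Cobol.jj file?

A sample of my grammar file: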

PARSER_END(CblParser) 

//////////////////////////////////////////////////////////////////////////////// 
// Lexical structure 
//////////////////////////////////////////////////////////////////////////////// 

SPECIAL_TOKEN : 
{ 
    < EOL: "\n" > : LINE_START 
| < SPACECHAR: (" " | "\t" | "\f" | ";" | "\r")+ > 
} 

SPECIAL_TOKEN : 
{ 
    < COMMENT: (~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "]) ("*" | "|") (~["\n","\r"])* > 
| < PREPROC_COMMENT: "*|" (~["\n","\r"])* > 
| < SPACE_SEPARATOR : (<SPACECHAR> | <EOL>)+ > 
| < COMMA_SEPARATOR : "," <SPACE_SEPARATOR> > 
} 

<LINE_START> SKIP : 
{ 
< ((~[])(~[])(~[])(~[])(~[])(~[])) (" ") > 
} 

Answer


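Since the parser starts at the beginning of a line, you should use the DEFAULT state to represent the start of a line. I would do something like the following [untested code follows].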

// At the start of each line, the first 6 characters are ignored and the 7th is used 
// to determine whether this is a code line or a comment line. 
// (Continuation lines are handled elsewhere.) 
// If there are fewer than 7 characters on the line, it is ignored. 
// Note that there will be a TokenMgrError if a line has at least 7 characters and 
// the 7th character is other than a "*", a "/", or a space. 
<DEFAULT> SKIP : 
{ 
    < (~[]){0,6} ("\n" | "\r" | "\r\n") > :DEFAULT 
| 
    < (~[]){6} (" ") > :CODE 
| 
    < (~[]){6} ("*"|"/") > :COMMENT 
} 

<COMMENT> SKIP : 
{ // At the end of a comment line, return to the DEFAULT state. 
    < "\n" | "\r" | "\r\n" > : DEFAULT 
| // All non-end-of-line characters on a comment line are ignored. 
    < ~["\n","\r"] > : COMMENT 
} 
<CODE> SKIP : 
{ // At the end of a code line, return to the DEFAULT state. 
    < "\n" | "\r" | "\r\n" > : DEFAULT 
| // White space is skipped, as are semicolons. 
    < (" " | "\t" | "\f" | ";")+ > 
} 
<CODE> TOKEN : 
{ 
    < ACCEPT: "accept" > 
| 
    ... // all rules for tokens should be in the CODE state. 
} 
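
The column rules that the lexical states above encode can be checked in plain Java before wiring them into a grammar. The following is only a sketch under stated assumptions: the class `FixedFormatLines`, the enum `Kind`, and the methods `classify` and `codeArea` are hypothetical names invented for illustration and are not part of JavaCC or cobol.jj. It also assumes the input has already been split into physical lines without their terminators, which the grammar does not do.

```java
/** Hypothetical helper for illustration only; not part of JavaCC or cobol.jj. */
final class FixedFormatLines {

    /** How a physical line is treated, mirroring the DEFAULT-state rules. */
    enum Kind { IGNORED, CODE, COMMENT, UNKNOWN }

    /** Classifies one physical line (assumed to have no line terminator). */
    static Kind classify(String line) {
        if (line.length() < 7) {
            return Kind.IGNORED;           // fewer than 7 characters: line is ignored
        }
        char indicator = line.charAt(6);   // column 7 is index 6 (0-based)
        switch (indicator) {
            case ' ':
                return Kind.CODE;          // normal code line
            case '*':
            case '/':
                return Kind.COMMENT;       // comment line
            default:
                return Kind.UNKNOWN;       // the grammar would report a lexical error here
        }
    }

    /** Returns the text after column 7 for code lines, or "" for every other kind. */
    static String codeArea(String line) {
        return classify(line) == Kind.CODE ? line.substring(7) : "";
    }

    public static void main(String[] args) {
        String[] sample = {
            "000100 IDENTIFICATION DIVISION.",
            "000200* a comment line",
            "       PROGRAM-ID. DEMO.",
            "12345"
        };
        for (String line : sample) {
            System.out.println(classify(line) + " -> [" + codeArea(line) + "]");
        }
    }
}
```

Columns 1 to 6 are never inspected, so it makes no difference whether they hold digits or spaces. Continuation lines (a "-" in column 7) fall into `UNKNOWN` in this sketch, just as they are left out of the grammar fragment.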

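I am not convinced that semicolons should be treated as white space. I just copied that from Thiery Blind's parser. One small change I made is to treat \r by itself as a legitimate end of line (unless it is immediately followed by a \n). – 2013-03-14 12:10:00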
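
The end-of-line rule described in this comment (a lone \r ends a line, but \r\n counts as a single terminator) can be illustrated outside the grammar. This is a minimal sketch, assuming a hypothetical class `LineEndings` with a hypothetical method `splitLines`. It uses `String.split` with an ordered alternation, which is a different mechanism from the longest-match rule a JavaCC token manager applies, but it gives the same result for these three terminators.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of JavaCC or cobol.jj. */
final class LineEndings {

    /**
     * Splits text into physical lines, accepting "\r\n", "\r" or "\n" as a terminator.
     * "\r\n" is listed first so that it is consumed as one terminator, not two.
     * The limit of -1 keeps trailing empty lines.
     */
    static String[] splitLines(String text) {
        return text.split("\r\n|\r|\n", -1);
    }

    public static void main(String[] args) {
        String mixed = "000100 LINE-A.\r\n000200 LINE-B.\r000300 LINE-C.\n000400 LINE-D.";
        System.out.println(Arrays.toString(splitLines(mixed)));
    }
}
```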