2009-08-22 22 views
9

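Z80 ASM BNF structure... am I on the right track?

I want to learn BNF and am trying to assemble some Z80 ASM code. Since I'm new to both areas, my question is: am I on the right track? I'm trying to write the format of Z80 ASM as EBNF so that I can work out where to start when creating machine code from source. So far I have the following: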

Assignment = Identifier, ":" ; 

Instruction = Opcode, [ Operand ], [ Operand ] ; 

Operand = Identifier | Something* ; 

Something* = "(" , Identifier, ")" ; 

Identifier = Alpha, { Numeric | Alpha } ; 

Opcode = Alpha, Alpha ; 

Int = [ "-" ], Numeric, { Numeric } ; 

Alpha = "A" | "B" | "C" | "D" | "E" | "F" | 
     "G" | "H" | "I" | "J" | "K" | "L" | 
     "M" | "N" | "O" | "P" | "Q" | "R" | 
     "S" | "T" | "U" | "V" | "W" | "X" | 
     "Y" | "Z" ; 

Numeric = "0" | "1" | "2" | "3"| "4" | 
      "5" | "6" | "7" | "8" | "9" ; 

Any feedback on direction, or on where I'm going wrong, would be great.
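As a rough illustration of where an EBNF like this leads, each rule can become one function in a hand-written recursive-descent parser. The Python sketch below is an editor's assumption, not the asker's code: it splits tokens on whitespace, accepts an optional comma between operands (the grammar above defines neither), and returns tuples as a stand-in for a parse tree.

```python
import re

# Assumption: tokens are separated by optional whitespace, and a comma may
# separate operands. The EBNF above defines neither.
TOKEN_RE = re.compile(r"\s*(?:([A-Z][A-Z0-9]*)|([():,]))")


def tokenize(line):
    """Split one upper-case source line into identifier and punctuation tokens."""
    tokens, pos, line = [], 0, line.rstrip()
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if not m:
            raise SyntaxError(f"unexpected text at column {pos}: {line[pos:]!r}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    # Identifier = Alpha, { Numeric | Alpha } ;
    def identifier(self):
        tok = self.take()
        if tok is None or not re.fullmatch(r"[A-Z][A-Z0-9]*", tok):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        return tok

    # Opcode = Alpha, Alpha ;  (exactly two letters, as written)
    def opcode(self):
        tok = self.take()
        if tok is None or not re.fullmatch(r"[A-Z]{2}", tok):
            raise SyntaxError(f"expected two-letter opcode, got {tok!r}")
        return tok

    # Operand = Identifier | "(" , Identifier , ")" ;
    def operand(self):
        if self.peek() == "(":
            self.take()
            name = self.identifier()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return ("indirect", name)
        return ("direct", self.identifier())

    def line(self):
        # Assignment = Identifier, ":" ;
        if len(self.tokens) == 2 and self.tokens[1] == ":":
            name = self.identifier()
            self.take()  # consume ':'
            return ("label", name)
        # Instruction = Opcode, [ Operand ], [ Operand ] ;
        op = self.opcode()
        operands = []
        while self.peek() is not None and len(operands) < 2:
            if operands and self.peek() == ",":  # assumed separator
                self.take()
            operands.append(self.operand())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return ("instruction", op, operands)


def parse_line(text):
    return Parser(tokenize(text)).line()
```

Because `Opcode = Alpha, Alpha` allows exactly two letters, `parse_line("NOP")` raises `SyntaxError`; three- and four-letter Z80 mnemonics such as NOP, PUSH or DJNZ would need a wider Opcode rule.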

+0

Dupe of http://stackoverflow.com/questions/1305091/writing-an-z80-assembler-lexing-asm-and-building-a-parse-tree-using-composition by the same user – 2009-08-23 20:10:03

+3

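@Butterworth: Not a duplicate. The other question is about how he might use the tree built by a grammar to pass information along. This question is about whether he should use a grammar at all and, if so, what it would look like. The answer to this question is a prerequisite for the other one. – 2009-08-23 20:19:01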

Answers

16

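Old-school assemblers were typically hand-coded and used ad hoc parsing techniques to process assembly source lines and produce the actual assembled code. When the assembler syntax is simple (e.g., always OPCODE REG, OPERAND), this approach works well enough.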
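For a fixed layout like that, an ad hoc parser can be little more than string splitting. A minimal Python sketch, assuming a hypothetical `LABEL: OPCODE OPERAND,OPERAND ; comment` line layout:

```python
def split_line(line):
    """Ad hoc split of 'LABEL: OPCODE OPERAND,OPERAND ; comment' (assumed layout)."""
    code = line.split(";", 1)[0].strip()  # drop the trailing comment
    label = None
    if ":" in code:  # optional label before a colon
        label, code = code.split(":", 1)
        label, code = label.strip(), code.strip()
    if not code:
        return label, None, []
    parts = code.split(None, 1)  # opcode, then the rest of the line
    opcode = parts[0].upper()
    operands = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
    return label, opcode, operands
```

This holds up only while every line has exactly that shape; a comma or semicolon inside a character or string operand (for example `DB ';'`) already breaks it.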

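Modern machines have messy, nasty instruction sets with many instruction variants and operands, which may be expressed with complex syntax that allows multiple index registers to participate in the operand expression. Allowing sophisticated assembly-time expressions with fixed and relocatable constants, and various kinds of addition operators, complicates this further. Advanced assemblers allow conditional compilation, macros, structured data declarations and so on, all of which make new demands on the syntax. Processing all of this syntax by ad hoc methods is very hard, and is the reason parser generators were invented.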

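Using BNF and a parser generator is a very reasonable way to build a modern assembler, even for a legacy processor such as the Z80. I have built such assemblers for Motorola 8-bit machines such as the 6800/6809, and am getting ready to do the same for a modern x86. I think you're headed down the right path.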

********** EDIT ********** The OP asked for example lexer and parser definitions. I've provided both here.

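These are excerpts from real specifications for a 6809 assembler. The complete definitions are 2-3 times the size of the samples here. To save space, I've edited out much of the dark-corner complexity, which is the point of these definitions. One might be dismayed by the apparent complexity; the point is that with definitions like these you are trying to describe the shape of the language, not code it procedurally. If you code all of this in an ad hoc way, you will pay a much higher complexity cost, and the result will be far less maintainable.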

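It will also help to know that these definitions are used with a high-end program analysis system that has lexing/parsing tools as subsystems, called The DMS Software Reengineering Toolkit. DMS automatically builds ASTs from the grammar rules in the parser specification, which makes it much easier to build parsing tools. Finally, the parser specification contains so-called "prettyprinter" declarations, which allow DMS to regenerate source text from the ASTs. (The real purpose of the grammar was to let us build ASTs representing assembler instructions, and then spit them out to feed to a real assembler!)

One thing to note: how lexemes and grammar rules are specified (metasyntax!) varies between lexer/parser generator systems. The syntax of DMS-based specifications is no exception. DMS has relatively sophisticated grammar rules of its own, which really aren't practical to explain in the space available here. You'll have to accept that other systems use similar notations: EBNF for rules and regular expressions for lexemes.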

Given the OP's interests, he can implement a similar lexer/parser with any lexer/parser generator tool, e.g., FLEX/YACC, JavaCC, ANTLR, ...

************* LEXER *************

-- M6809.lex: Lexical Description for M6809 
-- Copyright (C) 1989,1999-2002 Ira D. Baxter 

%% 
#mainmode Label 

#macro digit "[0-9]" 
#macro hexadecimaldigit "<digit>|[a-fA-F]" 

#macro comment_body_character "[\u0009 \u0020-\u007E]" -- does not include NEWLINE 

#macro blank "[\u0000 \ \u0009]" 

#macro hblanks "<blank>+" 

#macro newline "\u000d \u000a? \u000c? | \u000a \u000c?" -- form feed allowed only after newline 

#macro bare_semicolon_comment "\; <comment_body_character>* " 

#macro bare_asterisk_comment "\* <comment_body_character>* " 

...[snip] 

#macro hexadecimal_digit "<digit> | [a-fA-F]" 

#macro binary_digit "[01]" 

#macro squoted_character "\' [\u0021-\u007E]" 

#macro string_character "[\u0009 \u0020-\u007E]" 

%%Label -- (First mode) processes left hand side of line: labels, opcodes, etc. 

#skip "(<blank>*<newline>)+" 
#skip "(<blank>*<newline>)*<blank>+" 
    << (GotoOpcodeField ?) >> 

#precomment "<comment_line><newline>" 

#preskip "(<blank>*<newline>)+" 
#preskip "(<blank>*<newline>)*<blank>+" 
    << (GotoOpcodeField ?) >> 

-- Note that an apparent register name is accepted as a label in this mode 
#token LABEL [STRING] "<identifier>" 
    << (local (;; (= [TokenScan natural] 1) ; process all string characters 
     (= [TokenLength natural] ?:TokenCharacterCount)= 
     (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters)) 
     (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value)) 
     [ThisCharacterCode natural] 
     (define Ordinala #61) 
     (define Ordinalf #66) 
     (define OrdinalA #41) 
     (define OrdinalF #46) 
    );; 
    (;; (= (@ Result) `') ; start with empty string 
    (while (<= TokenScan TokenLength) 
     (;; (= ThisCharacterCode (coerce natural TokenString:TokenScan)) 
     (+= TokenScan) ; bump past character 
     (ifthen (>= ThisCharacterCode Ordinala) 
      (-= ThisCharacterCode #20) ; fold to upper case 
     )ifthen 
     (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))= 

     );; 
    )while 
    );; 
)local 
    (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string 
    (GotoLabelList ?) 
    >> 

%%OpcodeField 

#skip "<hblanks>" 
    << (GotoEOLComment ?) >> 
#ifnotoken 
    << (GotoEOLComment ?) >> 

-- Opcode field tokens 
#token 'ABA'  "[aA][bB][aA]" 
    << (GotoEOLComment ?) >> 
#token 'ABX'  "[aA][bB][xX]" 
    << (GotoEOLComment ?) >> 
#token 'ADC'  "[aA][dD][cC]" 
    << (GotoABregister ?) >> 
#token 'ADCA'  "[aA][dD][cC][aA]" 
    << (GotoOperand ?) >> 
#token 'ADCB'  "[aA][dD][cC][bB]" 
    << (GotoOperand ?) >> 
#token 'ADCD'  "[aA][dD][cC][dD]" 
    << (GotoOperand ?) >> 
#token 'ADD'  "[aA][dD][dD]" 
    << (GotoABregister ?) >> 
#token 'ADDA'  "[aA][dD][dD][aA]" 
    << (GotoOperand ?) >> 
#token 'ADDB'  "[aA][dD][dD][bB]" 
    << (GotoOperand ?) >> 
#token 'ADDD'  "[aA][dD][dD][dD]" 
    << (GotoOperand ?) >> 
#token 'AND'  "[aA][nN][dD]" 
    << (GotoABregister ?) >> 
#token 'ANDA'  "[aA][nN][dD][aA]" 
    << (GotoOperand ?) >> 
#token 'ANDB'  "[aA][nN][dD][bB]" 
    << (GotoOperand ?) >> 
#token 'ANDCC'  "[aA][nN][dD][cC][cC]" 
    << (GotoRegister ?) >> 
...[long list of opcodes snipped] 

#token IDENTIFIER [STRING] "<identifier>" 
    << (local (;; (= [TokenScan natural] 1) ; process all string characters 
     (= [TokenLength natural] ?:TokenCharacterCount)= 
     (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters)) 
     (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value)) 
     [ThisCharacterCode natural] 
     (define Ordinala #61) 
     (define Ordinalf #66) 
     (define OrdinalA #41) 
     (define OrdinalF #46) 
    );; 
    (;; (= (@ Result) `') ; start with empty string 
    (while (<= TokenScan TokenLength) 
     (;; (= ThisCharacterCode (coerce natural TokenString:TokenScan)) 
     (+= TokenScan) ; bump past character 
     (ifthen (>= ThisCharacterCode Ordinala) 
      (-= ThisCharacterCode #20) ; fold to upper case 
     )ifthen 
     (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))= 

     );; 
    )while 
    );; 
)local 
    (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string 
    (GotoOperandField ?) 
    >> 

#token '#' "\#" -- special constant introduction (FDB) 
    << (GotoDataField ?) >> 

#token NUMBER [NATURAL] "<decimal_number>" 
    << (local [format LiteralFormat:NaturalLiteralFormat] 
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0)) 
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format)) 
    );; 
)local 
(GotoOperandField ?) 
    >> 

#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+" 
    << (local [format LiteralFormat:NaturalLiteralFormat] 
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0)) 
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format)) 
    );; 
)local 
(GotoOperandField ?) 
    >> 

#token NUMBER [NATURAL] "\% <binary_digit>+" 
    << (local [format LiteralFormat:NaturalLiteralFormat] 
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0)) 
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format)) 
    );; 
)local 
(GotoOperandField ?) 
    >> 

#token CHARACTER [CHARACTER] "<squoted_character>" 
    << (= ?:Lexeme:Literal:Character:Value (TokenStringCharacter ? 2)) 
    (= ?:Lexeme:Literal:Character:Format (LiteralFormat:MakeCompactCharacterLiteralFormat 0 0)) ; nothing special about character 
    (GotoOperandField ?) 
    >> 


%%OperandField 

#skip "<hblanks>" 
    << (GotoEOLComment ?) >> 
#ifnotoken 
    << (GotoEOLComment ?) >> 

-- Tokens signalling switch to index register modes 
#token ',' "\," 
    <<(GotoRegisterField ?)>> 
#token '[' "\[" 
    <<(GotoRegisterField ?)>> 

-- Operators for arithmetic syntax 
#token '!!' "\!\!" 
#token '!' "\!" 
#token '##' "\#\#" 
#token '#' "\#" 
#token '&' "\&" 
#token '(' "\(" 
#token ')' "\)" 
#token '*' "\*" 
#token '+' "\+" 
#token '-' "\-" 
#token '/' "\/" 
#token '//' "\/\/" 
#token '<' "\<" 
#token '<' "\<" 
#token '<<' "\<\<" 
#token '<=' "\<\=" 
#token '</' "\<\/" 
#token '=' "\=" 
#token '>' "\>" 
#token '>' "\>" 
#token '>=' "\>\=" 
#token '>>' "\>\>" 
#token '>/' "\>\/" 
#token '\\' "\\" 
#token '|' "\|" 
#token '||' "\|\|" 

#token NUMBER [NATURAL] "<decimal_number>" 
    << (local [format LiteralFormat:NaturalLiteralFormat] 
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0)) 
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format)) 
    );; 
)local 
    >> 

#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+" 
    << (local [format LiteralFormat:NaturalLiteralFormat] 
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0)) 
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format)) 
    );; 
)local 
    >> 

#token NUMBER [NATURAL] "\% <binary_digit>+" 
    << (local [format LiteralFormat:NaturalLiteralFormat] 
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0)) 
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format)) 
    );; 
)local 
    >> 

-- Notice that an apparent register is accepted as a label in this mode 
#token IDENTIFIER [STRING] "<identifier>" 
    << (local (;; (= [TokenScan natural] 1) ; process all string characters 
     (= [TokenLength natural] ?:TokenCharacterCount)= 
     (= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters)) 
     (= [Result (reference string)] (. ?:Lexeme:Literal:String:Value)) 
     [ThisCharacterCode natural] 
     (define Ordinala #61) 
     (define Ordinalf #66) 
     (define OrdinalA #41) 
     (define OrdinalF #46) 
    );; 
    (;; (= (@ Result) `') ; start with empty string 
    (while (<= TokenScan TokenLength) 
     (;; (= ThisCharacterCode (coerce natural TokenString:TokenScan)) 
     (+= TokenScan) ; bump past character 
     (ifthen (>= ThisCharacterCode Ordinala) 
      (-= ThisCharacterCode #20) ; fold to upper case 
     )ifthen 
     (= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))= 

     );; 
    )while 
    );; 
)local 
    (= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string 
    >> 

%%Register -- operand field for TFR, ANDCC, ORCC, EXG opcodes 

#skip "<hblanks>" 
#ifnotoken << (GotoRegisterField ?) >> 

%%RegisterField -- handles registers and indexing mode syntax 
-- In this mode, names that look like registers are recognized as registers 

#skip "<hblanks>" 
    << (GotoEOLComment ?) >> 
#ifnotoken 
    << (GotoEOLComment ?) >> 

#token '[' "\[" 
#token ']' "\]" 
#token '--' "\-\-" 
#token '++' "\+\+" 

#token 'A'  "[aA]" 
#token 'B'  "[bB]" 
#token 'CC'  "[cC][cC]" 
#token 'DP'  "[dD][pP] | [dD][pP][rR]" -- DPR shouldn't be needed, but found one instance 
#token 'D'  "[dD]" 
#token 'Z'  "[zZ]" 

-- Index register designations 
#token 'X'  "[xX]" 
#token 'Y'  "[yY]" 
#token 'U'  "[uU]" 
#token 'S'  "[sS]" 
#token 'PCR' "[pP][cC][rR]" 
#token 'PC'  "[pP][cC]" 

#token ',' "\," 

-- Operators for arithmetic syntax 
#token '!!' "\!\!" 
#token '!' "\!" 
#token '##' "\#\#" 
#token '#' "\#" 
#token '&' "\&" 
#token '(' "\(" 
#token ')' "\)" 
#token '*' "\*" 
#token '+' "\+" 
#token '-' "\-" 
#token '/' "\/" 
#token '<' "\<" 
#token '<' "\<" 
#token '<<' "\<\<" 
#token '<=' "\<\=" 
#token '<|' "\<\|" 
#token '=' "\=" 
#token '>' "\>" 
#token '>' "\>" 
#token '>=' "\>\=" 
#token '>>' "\>\>" 
#token '>|' "\>\|" 
#token '\\' "\\" 
#token '|' "\|" 
#token '||' "\|\|" 

#token NUMBER [NATURAL] "<decimal_number>" 
    << (local [format LiteralFormat:NaturalLiteralFormat] 
    (;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0)) 
    (= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format)) 
    );; 
)local 
    >> 

... [snip] 

%% -- end M6809.lex 
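The central idea in the lexer above is its modes (%%Label, %%OpcodeField, %%OperandField, %%RegisterField): the same characters lex differently depending on which field of the line is being scanned, so X is an ordinary identifier in the operand field but a register after a comma or '['. The Python sketch below is a heavily simplified, hypothetical re-creation of that idea, not DMS code; the identifier pattern, the register set and the rule that a blank after the operand starts a comment are assumptions modeled loosely on the spec.

```python
import re

# Assumed patterns; the real <identifier> macro was snipped from the spec.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
NUMBER = re.compile(r"\$[0-9A-Fa-f]+|%[01]+|[0-9]+")  # hex, binary, decimal
BLANKS = re.compile(r"[ \t]+")
REGISTERS = {"A", "B", "D", "CC", "DP", "X", "Y", "U", "S", "PC", "PCR"}


def to_int(text):
    if text[0] == "$":
        return int(text[1:], 16)
    if text[0] == "%":
        return int(text[1:], 2)
    return int(text)


def lex_line(line):
    """Return (kind, value) tokens for one source line."""
    tokens, pos = [], 0
    if line[:1] in ("*", ";"):  # whole-line comment
        return tokens
    # Label mode: an identifier starting in column 0 is a label.
    m = IDENT.match(line)
    if m:
        tokens.append(("LABEL", m.group().upper()))  # fold to upper case
        pos = m.end()
    # Opcode mode: skip blanks, then read the mnemonic.
    m = BLANKS.match(line, pos)
    if m:
        pos = m.end()
    m = IDENT.match(line, pos)
    if not m:
        return tokens
    tokens.append(("OPCODE", m.group().upper()))
    pos = m.end()
    # Operand mode: skip blanks after the opcode, then lex until the next
    # blank, which starts the end-of-line comment.
    m = BLANKS.match(line, pos)
    if m:
        pos = m.end()
    register_mode = False
    while pos < len(line) and line[pos] not in " \t":
        ch = line[pos]
        if ch in ",[":  # switch: register-like names now mean registers
            register_mode = True
            tokens.append((ch, ch))
            pos += 1
            continue
        m = NUMBER.match(line, pos)
        if m:
            tokens.append(("NUMBER", to_int(m.group())))
            pos = m.end()
            continue
        m = IDENT.match(line, pos)
        if m:
            name = m.group().upper()
            kind = "REG" if register_mode and name in REGISTERS else "IDENT"
            tokens.append((kind, name))
            pos = m.end()
            continue
        tokens.append((ch, ch))  # single-character operator
        pos += 1
    return tokens
```

For example, `lex_line("  ldx x,x")` yields an IDENT X followed by a REG X, mirroring the spec's comment that an apparent register is accepted as an identifier in the operand field.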

************* PARSER *************

-- M6809.ATG: Motorola 6809 assembly code parser 
-- (C) Copyright 1989;1999-2002 Ira D. Baxter; All Rights Reserved 

m6809 = sourcelines ; 

sourcelines = ; 
sourcelines = sourcelines sourceline EOL ; 
    <<PrettyPrinter>>: { V(CV(sourcelines[1]),H(sourceline,A<eol>(EOL))); } 

-- leading opcode field symbol should be treated as keyword. 

sourceline = ; 
sourceline = labels ; 
sourceline = optional_labels 'EQU' expression ; 
    <<PrettyPrinter>>: { H(optional_labels,A<opcode>('EQU'),A<operand>(expression)); } 
sourceline = LABEL 'SET' expression ; 
    <<PrettyPrinter>>: { H(A<firstlabel>(LABEL),A<opcode>('SET'),A<operand>(expression)); } 
sourceline = optional_label instruction ; 
    <<PrettyPrinter>>: { H(optional_label,instruction); } 
sourceline = optional_label optlabelleddirective ; 
    <<PrettyPrinter>>: { H(optional_label,optlabelleddirective); } 
sourceline = optional_label implicitdatadirective ; 
    <<PrettyPrinter>>: { H(optional_label,implicitdatadirective); } 
sourceline = unlabelleddirective ; 
sourceline = '?ERROR' ; 
    <<PrettyPrinter>>: { A<opcode>('?ERROR'); } 

optional_label = labels ; 
optional_label = LABEL ':' ; 
    <<PrettyPrinter>>: { H(A<firstlabel>(LABEL),':'); } 
optional_label = ; 

optional_labels = ; 
optional_labels = labels ; 
labels = LABEL ; 
    <<PrettyPrinter>>: { A<firstlabel>(LABEL); } 
labels = labels ',' LABEL ; 
    <<PrettyPrinter>>: { H(labels[1],',',A<otherlabels>(LABEL)); } 

unlabelleddirective = 'END' ; 
    <<PrettyPrinter>>: { A<opcode>('END'); } 
unlabelleddirective = 'END' expression ; 
    <<PrettyPrinter>>: { H(A<opcode>('END'),A<operand>(expression)); } 
unlabelleddirective = 'IF' expression EOL conditional ; 
    <<PrettyPrinter>>: { V(H(A<opcode>('IF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional)); } 
unlabelleddirective = 'IFDEF' IDENTIFIER EOL conditional ; 
    <<PrettyPrinter>>: { V(H(A<opcode>('IFDEF'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); } 
unlabelleddirective = 'IFUND' IDENTIFIER EOL conditional ; 
    <<PrettyPrinter>>: { V(H(A<opcode>('IFUND'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); } 
unlabelleddirective = 'INCLUDE' FILENAME ; 
    <<PrettyPrinter>>: { H(A<opcode>('INCLUDE'),A<operand>(FILENAME)); } 
unlabelleddirective = 'LIST' expression ; 
    <<PrettyPrinter>>: { H(A<opcode>('LIST'),A<operand>(expression)); } 
unlabelleddirective = 'NAME' IDENTIFIER ; 
    <<PrettyPrinter>>: { H(A<opcode>('NAME'),A<operand>(IDENTIFIER)); } 
unlabelleddirective = 'ORG' expression ; 
    <<PrettyPrinter>>: { H(A<opcode>('ORG'),A<operand>(expression)); } 
unlabelleddirective = 'PAGE' ; 
    <<PrettyPrinter>>: { A<opcode>('PAGE'); } 
unlabelleddirective = 'PAGE' HEADING ; 
    <<PrettyPrinter>>: { H(A<opcode>('PAGE'),A<operand>(HEADING)); } 
unlabelleddirective = 'PCA' expression ; 
    <<PrettyPrinter>>: { H(A<opcode>('PCA'),A<operand>(expression)); } 
unlabelleddirective = 'PCC' expression ; 
    <<PrettyPrinter>>: { H(A<opcode>('PCC'),A<operand>(expression)); } 
unlabelleddirective = 'PSR' expression ; 
    <<PrettyPrinter>>: { H(A<opcode>('PSR'),A<operand>(expression)); } 
unlabelleddirective = 'TABS' numberlist ; 
    <<PrettyPrinter>>: { H(A<opcode>('TABS'),A<operand>(numberlist)); } 
unlabelleddirective = 'TITLE' HEADING ; 
    <<PrettyPrinter>>: { H(A<opcode>('TITLE'),A<operand>(HEADING)); } 
unlabelleddirective = 'WITH' settings ; 
    <<PrettyPrinter>>: { H(A<opcode>('WITH'),A<operand>(settings)); } 

settings = setting ; 
settings = settings ',' setting ; 
    <<PrettyPrinter>>: { H*; } 
setting = 'WI' '=' NUMBER ; 
    <<PrettyPrinter>>: { H*; } 
setting = 'DE' '=' NUMBER ; 
    <<PrettyPrinter>>: { H*; } 
setting = 'M6800' ; 
setting = 'M6801' ; 
setting = 'M6809' ; 
setting = 'M6811' ; 

-- collects lines of conditional code into blocks 
conditional = 'ELSEIF' expression EOL conditional ; 
    <<PrettyPrinter>>: { V(H(A<opcode>('ELSEIF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional[1])); } 
conditional = 'ELSE' EOL else ; 
    <<PrettyPrinter>>: { V(H(A<opcode>('ELSE'),A<eol>(EOL)),CV(else)); } 
conditional = 'FIN' ; 
    <<PrettyPrinter>>: { A<opcode>('FIN'); } 
conditional = sourceline EOL conditional ; 
    <<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(conditional[1])); } 

else = 'FIN' ; 
    <<PrettyPrinter>>: { A<opcode>('FIN'); } 
else = sourceline EOL else ; 
    <<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(else[1])); } 

-- keyword-less directive, generates data tables 

implicitdatadirective = implicitdatadirective ',' implicitdataitem ; 
    <<PrettyPrinter>>: { H*; } 
implicitdatadirective = implicitdataitem ; 

implicitdataitem = '#' expression ; 
    <<PrettyPrinter>>: { A<operand>(H('#',expression)); } 
implicitdataitem = '+' expression ; 
    <<PrettyPrinter>>: { A<operand>(H('+',expression)); } 
implicitdataitem = '-' expression ; 
    <<PrettyPrinter>>: { A<operand>(H('-',expression)); } 
implicitdataitem = expression ; 
    <<PrettyPrinter>>: { A<operand>(expression); } 
implicitdataitem = STRING ; 
    <<PrettyPrinter>>: { A<operand>(STRING); } 

-- instructions valid for m680C (see Software Dynamics ASM manual) 
instruction = 'ABA' ; 
    <<PrettyPrinter>>: { A<opcode>('ABA'); } 
instruction = 'ABX' ; 
    <<PrettyPrinter>>: { A<opcode>('ABX'); } 

instruction = 'ADC' 'A' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>(H('ADC','A')),A<operand>(operandfetch)); } 
instruction = 'ADC' 'B' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>(H('ADC','B')),A<operand>(operandfetch)); } 
instruction = 'ADCA' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>('ADCA'),A<operand>(operandfetch)); } 
instruction = 'ADCB' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>('ADCB'),A<operand>(operandfetch)); } 
instruction = 'ADCD' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>('ADCD'),A<operand>(operandfetch)); } 

instruction = 'ADD' 'A' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>(H('ADD','A')),A<operand>(operandfetch)); } 
instruction = 'ADD' 'B' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>(H('ADD','B')),A<operand>(operandfetch)); } 
instruction = 'ADDA' operandfetch ; 
    <<PrettyPrinter>>: { H(A<opcode>('ADDA'),A<operand>(operandfetch)); } 

[..snip...] 

-- condition code mask for ANDCC and ORCC 
conditionmask = '#' expression ; 
    <<PrettyPrinter>>: { H*; } 
conditionmask = expression ; 

target = expression ; 

operandfetch = '#' expression ; --immediate 
    <<PrettyPrinter>>: { H*; } 

operandfetch = memoryreference ; 

operandstore = memoryreference ; 

memoryreference = '[' indexedreference ']' ; 
    <<PrettyPrinter>>: { H*; } 
memoryreference = indexedreference ; 

indexedreference = offset ; 
indexedreference = offset ',' indexregister ; 
    <<PrettyPrinter>>: { H*; } 
indexedreference = ',' indexregister ; 
    <<PrettyPrinter>>: { H*; } 
indexedreference = ',' '--' indexregister ; 
    <<PrettyPrinter>>: { H*; } 
indexedreference = ',' '-' indexregister ; 
    <<PrettyPrinter>>: { H*; } 
indexedreference = ',' indexregister '++' ; 
    <<PrettyPrinter>>: { H*; } 
indexedreference = ',' indexregister '+' ; 
    <<PrettyPrinter>>: { H*; } 

offset = '>' expression ; -- page zero ref 
    <<PrettyPrinter>>: { H*; } 
offset = '<' expression ; -- long reference 
    <<PrettyPrinter>>: { H*; } 
offset = expression ; 
offset = 'A' ; 
offset = 'B' ; 
offset = 'D' ; 

registerlist = registername ; 
registerlist = registerlist ',' registername ; 
    <<PrettyPrinter>>: { H*; } 

registername = 'A' ; 
registername = 'B' ; 
registername = 'CC' ; 
registername = 'DP' ; 
registername = 'D' ; 
registername = 'Z' ; 
registername = indexregister ; 

indexregister = 'X' ; 
indexregister = 'Y' ; 
indexregister = 'U' ; -- not legal on M6811 
indexregister = 'S' ; 
indexregister = 'PCR' ; 
indexregister = 'PC' ; 

expression = sum '=' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '<<' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '</' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '<=' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '<' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '>>' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '>/' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '>=' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '>' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum '#' sum ; 
    <<PrettyPrinter>>: { H*; } 
expression = sum ; 

sum = product ; 
sum = sum '+' product ; 
    <<PrettyPrinter>>: { H*; } 
sum = sum '-' product ; 
    <<PrettyPrinter>>: { H*; } 
sum = sum '!' product ; 
    <<PrettyPrinter>>: { H*; } 
sum = sum '!!' product ; 
    <<PrettyPrinter>>: { H*; } 

product = term '*' product ; 
    <<PrettyPrinter>>: { H*; } 
product = term '||' product ; -- wrong? 
    <<PrettyPrinter>>: { H*; } 
product = term '/' product ; 
    <<PrettyPrinter>>: { H*; } 
product = term '//' product ; 
    <<PrettyPrinter>>: { H*; } 
product = term '&' product ; 
    <<PrettyPrinter>>: { H*; } 
product = term '##' product ; 
    <<PrettyPrinter>>: { H*; } 
product = term ; 

term = '+' term ; 
    <<PrettyPrinter>>: { H*; } 
term = '-' term ; 
    <<PrettyPrinter>>: { H*; } 
term = '\\' term ; -- complement 
    <<PrettyPrinter>>: { H*; } 
term = '&' term ; -- not 

term = IDENTIFIER ; 
term = NUMBER ; 
term = CHARACTER ; 
term = '*' ; 
term = '(' expression ')' ; 
    <<PrettyPrinter>>: { H*; } 

numberlist = NUMBER ; 
numberlist = numberlist ',' NUMBER ; 
    <<PrettyPrinter>>: { H*; } 
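To see how the expression rules turn into code, here is a hypothetical recursive-descent evaluator in Python for a subset of them (sum, product, term). Relational operators and '!', '!!', '##', '//', '||' are left out because their meaning isn't given here. The left-recursive sum rules become a loop; the product rules are kept right-recursive exactly as written, so 8/4/2 evaluates as 8/(4/2). Treating binary '&' as bitwise AND and the backslash operator as a 16-bit complement are assumptions, and the symbol table and location counter are stand-ins.

```python
import re

TOKEN = re.compile(r"\s*(\$[0-9A-Fa-f]+|%[01]+|[0-9]+|[A-Za-z_][A-Za-z0-9_]*|[-+*/&\\()])")


class ExprParser:
    """Recursive descent over a subset of the M6809.ATG expression rules."""

    def __init__(self, text, symbols, location=0):
        self.toks, self.i = self._tokenize(text), 0
        self.symbols = symbols  # hypothetical symbol table: NAME -> value
        self.location = location  # value of '*' (current location counter)

    @staticmethod
    def _tokenize(text):
        toks, pos, text = [], 0, text.rstrip()
        while pos < len(text):
            m = TOKEN.match(text, pos)
            if not m:
                raise SyntaxError(f"bad character at {pos}")
            toks.append(m.group(1))
            pos = m.end()
        return toks

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def parse(self):
        value = self.sum()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return value

    # sum = product | sum '+' product | sum '-' product ;  (loop = left-assoc)
    def sum(self):
        value = self.product()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.product()
            value = value + rhs if op == "+" else value - rhs
        return value

    # product = term | term '*' product | term '/' product | term '&' product ;
    def product(self):
        left = self.term()
        op = self.peek()
        if op not in ("*", "/", "&"):
            return left
        self.take()
        right = self.product()  # right recursion, as in the grammar
        if op == "*":
            return left * right
        return left // right if op == "/" else left & right

    # term = '+' term | '-' term | '\\' term | '*' | '(' expression ')'
    #      | NUMBER | IDENTIFIER ;
    def term(self):
        tok = self.take()
        if tok == "+":
            return self.term()
        if tok == "-":
            return -self.term()
        if tok == "\\":
            return ~self.term() & 0xFFFF  # assumed 16-bit complement
        if tok == "*":
            return self.location
        if tok == "(":
            value = self.sum()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        if tok[0] == "$":
            return int(tok[1:], 16)
        if tok[0] == "%":
            return int(tok[1:], 2)
        if tok[0].isdigit():
            return int(tok)
        if not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"unexpected {tok!r}")
        return self.symbols[tok.upper()]  # KeyError if undefined
```

A parser generator would build the same structure from the rules directly; writing it by hand mainly shows how each rule maps to one function.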
+0

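Ira, thanks for the great answer (I tried to upvote it, but I don't have enough reputation, lol). I was wondering whether you have an example of a lexer and/or parser that uses this approach so I can take a look, and if not, any chance of some pseudocode to get me moving forward :). Best regards – 2009-08-23 09:40:47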

+1

@Gary: Be careful what you ask for :} See the edit that turned my small answer into a huge one. – 2009-08-23 17:44:29

+0

@Gary: PS, to "upvote", just click the triangle above the score next to the start of the answer. :-} – 2009-08-23 17:46:07

3

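BNF is more commonly used for structured, nested languages such as Pascal, C++, or really anything derived from the Algol family (which includes modern languages like C#). If I were implementing an assembler, I'd probably use some simple regular expressions to pattern-match the opcode and operands. It's been a while since I used Z80 assembly language, but you might use something like: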

/\s*(\w{2,3})\s+((\w+)(,\w+)?)?/ 

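This would match any line consisting of a two- or three-letter opcode followed by one or two operands separated by a comma. After extracting an assembler line like this, you can look up the opcode and generate the correct bytes for the instruction, including the values of the operands where applicable.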
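A short Python sketch of that flow, using the regex above unchanged and encoding only register-to-register LD (bit pattern 01 ddd sss on the Z80). The `assemble` function and its tiny coverage are illustrative, not a real assembler:

```python
import re

# The answer's pattern, unchanged.
LINE_RE = re.compile(r"\s*(\w{2,3})\s+((\w+)(,\w+)?)?")

# Z80 8-bit register numbers used by LD r,r' (encoded as 01 ddd sss).
REG8 = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "A": 7}


def assemble(line):
    """Illustrative only: handles just register-to-register LD."""
    m = LINE_RE.match(line.upper())
    if not m:
        raise SyntaxError(f"no match: {line!r}")
    opcode, dst, src = m.group(1), m.group(3), m.group(4)  # src keeps its leading ','
    if opcode == "LD" and dst in REG8 and src and src[1:] in REG8:
        return bytes([0x40 | (REG8[dst] << 3) | REG8[src[1:]]])
    raise NotImplementedError(f"not handled in this sketch: {line!r}")
```

Because the pattern has no end anchor, a line such as `LD A,(HL)` still matches but silently loses the `(HL)` operand, and four-letter mnemonics such as `PUSH` do not match at all; a fuller version would need `^...$` anchors and wider operand syntax.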

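The kind of parser outlined above, using regular expressions, would be called an "ad hoc" parser, which basically means that you split and examine the input chunk by chunk (in the case of assembly language, line by line).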

2

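I don't think you need to overcomplicate this. There's no point in having a parser break "LD A,A" down into a load operation, a destination register and a source register, when you can match the whole thing (modulo case and whitespace) directly to one opcode.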

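There aren't that many opcodes, and they aren't arranged in a way that gives you much benefit from parsing and understanding the assembly, IMO. Obviously you'd need a parser for the byte/address/indexing arguments, but other than that I'd just do a one-to-one lookup.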
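A minimal sketch of that one-to-one lookup in Python. The table is hypothetical and holds only a few real Z80 encodings; `normalize` implements the "modulo case and whitespace" matching:

```python
# Hypothetical lookup table keyed on normalized instruction text.
# Only a handful of real Z80 encodings are shown.
OPCODES = {
    "NOP": b"\x00",
    "HALT": b"\x76",
    "RET": b"\xc9",
    "LD A,A": b"\x7f",
    "LD A,B": b"\x78",
    "LD B,A": b"\x47",
}


def normalize(line):
    """Upper-case, drop any ';' comment, and squeeze whitespace: 'ld  a , b' -> 'LD A,B'."""
    code = line.split(";", 1)[0].upper()
    parts = code.split(None, 1)
    if not parts:
        return ""
    mnemonic = parts[0]
    operands = "".join(parts[1].split()) if len(parts) > 1 else ""
    return f"{mnemonic} {operands}" if operands else mnemonic


def lookup(line):
    return OPCODES[normalize(line)]  # KeyError for anything not in the table
```

Instructions with numeric or address arguments (such as `LD A,n` or `JP nn`) can't be literal keys in such a table; as the answer says, those need a small parser for the argument, after which the table could be keyed on a pattern like `LD A,n`.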

+1

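Thanks for the feedback... I agree with going down the simpler path, but I'm also interested in extending this to a more complex language, and I'd like to take advantage of those features from the start... There are also differences in other parts of the ASM (such as equ, .db, .ds, .dw, #include, ()), and then we start getting into simple case statements like IF ELSE. This is also an exercise in learning the concepts of BNF and applying them to a simpler implementation. – 2009-08-23 01:51:54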
