2013-05-15 20 views
19

我有一個簡單的語言,我正在編寫一個編譯器(是的,它是家庭作業)編譯一個簡單的語言,我將描述如果必要的Java虛擬機代碼。Python編譯器簡單的語言到Java虛擬機代碼算法

它目前工作得很好我剛剛碰到邏輯AND和OR的凹凸。

在單個if/while條件下,每項工作都很好,但如果我嘗試鏈接它們,那麼出錯了,如果我錯了,請糾正我,但我相信AND有優先權,但我想知道是否有邏輯方法安排他們?我想是我想問的,java虛擬機代碼輸出只是有一個接一個的比較和跳轉語句(這似乎是錯誤的)。我意識到它非常抽象,所以也許我所追求的是一個僞代碼/算法,用於構造鏈式的AND和OR。

編輯:目前只是將AND和OR的任意組合視爲AND。比較因子/術語/表達式連接(比較布爾因子等)我相信AND具有優先權?只是一個想法。

道歉,如果這是知之甚少:/

所以我想生病包括相關的信息只是櫃面。

編譯

import re 
import sys 

# Restrictions: 
# Integer constants must be short. 
# Stack size must not exceed 1024. 
# Integer is the only type. 
# Logical operators cannot be nested. 

class Scanner: 
    '''The interface comprises the methods lookahead and consume. 
     Other methods should not be called from outside of this class.''' 

    def __init__(self, input_file): 
     '''Reads the whole input_file to input_string.''' 
     # source code of the program to be compiled 
     self.input_string = input_file.read() 
     # index where the unprocessed part of input_string starts 
     self.current_char_index = 0 
     # a pair (most recently read token, matched substring of input_string) 
     self.current_token = self.get_token() 

    def skip_white_space(self): 
     '''Consumes all characters in input_string up to the next 
      non-white-space character.''' 
     if (self.current_char_index >= len(self.input_string) - 1): 
      # bad fix for it over-running the end of the file 
      return 

     while self.input_string[self.current_char_index].isspace(): 
      self.current_char_index += 1 


     return 

    def get_token(self): 
     '''Returns the next token and the part of input_string it matched. 
      Returns None if there is no next token. 
      The characters up to the end of the token are consumed.''' 

     self.skip_white_space() 
     # find the longest prefix of input_string that matches a token 
     token, longest = None, '' 
     for (t, r) in Token.token_regexp: 
      match = re.match(r, self.input_string[self.current_char_index:]) 
      if match and match.end() > len(longest): 
       token, longest = t, match.group() 
     # consume the token by moving the index to the end of the matched part 
     self.current_char_index += len(longest) 
     return (token, longest) 

    def lookahead(self): 
     '''Returns the next token without consuming it. 
      Returns None if there is no next token.''' 
     return self.current_token[0] 

    def consume(self, *tokens): 
     '''Returns the next token and consumes it, if it is in tokens. 
      Raises an exception otherwise. 
      If the token is a number or an identifier, its value is returned.''' 
     if self.current_token[0] not in tokens: 
      print('Token ' + self.current_token[0] + ' isn\'t in the tokens: ') 
      for token in tokens: 
       print(token) 
      raise Exception('Token is not in tokens this shouldn\'t happen much') 

     if self.current_token[0] == 'ID': 
      symbol_table.location(self.current_token[1]) 
      value = self.current_token[1] 


     elif (self.current_token[0] == 'NUM'): 
      value = self.current_token[1] 

     else: 
      value = self.current_token[0] 

     self.current_token = self.get_token() 

     return value 



class Token: 
    DO = 'DO'; 
    ELSE = 'ELSE'; 
    END = 'END'; 
    IF = 'IF'; 
    THEN = 'THEN'; 
    WHILE = 'WHILE'; 
    SEM = 'SEM'; 
    BEC = 'BEC'; 
    LESS = 'LESS'; 
    EQ = 'EQ'; 
    GRTR = 'GRTR'; 
    LEQ = 'LEQ'; 
    NEQ = 'NEQ'; 
    GEQ = 'GEQ'; 
    ADD = 'ADD'; 
    SUB = 'SUB'; 
    MUL = 'MUL'; 
    DIV = 'DIV'; 
    LPAR = 'LPAR'; 
    RPAR = 'RPAR'; 
    NUM = 'NUM'; 
    ID = 'ID'; 
    READ = 'READ'; 
    WRITE = 'WRITE'; 
    OR = 'OR'; 
    AND = 'AND'; 
    NOT = 'NOT'; 


    # The following list gives the regular expression to match a token. 
    # The order in the list matters for mimicking Flex behaviour. 
    # Longer matches are preferred over shorter ones. 
    # For same-length matches, the first in the list is preferred. 
    token_regexp = [ 
     (DO, 'do'), 
     (ELSE, 'else'), 
     (END, 'end'), 
     (IF, 'if'), 
     (THEN, 'then'), 
     (WHILE, 'while'), 
     (READ, 'read'), 
     (WRITE, 'write'), 
     (OR, 'or'), 
     (AND, 'and'), 
     (NOT, 'not'),   
     (SEM, ';'), 
     (BEC, ':='), 
     (LESS, '<'), 
     (EQ, '='), 
     (NEQ, '!='), 
     (GRTR, '>'), 
     (LEQ, '<='), 
     (GEQ, '>='), 
     (ADD, '[+]'), # + is special in regular expressions 
     (SUB, '-'), 
     (MUL, '[*]'), 
     (DIV, '/'), 
     (LPAR, '[(]'), # (is special in regular expressions 
     (RPAR, '[)]'), #) is special in regular expressions 
     (ID, '[a-z]+'), 
     (NUM, '[0-9]+'), 
    ] 

class Symbol_Table: 
    '''A symbol table maps identifiers to locations.''' 
    def __init__(self): 
     self.symbol_table = {} 
    def size(self): 
     '''Returns the number of entries in the symbol table.''' 
     return len(self.symbol_table) 
    def location(self, identifier): 
     '''Returns the location of an identifier. If the identifier is not in 
      the symbol table, it is entered with a new location. Locations are 
      numbered sequentially starting with 0.''' 
     if identifier in self.symbol_table: 
      return self.symbol_table[identifier] 
     index = len(self.symbol_table) 
     self.symbol_table[identifier] = index 
     return index 

class Label: 
    def __init__(self): 
     self.current_label = 0 
    def next(self): 
     '''Returns a new, unique label.''' 
     self.current_label += 1 
     return 'l' + str(self.current_label) 

def indent(s, level): 
    return ' '*level + s + '\n' 

# Each of the following classes is a kind of node in the abstract syntax tree. 
# indented(level) returns a string that shows the tree levels by indentation. 
# code() returns a string with JVM bytecode implementing the tree fragment. 
# true_code/false_code(label) jumps to label if the condition is/is not true. 
# Execution of the generated code leaves the value of expressions on the stack. 

class Program_AST: 
    def __init__(self, program): 
     self.program = program 
    def __repr__(self): 
     return repr(self.program) 
    def indented(self, level): 
     return self.program.indented(level) 
    def code(self): 
     program = self.program.code() 
     local = symbol_table.size() 
     java_scanner = symbol_table.location('Java Scanner') 
     return '.class public Program\n' + \ 
       '.super java/lang/Object\n' + \ 
       '.method public <init>()V\n' + \ 
       'aload_0\n' + \ 
       'invokenonvirtual java/lang/Object/<init>()V\n' + \ 
       'return\n' + \ 
       '.end method\n' + \ 
       '.method public static main([Ljava/lang/String;)V\n' + \ 
       '.limit locals ' + str(local) + '\n' + \ 
       '.limit stack 1024\n' + \ 
       'new java/util/Scanner\n' + \ 
       'dup\n' + \ 
       'getstatic java/lang/System.in Ljava/io/InputStream;\n' + \ 
       'invokespecial java/util/Scanner.<init>(Ljava/io/InputStream;)V\n' + \ 
       'astore ' + str(java_scanner) + '\n' + \ 
       program + \ 
       'return\n' + \ 
       '.end method\n' 

class Statements_AST: 
    def __init__(self, statements): 
     self.statements = statements 
    def __repr__(self): 
     result = repr(self.statements[0]) 
     for st in self.statements[1:]: 
      result += '; ' + repr(st) 
     return result 
    def indented(self, level): 
     result = indent('Statement(s)', level) 
     for st in self.statements: 
      result += st.indented(level+1) 
     return result 
    def code(self): 
     result = '' 
     for st in self.statements: 
      result += st.code() 
     return result 

class If_AST: 
    def __init__(self, boolean_expression, then): 
     self.boolean_expression = boolean_expression 
     self.then = then 
    def __repr__(self): 
     return 'if ' + repr(self.boolean_expression) + ' then ' + \ 
         repr(self.then) + ' end' 
    def indented(self, level): 
     return indent('If-Then', level) + \ 
       self.boolean_expression.indented(level+1) + \ 
       self.then.indented(level+1) 
    def code(self): 
     l1 = label_generator.next() 
     return self.boolean_expression.code(l1) + \ 
       self.then.code() + \ 
       l1 + ':\n' 

class If_Else_AST: 
    def __init__(self, boolean_expression, then, _else): 
     self.boolean_expression = boolean_expression; 
     self.then = then; 
     self._else = _else; 
    def __repr__(self): 
     return 'if ' + repr(self.boolean_expression) + ' then ' + \ 
         repr(self.then) + ' else ' + \ 
         repr(self._else) + ' end' 
    def indented(self, level): 
     return indent('If-Then-Else', level) + \ 
       self.boolean_expression.indented(level+1) + \ 
       self.then.indented(level+1) + \ 
       indent('Else', level+1) + \ 
       self._else.indented(level+1) 
    def code(self): 
     l1 = label_generator.next() 
     l2 = label_generator.next() 
     return self.boolean_expression.code(l1) + \ 
       self.then.code() + \ 
       'goto ' + l2 + '\n' + \ 
       l1 + ':\n' + \ 
       self._else.code() + \ 
       l2 + ':\n' 


class While_AST: 
    def __init__(self, boolean_term, body): 
     self.boolean_term = boolean_term 
     self.body = body 
    def __repr__(self): 
     return 'while ' + repr(self.boolean_term) + ' do ' + \ 
          repr(self.body) + ' end' 
    def indented(self, level): 
     return indent('While-Do', level) + \ 
       self.boolean_term.indented(level+1) + \ 
       self.body.indented(level+2) 
    def code(self): 
     l1 = label_generator.next() 
     l2 = label_generator.next() 
     return l1 + ':\n' + \ 
       self.boolean_term.code(l2) + \ 
       self.body.code() + \ 
       'goto ' + l1 + '\n' + \ 
       l2 + ':\n' 

class Assign_AST: 
    def __init__(self, identifier, expression): 
     self.identifier = identifier 
     self.expression = expression 
    def __repr__(self): 
     return repr(self.identifier) + ':=' + repr(self.expression) 
    def indented(self, level): 
     return indent('Assign', level) + \ 
       self.identifier.indented(level+1) + \ 
       self.expression.indented(level+1) 
    def code(self): 
     loc = symbol_table.location(self.identifier.identifier) 
     return self.expression.code() + \ 
       'istore ' + str(loc) + '\n' 

class Write_AST: 
    def __init__(self, expression): 
     self.expression = expression 
    def __repr__(self): 
     return 'write ' + repr(self.expression) 
    def indented(self, level): 
     return indent('Write', level) + self.expression.indented(level+1) 
    def code(self): 
     return 'getstatic java/lang/System/out Ljava/io/PrintStream;\n' + \ 
       self.expression.code() + \ 
       'invokestatic java/lang/String/valueOf(I)Ljava/lang/String;\n' + \ 
       'invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V\n' 

class Read_AST: 
    def __init__(self, identifier): 
     self.identifier = identifier 
    def __repr__(self): 
     return 'read ' + repr(self.identifier) 
    def indented(self, level): 
     return indent('Read', level) + self.identifier.indented(level+1) 
    def code(self): 
     java_scanner = symbol_table.location('Java Scanner') 
     loc = symbol_table.location(self.identifier.identifier) 
     return 'aload ' + str(java_scanner) + '\n' + \ 
       'invokevirtual java/util/Scanner.nextInt()I\n' + \ 
       'istore ' + str(loc) + '\n' 

class Comparison_AST: 
    def __init__(self, left, op, right): 
     self.left = left 
     self.op = op 
     self.right = right 
    def __repr__(self): 
     op = { Token.LESS:'<', Token.EQ:'=', Token.GRTR:'>', 
       Token.LEQ:'<=', Token.NEQ:'!=', Token.GEQ:'>=' } 
     return repr(self.left) + op[self.op] + repr(self.right) 
    def indented(self, level): 
     return indent(self.op, level) + \ 
       self.left.indented(level+1) + \ 
       self.right.indented(level+1) 
    def true_code(self, label): 
     op = { Token.LESS:'if_icmplt', Token.EQ:'if_icmpeq', 
       Token.GRTR:'if_icmpgt', Token.LEQ:'if_icmple', 
       Token.NEQ:'if_icmpne', Token.GEQ:'if_icmpge' } 
     return self.left.code() + \ 
       self.right.code() + \ 
       op[self.op] + ' ' + label + '\n' 
    def false_code(self, label): 
     # Negate each comparison because of jump to "false" label. 
     op = { Token.LESS:'if_icmpge', Token.EQ:'if_icmpne', 
       Token.GRTR:'if_icmple', Token.LEQ:'if_icmpgt', 
       Token.NEQ:'if_icmpeq', Token.GEQ:'if_icmplt' } 
     return self.left.code() + \ 
       self.right.code() + \ 
       op[self.op] + ' ' + label + '\n' 

class Expression_AST: 
    def __init__(self, left, op, right): 
     self.left = left 
     self.op = op 
     self.right = right 
    def __repr__(self): 
     op = { Token.ADD:'+', Token.SUB:'-', Token.MUL:'*', Token.DIV:'/' } 
     return '(' + repr(self.left) + op[self.op] + repr(self.right) + ')' 
    def indented(self, level): 
     return indent(self.op, level) + \ 
       self.left.indented(level+1) + \ 
       self.right.indented(level+1) 
    def code(self): 
     op = { Token.ADD:'iadd', Token.SUB:'isub', 
       Token.MUL:'imul', Token.DIV:'idiv' } 
     return self.left.code() + \ 
       self.right.code() + \ 
       op[self.op] + '\n' 

class Number_AST: 
    def __init__(self, number): 
     self.number = number 
    def __repr__(self): 
     return self.number 
    def indented(self, level): 
     return indent(self.number, level) 
    def code(self): # works only for short numbers 
     return 'sipush ' + self.number + '\n' 

class Identifier_AST: 
    def __init__(self, identifier): 
     self.identifier = identifier 
    def __repr__(self): 
     return self.identifier 
    def indented(self, level): 
     return indent(self.identifier, level) 
    def code(self): 
     loc = symbol_table.location(self.identifier) 
     return 'iload ' + str(loc) + '\n' 

class BooleanFactor_AST: 
    def __init__(self, condition, logic): 
     self.condition = condition 
     self.logic = logic 
    def __repr__(self): 
     if self.logic == False: 
      return 'NOT ' + repr(self.condition) 
     else: 
      return repr(self.condition) 
    def indented(self, level): 
     if self.logic == False: 
      return indent('NOT ', level) + self.condition.indented(level + 1) 
     else: 
      return self.condition.indented(level) 
    def false_code(self, label): 
     if self.logic == True: 
      return self.condition.false_code(label) 
     else: 
      return self.condition.true_code(label) 
     return 
    def true_code(self, label): 
     if self.logic == True: 
      return self.condition.true_code(label) 
     else: 
      return self.condition.false_code(label) 


class BooleanTerm_AST: 
    def __init__(self, terms): 
     self.terms = terms 
    def __repr__(self): 
     result = repr(self.terms[0]) 
     for term in self.terms[1:]: 
      result = result + ' AND ' + repr(term) 
     return result 
    def indented(self, level): 
     result = self.terms[0].indented(level) 
     for term in self.terms[1:]: 
      result = result + indent('AND', level) 
      result = result + term.indented(level) 
     return result 
    def code(self, label): 
     result = '' 
     for term in self.terms: 
      result = result + term.false_code(label) 
     return result 

class BooleanExpression_AST: 
    def __init__(self, expressions): 
     self.expressions = expressions 
    def __repr__(self): 
     result = repr(self.expressions[0]) 
     for expression in self.expressions[1:]: 
      result = result + ' OR ' + repr(expression) 
     return result 
    def indented(self, level): 
     result = self.expressions[0].indented(level) 
     indentation = 0 
     for expression in self.expressions[1:]: 
      indentation += 1 
      result = result + indent('OR', level + indentation) 
      result = result + expression.indented(level + indentation) 
     return result 
    def code(self, label): 
     result = '' 
     for expression in self.expressions: 
      result = result + expression.code(label) 
     return result 


# The following methods comprise the recursive-descent parser. 

def program(): 
    sts = statements() 
    return Program_AST(sts) 

def statements(): 
    result = [statement()] 
    while scanner.lookahead() == Token.SEM: 
     scanner.consume(Token.SEM) 
     st = statement() 
     result.append(st) 
    return Statements_AST(result) 

def statement(): 
    if scanner.lookahead() == Token.IF: 
     return if_statement() 
    elif scanner.lookahead() == Token.WHILE: 
     return while_statement() 
    elif scanner.lookahead() == Token.ID: 
     return assignment() 
    elif scanner.lookahead() == Token.READ: 
     return read(); 
    elif scanner.lookahead() == Token.WRITE: 
     return write(); 
    else: # error 
     return scanner.consume(Token.IF, Token.WHILE, Token.ID) 

def if_statement(): 
    scanner.consume(Token.IF) 
    condition = boolean_expression() 
    scanner.consume(Token.THEN) 
    then = statements() 
    if scanner.lookahead() == Token.END: 
     scanner.consume(Token.END) 
     return If_AST(condition, then) 
    else: 
     scanner.consume(Token.ELSE) 
     _else = statements() 
     scanner.consume(Token.END) 
     return If_Else_AST(condition, then, _else) 

def while_statement(): 
    scanner.consume(Token.WHILE) 
    condition = boolean_expression() 
    scanner.consume(Token.DO) 
    body = statements() 
    scanner.consume(Token.END) 
    return While_AST(condition, body) 

def assignment(): 
    ident = identifier() 
    scanner.consume(Token.BEC) 
    expr = expression() 
    return Assign_AST(ident, expr) 

def read(): 
    scanner.consume(Token.READ) 
    variable = identifier() 
    return Read_AST(variable) 


def write(): 
    scanner.consume(Token.WRITE) 
    expr = expression() 
    return Write_AST(expr) 

def comparison(): 
    left = expression() 
    op = scanner.consume(Token.LESS, Token.EQ, Token.GRTR, 
         Token.LEQ, Token.NEQ, Token.GEQ) 
    right = expression() 
    return Comparison_AST(left, op, right) 

def expression(): 
    result = term() 
    while scanner.lookahead() in [Token.ADD, Token.SUB]: 
     op = scanner.consume(Token.ADD, Token.SUB) 
     tree = term() 
     result = Expression_AST(result, op, tree) 
    return result 

def term(): 
    result = factor() 
    while scanner.lookahead() in [Token.MUL, Token.DIV]: 
     op = scanner.consume(Token.MUL, Token.DIV) 
     tree = factor() 
     result = Expression_AST(result, op, tree) 
    return result 

def factor(): 
    if scanner.lookahead() == Token.LPAR: 
     scanner.consume(Token.LPAR) 
     result = expression() 
     scanner.consume(Token.RPAR) 
     return result 
    elif scanner.lookahead() == Token.NUM: 
     value = scanner.consume(Token.NUM) 
     return Number_AST(value) 
    elif scanner.lookahead() == Token.ID: 
     return identifier() 
    else: # error 
     return scanner.consume(Token.LPAR, Token.NUM, Token.ID) 

def identifier(): 
    value = scanner.consume(Token.ID) 
    return Identifier_AST(value) 

def boolean_factor(): 
    if scanner.lookahead() == Token.NOT: 
     scanner.consume(Token.NOT) 
     logic = False 
    else: 
     logic = True 
    result = comparison() 
    return BooleanFactor_AST(result, logic) 

def boolean_term(): 
    result = [boolean_factor()] 
    while scanner.lookahead() in [Token.AND]: 
     scanner.consume(scanner.lookahead()) 
     temp = boolean_factor() 
     result.append(temp) 

    return BooleanTerm_AST(result) 

def boolean_expression(): 
    result = [boolean_term()] 
    while scanner.lookahead() in [Token.OR]: 
     scanner.consume(scanner.lookahead()) 
     temp = boolean_term() 
     result.append(temp) 

    return BooleanExpression_AST(result) 

# Initialise scanner, symbol table and label generator. 

#scanner = Scanner(open('test.txt')) 
scanner = Scanner(sys.stdin) 
symbol_table = Symbol_Table() 
symbol_table.location('Java Scanner') # fix a location for the Java Scanner 
label_generator = Label() 

# Uncomment the following to test the scanner without the parser. 
# This shows a list of all tokens in the input. 
# 
#token = scanner.lookahead() 

#while token != None: 
# print(token) 
# scanner.consume(token) 
# token = scanner.lookahead() 

#exit() 

# Call the parser. 

ast = program() 
assert scanner.lookahead() == None 

# Uncomment the following to test the parser without the code generator. 
# The first line gives back the program by calling __repr__ of the AST classes. 
# The second line shows the syntax tree with levels indicated by indentation. 
# 
#print(ast) 
#print(ast.indented(0)) 
#exit() 

# Call the code generator. 

# This translates the abstract syntax tree to JVM bytecode. 
# It can be assembled to a class file by Jasmin: http://jasmin.sourceforge.net/ 

print(ast.code()) 

測試bat文件

python compiler.py <test.txt> Program.j 
java -Xmx100m -jar jasmin.jar Program.j 
java -Xmx100m Program <testInput.txt> test_output.txt 

和語言(BNF)

Program = Statements 
Statements = Statement (; Statement) 
Statement = If | While | Assignment 
If = if Comparison then Statements end 
While = while Comparison do Statements end 
Assignment = identifier := Expression 
Comparison = Expression Relation Expression 
Relation = = | != | < | <= | > | >= 
Expression = Term ((+ | -) Term) 
Term = Factor ((* | /) Factor) 
Factor = (Expression) | number | identifier 
BooleanExpression = BooleanTerm (or BooleanTerm)* 
BooleanTerm = BooleanFactor (and BooleanFactor)* 
BooleanFactor = not BooleanFactor | Comparison 

我認爲是所有這些都是相關的,歡呼聲,如果你需要幫助一展身手我在此

+0

eep,語言描述中的j代表按位或(|) – Zeno15

+1

如果您想修復該問題,您可以編輯您的問題(點擊帖子底部灰色的'編輯'按鈕)。 –

+0

爲此歡呼 – Zeno15

回答

1

,如果你想鏈或年代和AND'syou的方法可以使用這個屬性:

p v q === ¬p^¬q 

是等價的,你可以處理所有的並形成。例如。

p v q^r v s === ¬p^¬q^¬r^¬s 

所以計算表達式中並形成是與一種算法簡單。

我想表達式沒有任何圓括號,換句話說,您需要優先考慮分組符號(),[],{}。