2012-10-17 64 views

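I am building a formula reference map from spreadsheet XML using Python, which involves parsing Excel-style formulas. A typical formula looks like this: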

=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),WriteCurve(OFFSET(R16C6, 0,0,R9C7,R10C7),R15C6,R10C3, R8C3),"NONE") 

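I only want the nth argument of the WriteCurve function. So far I have come up with a very C-style routine that basically counts the commas that are not inside nested brackets. There are a lot of nested formulas.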

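def parseArguments(t, func, n):
    # Return the n-th (0-based) top-level argument of func in formula t.
    start = t.find(func) + len(func) + 1   # first character after "func("
    bracket = 0                            # nesting depth inside the call
    ss = t[start:]
    lastcomma = 0
    for i, a in enumerate(ss):
        if a == "(":
            bracket += 1
        elif a == ")":
            if bracket == 0:
                break                      # end of the call
            bracket -= 1
        elif a == ",":
            if bracket == 0 and n == 0:
                break                      # comma that ends the wanted argument
            elif bracket == 0:
                if n - 1 == 0:
                    lastcomma = i          # comma that starts the wanted argument
                n -= 1
    if lastcomma == 0:
        return ss[:i]
    else:
        return ss[lastcomma+1:i]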

Is there a Pythonic way to do this? Or is there a better, recursive way to parse the whole formula? Many thanks.

Answers


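The best Excel formula parser I know of is E. W. Bachtal's algorithm. Robin Macharg wrote a Python port; the most recent version I know of is part of the pycel project, but it can be used standalone as the tokenizer module. It has no trouble parsing your formula: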

from tokenizer import shunting_yard 
rpn = shunting_yard('=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),WriteCurve(OFFSET(R16C6, 0,0,R9C7,R10C7),R15C6,R10C3, R8C3),"NONE")') 
print(rpn) 
deque([<tokenizer.RangeNode object at 0x2b7b1f5d7850>, <tokenizer.FunctionNode object at 0x2b7b1f5d7950>, <tokenizer.ASTNode object at 0x2b7b1f5d7990>, <tokenizer.ASTNode object at 0x2b7b1f5d79d0>, <tokenizer.RangeNode object at 0x2b7b1f5d7a10>, <tokenizer.FunctionNode object at 0x2b7b1f5d7a50>, <tokenizer.RangeNode object at 0x2b7b1f5d7a90>, <tokenizer.ASTNode object at 0x2b7b1f5d7ad0>, <tokenizer.ASTNode object at 0x2b7b1f5d7b10>, <tokenizer.RangeNode object at 0x2b7b1f5d7b50>, <tokenizer.RangeNode object at 0x2b7b1f5d7b90>, <tokenizer.FunctionNode object at 0x2b7b1f5d7bd0>, <tokenizer.RangeNode object at 0x2b7b1f5d7c10>, <tokenizer.RangeNode object at 0x2b7b22efc450>, <tokenizer.RangeNode object at 0x2b7b22efc510>, <tokenizer.FunctionNode object at 0x2b7b22efc410>, <tokenizer.ASTNode object at 0x2b7b22eff110>, <tokenizer.FunctionNode object at 0x2b7b22eff150>]) 

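The tokenizer leaves you with an RPN (reverse Polish notation) stack; if you find an AST more convenient to work with, the conversion is straightforward: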

def rpn_to_ast(rpn):
    stack = []
    for n in rpn:
        num_args = (2 if n.token.ttype == "operator-infix" else
                    1 if n.token.ttype.startswith('operator') else
                    n.num_args if n.token.ttype == 'function' else 0)
        n.args = [stack.pop() for _ in range(num_args)][::-1]
        stack.append(n)
    return stack[0]
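
To see what this conversion does without installing the tokenizer, here is a minimal sketch that applies the same stack-folding idea to hand-built nodes. The `Node` class, its `kind`, `value` and `num_args` fields, and the helper names `toy_rpn_to_ast` and `to_text` are stand-ins invented for this illustration, not the tokenizer's own classes; the RPN list is a hand-written postfix form of `WriteCurve(OFFSET(R16C6, 0), R15C6 + 1)`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for the tokenizer's node classes.
    kind: str                 # "operand", "function", or "infix"
    value: str
    num_args: int = 0         # only meaningful for functions
    args: List["Node"] = field(default_factory=list)


def toy_rpn_to_ast(rpn):
    """Fold a postfix sequence into a tree by popping each node's arguments."""
    stack = []
    for node in rpn:
        if node.kind == "infix":
            arity = 2
        elif node.kind == "function":
            arity = node.num_args
        else:
            arity = 0
        # Arguments come off the stack last-first, so reverse them.
        node.args = [stack.pop() for _ in range(arity)][::-1]
        stack.append(node)
    return stack[0]


def to_text(node):
    """Render the tree back to a formula-like string."""
    if node.kind == "infix":
        left, right = node.args
        return "(%s %s %s)" % (to_text(left), node.value, to_text(right))
    if node.kind == "function":
        return "%s(%s)" % (node.value, ", ".join(to_text(a) for a in node.args))
    return node.value


# Postfix form of WriteCurve(OFFSET(R16C6, 0), R15C6 + 1)
rpn = [
    Node("operand", "R16C6"),
    Node("operand", "0"),
    Node("function", "OFFSET", num_args=2),
    Node("operand", "R15C6"),
    Node("operand", "1"),
    Node("infix", "+"),
    Node("function", "WriteCurve", num_args=2),
]
root = toy_rpn_to_ast(rpn)
print(to_text(root))
```

Each node pops exactly as many entries as it has arguments, so by the time the last node is processed the stack holds a single root whose `args` mirror the nesting of the original formula.
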

Then you can walk the AST to find the WriteCurve node and inspect its arguments:

def walk(ast):
    yield ast
    for arg in getattr(ast, 'args', []):
        for node in walk(arg):
            yield node

write_curve = next(node for node in walk(rpn_to_ast(rpn))
                   if node.token.ttype == 'function'
                   and node.token.tvalue == 'WriteCurve')
print(write_curve.args[2].token.tvalue)
R10C3
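
If a full parser is more than you need, the bracket-counting idea from the question can also be written more compactly. The following is only a sketch under a few assumptions: the helper name `split_top_level_args` is made up for this example, it looks at the first textual occurrence of `func(` (so it would also match a longer name that ends in `func`), and it treats double-quoted text as a string literal so that commas and parentheses inside strings are ignored.

```python
def split_top_level_args(formula, func):
    """Return the top-level arguments of the first call to func in formula."""
    start = formula.find(func + "(")
    if start == -1:
        raise ValueError("function %r not found" % func)
    args, current = [], []
    depth = 0            # nesting depth of (...) inside the call
    in_string = False    # True while inside a "..." literal
    for ch in formula[start + len(func) + 1:]:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:
                    # Closing parenthesis of the call itself: we are done.
                    args.append("".join(current).strip())
                    return args
                depth -= 1
            elif ch == "," and depth == 0:
                # Top-level comma: finish the current argument.
                args.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    raise ValueError("unbalanced parentheses after %r" % func)


formula = ('=IF(AND(LEN(R[-2]C[-1])>0,R[-1]C),'
           'WriteCurve(OFFSET(R16C6, 0,0,R9C7,R10C7),R15C6,R10C3, R8C3),"NONE")')
args = split_top_level_args(formula, "WriteCurve")
print(args[2])
```

Returning the whole argument list rather than a single item keeps the loop free of the countdown on `n`, and the caller can index it like any other list. For anything beyond picking arguments out of one call, the tokenizer approach above is the more robust choice.
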