2016-06-20 45 views
1

我在Python 3.5.0上使用pyparsing(2.1.5)。 我想更快地製作infixNotation。我發現其他人使用ParserElement.enablePackrat()來提高infixNotation的性能。但我無法做到。我的代碼如下。ParserElement.enablePackrat()不會使infixNotation更快

from pyparsing import * 
ParserElement.enablePackrat() 
UNICODE_CHARS = u''.join(
    chr(c) for c in range(65538) if not chr(c).isspace() and 
    chr(c) not in '()"' 
) 
_and_ = Keyword('AND') 
_or_ = Keyword('OR') 
_not_ = Keyword('NOT') 
search_term = ~_and_ + ~_or_ + ~_not_ + Word(UNICODE_CHARS) | QuotedString(
    '"', endQuoteChar='"', unquoteResults=False 
) 
search_expr = infixNotation(
    search_term, 
    [ 
     (_not_, 1, opAssoc.RIGHT), 
     (Optional(_and_), 2, opAssoc.LEFT), 
     (_or_, 2, opAssoc.LEFT), 
    ] 
) 
parsed_query = search_expr.parseString(user_string)[0].asList() 

回答

0

此使用的infixNotation只有3個級別的運營商,所以packratting不會爲你做很多。這些改進通常有5個或更多級別的運算符,例如算術運算。

如果你真的想曲柄表現出來的infixNotation,寫自己的簡裝版:

""" 
BNF 

operand = ~and ~or ~not (A-Za-z0-9)... | quoted_string 

atom = 'not'? (operand | '(' expr ')') 
and_term = atom 'and' atom 
or_term = and_term 'or' and_term 
""" 


_and_ = CaselessKeyword('AND') 
_or_ = CaselessKeyword('OR') 
_not_ = CaselessKeyword('NOT') 
keyword = (_and_ | _or_ | _not_) 
search_term = ~keyword + Word(UNICODE_CHARS) | QuotedString('"', endQuoteChar='"', unquoteResults=False) 

# use this instead of infixNotation - this is essentially what infixNotation will 
# generate, but with fewer FollowedBy's (used to collapse degenerate levels) 
LPAR,RPAR = map(Suppress, "()") 
expr = Forward() 
atom_ = search_term | Group(LPAR + expr + RPAR) 
atom = Group(_not_ + atom_) | atom_ 
and_term = Group(atom + ZeroOrMore(_and_ + atom)) 
or_term = Group(and_term + ZeroOrMore(_or_ + and_term)) 
expr <<= or_term 

# some simple test cases 
tests = """ 
    p and not q 
    p and not q or r 
    p and not (q or r) 
""" 

print("compare with using infixNotation") 
expr.runTests(tests) 

print("compare with using infixNotation") 
search_expr = infixNotation(
    search_term, 
    [ 
     (_not_, 1, opAssoc.RIGHT), 
     (Optional(_and_), 2, opAssoc.LEFT), 
     (_or_, 2, opAssoc.LEFT), 
    ] 
) 

search_expr.runTests(tests) 

硬編碼版本會給輸出,如:

p and not q 
[[['p', 'AND', ['NOT', 'q']]]] 
[0]: 
    [['p', 'AND', ['NOT', 'q']]] 
    [0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', 'q'] 


p and not q or r 
[[['p', 'AND', ['NOT', 'q']], 'OR', ['r']]] 
[0]: 
    [['p', 'AND', ['NOT', 'q']], 'OR', ['r']] 
    [0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', 'q'] 
    [1]: 
    OR 
    [2]: 
    ['r'] 


p and not (q or r) 
[[['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]]] 
[0]: 
    [['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]] 
    [0]: 
    ['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', [[['q'], 'OR', ['r']]]] 
     [0]: 
     NOT 
     [1]: 
     [[['q'], 'OR', ['r']]] 
     [0]: 
      [['q'], 'OR', ['r']] 
      [0]: 
      ['q'] 
      [1]: 
      OR 
      [2]: 
      ['r'] 

使用中綴表示法將給:

p and not q 
[['p', 'AND', ['NOT', 'q']]] 
[0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
    p 
    [1]: 
    AND 
    [2]: 
    ['NOT', 'q'] 


p and not q or r 
[[['p', 'AND', ['NOT', 'q']], 'OR', 'r']] 
[0]: 
    [['p', 'AND', ['NOT', 'q']], 'OR', 'r'] 
    [0]: 
    ['p', 'AND', ['NOT', 'q']] 
    [0]: 
     p 
    [1]: 
     AND 
    [2]: 
     ['NOT', 'q'] 
    [1]: 
    OR 
    [2]: 
    r 


p and not (q or r) 
[['p', 'AND', ['NOT', ['q', 'OR', 'r']]]] 
[0]: 
    ['p', 'AND', ['NOT', ['q', 'OR', 'r']]] 
    [0]: 
    p 
    [1]: 
    AND 
    [2]: 
    ['NOT', ['q', 'OR', 'r']] 
    [0]: 
     NOT 
    [1]: 
     ['q', 'OR', 'r'] 

FollowedBy terms that infixNotation通過在實際分組之前確保有2個或更多術語進行分組,可以添加摺疊退化級別。它們還阻止了操作優先級定義中每個級別的原子的調用解析操作。

如果這些對您無關緊要,請嘗試精簡版。 (另外,請在UNICODE_CHARS的定義上做一點時間 - 這個字符串會產生一些費時,你可能想要預先生成這個字符串到一個單獨的模塊,並且只導入它。)

+0

謝謝您的建議。我可以讓它快3/4。但是我需要讓它快1/10。我會嘗試psyco什麼的.. –

+0

psyco不活着。 pypy在Python3.5上不可用。 –

+0

我將語法定義移出func作用域。性能改進enoght。它是1/20。謝謝! –